Delta Lake Blogs
by Bo Gao, Matthew Powers
This blog post explains how to write a Kafka stream to a Delta table with Spark Structured Streaming.
by Ion Koutsouris
This post explains the new features in the Python deltalake 0.12.0 release.
by Carly Akerly
This post describes the exciting features in the Delta Lake 3.0.0 release.
This post demonstrates how to create PyTorch DataLoaders using Delta tables as data sources for training deep learning models.
This blog explains the Delta Lake transaction log protocol and its various implementations.
by Shingo OKAWA
This post gives basic instructions for using the Kotosiro Delta Sharing server.
by Matthew Powers, Scott Sandre
This post explains Delta Lake performance optimizations that make some aggregations run faster.
by Nick Karpov
How to use deltalake in AWS Lambda with AWS SDK for pandas
This post explains how to create and append to Delta Lake tables with pandas.
by Will Jones, Matthew Powers
This post explains the new features in the deltalake 0.7.0 release.
by Matthew Powers, Ryan Zhu
This post shows how to add partitions to and remove partitions from Delta Lake tables.
This blog post explains how to remove files marked for deletion from storage with the Delta Lake Vacuum command.
Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR
by Vedant Jain, Denny Lee
In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.
by Li Yu, Mubashir Kazia, Jon D. Ceanfaglione, Prabha Rajendran, Purushotam Shrestha, Shawn A. Benjamin
This post shows how government agencies are sharing data with Delta Sharing.
This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.
This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.
by Will Girten
We are excited to announce the release of Delta Sharing 0.5.0.
This post shows you how to rollback Delta Lake tables to previous versions with restore.
by Robert Thompson, Geoff Freeman
In this post, we will discuss how and why we migrated from databases and data lakes to a data lakehouse on Delta Lake. Our lakehouse architecture allows reading and writing of data without blocking and scales out linearly....
by Krzysztof Chmielewski, Scott Sandre, Denny Lee
We are excited to announce the release of Delta Connectors 0.5.0, which introduces the new Flink/Delta Source Connector on Apache Flink™ 1.13 that can read directly from Delta tables using Flink’s DataStream API.
by Tathagata Das, Denny Lee
We are happy to announce the release of Delta Lake 2.0 on Apache Spark™ 3.2! The significance of Delta Lake 2.0 is not just a number, though it is timed quite nicely with Delta Lake’s 3rd birthday....
by Scott Sandre, Denny Lee, Mariusz Kryński (Samba TV)
While Delta Lake has supported concurrent reads from multiple clusters since its inception, there were limitations for multi-cluster writes specifically to Amazon S3. Note, this was not a limitation for Azure ADLSgen2 nor Google GCS, as S3 currently lacks...
by Venki Korukanti, Scott Sandre, Tathagata Das, Allison Portis, Denny Lee, Vini Jaiswal
Introducing performance optimizations that will supercharge your data pipelines at any scale.
by Fabian Paul, Pawel Kubit, Scott Sandre, Tathagata Das, Denny Lee
Learn how you can write from Apache Flink to Delta Lake, and about the latest release of the open-source project Delta Sharing and how it enables sharing on Google Cloud Storage, among other enhancements.
by Allison Portis
We are excited to announce the release of Delta Connectors 0.3.0.
by Scott Sandre
We are excited to announce the release of Delta Lake 1.1.0.
by Lin Zhou
We are excited to announce the release of Delta Sharing 0.3.0.
by Tathagata Das
We are excited to announce the release of Delta Lake 1.0.0 on Apache Spark 3.1.
by Denny Lee
We are happy to announce the Salesforce Engineering Delta Lake Tech Talk Series for March and April 2021.
by Denny Lee
At Salesforce, we maintain a platform to capture customer activity — various kinds of sales events such as emails, meetings, and videos. These events are either consumed by downstream products in real time or stored in our data lake, which...
by Denny Lee
We have a couple of exciting callouts this week!
by Denny Lee
We're really excited for the numerous Delta Lake training and conference sessions that will be showcased throughout Spark+AI Summit NA 2020.
by Denny Lee
In this edition of the Delta Lake Newsletter, find out more about the latest and upcoming webinars, meetups, and publications. We will also focus on the many sessions at Spark+AI Summit EU 2019 in Amsterdam.