Delta Lake Blogs
Building the Medallion Architecture with Delta Lake
by Matthew Powers,
Using the Medallion Architecture with Delta Lake
Delta Lake 4.0 Preview
by Tathagata “TD” Das, Allison Portis, Scott Sandre, Susan Pierce, Carly Akerly,
We are pleased to announce the preview release of Delta Lake 4.0 (release notes) on Apache Spark™ 4.0 Preview.
Unlocking the Power of Delta Lake 3.0+: Introducing the New StarTree Connector with Delta Kernel
by Vibhuti Bhushan,
In the rapidly evolving landscape of data management, staying up-to-date with the latest advancements is key to maintaining a competitive edge.
Unifying the open table formats with Delta Lake Universal Format (UniForm) and Apache XTable
by Jonathan Brito, Kyle Weller,
Delta Lake Universal Format (UniForm) enables Delta tables to be read by any engine that supports Delta, Iceberg, and now, through code contributed by Apache XTable, Hudi.
Delta Kernel - Building Delta Lake connectors, made simple
by Nick Lanham, Tathagata “TD” Das,
Delta Lake recently hit an impressive milestone of being downloaded more than 20M times per month!
Query Delta Lake natively using BigQuery
by Gaurav Saxena, Justin Levandoski,
Users working with Delta Lake tables can now easily integrate their workloads with BigQuery, ensuring secure and more managed interoperability.
A Guide to Delta Lake Sessions at Data+AI Summit
by Carly Akerly,
The Data+AI Summit returns to San Francisco from June 10-13, 2024.
Use Delta Lake from Jupyter Notebook
by Avril Aysha,
Learn how to use Delta Lake from a Jupyter Notebook
Scaling Graph Data Processing with Delta Lake: Lessons from a Real-World Use Case
by Yeshwanth Vijayakumar, Director of Engineering, Adobe,
The Adobe Experience Platform includes a set of analytics, social, advertising, media optimization, targeting, Web experience management, journey orchestration, and content management products.
Delta Lake vs Data Lake - What's the Difference?
by Avril Aysha,
Understand the difference between Delta Lake and a data lake
Delta Lake 3.2
by Carly Akerly,
We are pleased to announce the release of Delta Lake 3.2 (release notes) on Apache Spark 3.5, with features that improve the performance and interoperability of Delta Lake.
Efficient Delta Vacuum with File Inventory
by Arun Ravi M V (Grab),
Today, Delta Lake is rapidly making its mark as a highly popular hybrid data format, earning widespread adoption across various organizations.
Rivian expands the Delta Lake ecosystem with Delta-Go
by Chelsea Jones, Staff Data Engineer, Rivian; Rahul Madnawat, Software Engineer II, Rivian; Jason Shiverick, Director of AI Platforms, Rivian,
Real-time data ingestion for high-volume transactions, now available in open source
Pros and cons of Hive-style partitioning
by Matthew Powers, Martin Bode,
This post discusses the pros and cons of Hive-style partitioning.
Structured Spark Streaming with Delta Lake: A Comprehensive Guide
by Delta Lake,
The webinar demonstrates how to embrace structured streaming seamlessly from data emission to your final Delta table destination.
High-Performance Querying on Massive Delta Lake Tables with Daft
by Clark Zinzow, Jay Chia,
This post introduces the distributed + parallel Delta Lake reader in Daft.
Delta Lake - State of the Project - Part 2
by Tathagata "TD" Das, Susan Pierce, Carly Akerly,
Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.
Delta Lake Announces Pandas Enhancement: Real Pandas to Optimize Data Lakehouse Performance
by Carly Akerly,
The Delta Lake project is thrilled to announce its latest and most exciting collaboration with the Pandas community!
Delta Lake - State of the Project - Part 1
by Tathagata "TD" Das, Susan Pierce, Carly Akerly,
Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.
Delta Lake 3.1.0
by Carly Akerly,
This post describes the exciting features in the Delta Lake 3.1.0 release
Delta Lake replaceWhere
by Matthew Powers,
Selectively overriding rows or partitions of a Delta Lake table with replaceWhere.
Delta Lake Performance
by Joe Harris,
This post explains why Delta Lake is fast and describes improvements to Delta Lake performance over time.
Writing a Kafka Stream to Delta Lake with Spark Structured Streaming
by Bo Gao, Matthew Powers,
This blog post explains how to write a Kafka stream to a Delta table with Spark Structured Streaming.
Using Delta Lake with AWS Glue
by Keerthi Josyula, Matthew Powers,
This post shows how to register Delta tables in the AWS Glue Data Catalog with the AWS Glue Crawler.
New features in the Python deltalake 0.12.0 release
by Ion Koutsouris,
This post explains the new features in the Python deltalake 0.12.0 release
Delta Lake 3.0.0
by Carly Akerly,
This post describes the exciting features in the Delta Lake 3.0.0 release
Delta Lake vs. Parquet Comparison
by Matthew Powers,
This post compares the strengths and weaknesses of Delta Lake vs Parquet.
Delta Lake vs. ORC Comparison
by Avril Aysha,
This post compares the strengths and weaknesses of Delta Lake vs ORC.
Unlock Delta Lakes for PyTorch Training with DeltaTorch
by Daniel Liden, Michael Shtelma,
This post demonstrates how to create PyTorch DataLoaders using Delta tables as data sources for training deep learning models.
Introducing Delta Lake Table Features
by Nick Karpov,
This post introduces Delta Lake Table Features, a discrete feature-based compatibility scheme that replaces the traditional integer protocol versioning for Delta Lake tables and clients.
Delta Lake Change Data Feed (CDF)
by Nick Karpov, Matthew Powers,
This blog shows how to enable and use the Delta Lake Change Data Feed.
Delta Lake’s transaction log protocol and its implementations
by Matthew Powers,
This blog explains the Delta Lake transaction log protocol and its various implementations.
Delta Lake Deletion Vectors
by Nick Karpov,
This blog introduces the new Deletion Vectors table feature for Delta Lake tables, and explains how Deletion Vectors speed up operations that modify existing data in your lakehouse.
Using Ibis with PySpark on Delta Lake tables
by Marlene Mhangami, Matthew Powers,
This post explains how to use Ibis to query Delta tables with PySpark
Delta Lake Z Order
by Matthew Powers,
This post explains how to use Delta Lake Z Order to make your queries run faster
Delta Lake 2.3.0 Released
by Allison Portis, Matthew Powers,
This post explains some of the key features in the Delta Lake 2.3.0 release
Open source self-hosted Delta Sharing server
by Shingo OKAWA,
This post gives basic instructions for using the Kotosiro Delta Sharing server
How Delta Lake uses metadata to make certain aggregations much faster
by Matthew Powers, Scott Sandre,
This post explains Delta Lake performance optimizations that make some aggregations execute quicker
How to use Delta Lake generated columns
by Matthew Powers,
How to create Delta Lake tables with generated columns and the benefits of this feature
Introducing Support for Delta Lake Tables in AWS Lambda
by Nick Karpov,
How to use deltalake in AWS Lambda with AWS SDK for pandas
How to create and append to Delta Lake tables with pandas
by Matthew Powers,
This post explains how to create and append to Delta Lake tables with pandas
Running ML Workflows with Delta Lake and Ray
by Jim Hibbard,
This post explains how you can read Delta Lake with the Ray compute framework
How to Convert from CSV to Delta Lake
by Matthew Powers,
This post explains how to convert from a CSV data lake to Delta Lake, which offers much better features.
Getting started contributing to Delta Lake Spark
by Nick Karpov,
This post explains the full development loop with the Delta Lake Spark connector. You'll learn how to retrieve and navigate the codebase, make changes, and package and debug custom builds.
New features in the Python deltalake 0.7.0 release of delta-rs
by Will Jones, Matthew Powers,
This post explains the new features in the deltalake 0.7.0 release
Delta Lake Schema Evolution
by Matthew Powers,
This post shows how to enable schema evolution in Delta tables and when this is a good option.
Delta Lake Time Travel
by Matthew Powers,
This post shows how to time travel between different versions of a Delta table.
Delta Lake Small File Compaction with OPTIMIZE
by Matthew Powers,
This post shows how to compact small files in Delta tables with OPTIMIZE.
Adding and Deleting Partitions in Delta Lake tables
by Matthew Powers, Ryan Zhu,
This post shows how to add partitions to and remove partitions from Delta Lake tables.
Remove old files with the Delta Lake Vacuum Command
by Matthew Powers, Nick Karpov,
This blog post explains how to remove files marked for deletion from storage with the Delta Lake Vacuum command.
Reading Delta Lake Tables into Polars DataFrames
by Matthew Powers, Chitral Verma,
This post shows how to read Delta Lake tables into Polars DataFrames.
Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR
by Vedant Jain, Denny Lee,
In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.
Data Sharing across Government Agencies using Delta Sharing
by Li Yu, Mubashir Kazia, Jon D. Ceanfaglione, Prabha Rajendran, Purushotam Shrestha, Shawn A. Benjamin,
This post shows how government agencies are sharing data with Delta Sharing.
How to Delete Rows from a Delta Lake Table
by Matthew Powers,
This post teaches you how to delete rows from a Delta Lake table and how the operation is implemented under the hood.
Delta Lake Constraints and Checks
by Matthew Powers,
This post shows how to add constraints to your Delta table to prevent certain types of values from being appended.
Delta Lake Schema Enforcement
by Matthew Powers,
This post teaches you about schema enforcement in Delta Lake and why it's better than what's offered by data lakes
Why PySpark append and overwrite write operations are safer in Delta Lake than Parquet tables
by Matthew Powers,
This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.
How to Create Delta Lake tables
by Matthew Powers,
This post shows you how to create Delta Lake tables with Python, SQL, and PySpark.
How to Version Your Data with pandas and Delta Lake
by Matthew Powers,
This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.
Sharing a Delta Table’s Change Data Feed with Delta Sharing 0.5.0
by Will Girten,
We are excited to announce the release of Delta Sharing 0.5.0.
How to Rollback a Delta Lake Table to a Previous Version with Restore
by Matthew Powers,
This post shows you how to rollback Delta Lake tables to previous versions with restore.
Converting from Parquet to Delta Lake
by Matthew Powers,
This post shows how to convert a Parquet table to a Delta table.
Why we migrated to a Data Lakehouse on Delta Lake for T-Mobile Data Science and Analytics Team
by Robert Thompson, Geoff Freeman,
In this post, we will discuss how and why we migrated from databases and data lakes to a data lakehouse on Delta Lake. Our lakehouse architecture allows reading and writing of data without blocking and scales out linearly....
How to drop columns from a Delta Lake table
by Matthew Powers,
This post shows you two ways to drop columns from Delta Lake tables.
Apache Flink Source Connector for Delta Lake tables
by Krzysztof Chmielewski, Scott Sandre, Denny Lee,
We are excited to announce the release of Delta Connectors 0.5.0, which introduces the new Flink/Delta Source Connector on Apache Flink™ 1.13 that can read directly from Delta tables using Flink’s DataStream API.
Delta 2.0 - The Foundation of your Data Lakehouse is Open
by Tathagata Das, Denny Lee,
We are happy to announce the release of Delta Lake 2.0 on Apache Spark™ 3.2! The significance of Delta Lake 2.0 is not just a number - though it is timed quite nicely with Delta Lake’s 3rd birthday....
Multi-cluster writes to Delta Lake Storage in S3
by Scott Sandre, Denny Lee, Mariusz Kryński (Samba TV),
While Delta Lake has supported concurrent reads from multiple clusters since its inception, there were limitations for multi-cluster writes specifically to Amazon S3. Note, this was not a limitation for Azure ADLSgen2 nor Google GCS, as S3 currently lacks...
Delta Lake 1.2 - More Speed, Efficiency and Extensibility Than Ever
by Venki Korukanti, Scott Sandre, Tathagata Das, Allison Portis, Denny Lee, Vini Jaiswal,
Introducing performance optimizations that will supercharge your data pipelines at any scale.
Writing to Delta Lake from Apache Flink
by Fabian Paul, Pawel Kubit, Scott Sandre, Tathagata Das, Denny Lee,
Learn more about how you can write from Apache Flink to Delta Lake.
Extending Delta Sharing to Google Cloud Storage
by Will Girten, Shixiong Zhu,
Learn more about the latest release of the open-source project Delta Sharing and how it enables sharing on Google Cloud Storage, among other enhancements.
Delta Connectors 0.3.0 Released
by Allison Portis,
We are excited to announce the release of Delta Connectors 0.3.0.
Delta Lake 1.1.0 Released
by Scott Sandre,
We are excited to announce the release of Delta Lake 1.1.0.
Delta Sharing 0.3.0 Released
by Lin Zhou,
We are excited to announce the release of Delta Sharing 0.3.0.
Power BI Delta Sharing Connector
by Denny Lee,
We are excited about the recently announced preview of the Power BI Delta Sharing connector
Delta Lake User Survey (2021 H2)
by Denny Lee,
We would like to invite you to provide your feedback on Delta Lake OSS.
Delta Lake 1.0.0 Released
by Tathagata Das,
We are excited to announce the release of Delta Lake 1.0.0 on Apache Spark 3.1.
Salesforce Engineering: Delta Lake Tech Talk Series
by Denny Lee,
We are happy to announce the Salesforce Engineering Delta Lake Tech Talk Series for March and April 2021.
AMA: Growing the Delta Lake ecosystem
by Denny Lee,
On March 11th, 2021 9:00 am PT, join us for this fun Delta Lake AMA session where we discuss with QP Hou, Christian Williams, and Alexander Kushnir from Scribd on growing the Delta Lake open-source ecosystem.
Salesforce Engineering: Delta Lake Blog Series
by Denny Lee,
Salesforce Engineering has published a series of blogs on how they use Delta Lake.
Salesforce Engineering: Global Synchronousness and Ordering in Delta Lake
by Denny Lee,
At Salesforce, we maintain a platform to capture customer activity — various kinds of sales events such as emails, meetings, and videos. These events are either consumed by downstream products in real time or stored in our data lake, which...
Salesforce Engineering: Engagement Activity Delta Lake, Redshift Spectrum supports Delta Lake
by Denny Lee,
We have a couple of exciting call outs this week!
Getting Started with Delta Lake
by Denny Lee,
Want to learn more about Delta Lake? Check out this series of Delta Lake videos.
Delta Lake Sessions at Spark+AI Summit North America 2020
by Denny Lee,
We're really excited for the numerous Delta Lake training and conference sessions that will be showcased throughout Spark+AI Summit NA 2020.
Delta Lake 0.7.0 Released
by Denny Lee,
We are excited to announce the release of Delta Lake 0.7.0 on Apache Spark 3.0. This is the first release on Spark 3.x and adds support for metastore-defined tables and SQL DDLs.
Delta Lake 0.6.1 Released
by Denny Lee,
We are excited to announce the release of Delta Lake 0.6.1, which fixes a few critical bugs in merge operation and operation metrics. If you are using version 0.6.0, it is strongly recommended that you upgrade to version 0.6.1.
Delta Lake 0.6.0 Released
by Denny Lee,
We are excited to announce the release of Delta Lake 0.6.0, which introduces schema evolution and performance improvements in merge, and operation metrics in table history.
Delta Lake Newsletter: 2020-03-20 Edition
by Denny Lee,
For this edition of the Delta Lake Newsletter, find out more about the latest and upcoming tech talks and videos.
Diving into Delta Lake Online Tech Talk Series
by Denny Lee,
For our next series of Delta Lake online tech talks, we're excited to dive into the internals with our Diving into Delta Lake series. This will be a fun set of tech talks with live demos and Q&A. Check them...
Delta Lake Online Tech Talks
by Denny Lee,
We’re excited to announce the next series of Delta Lake online tech talks over the next few weeks. This will be a fun set of tech talks with live demos and Q&A. Check them out!
Delta Lake 0.5.0 Released
by Denny Lee,
We are excited to announce the release of Delta Lake 0.5.0, which introduces Presto/Athena support and improved concurrency.
Delta Lake Newsletter: 2019-10-03 Edition (incl. SAIS EU 2019 Sessions)
by Denny Lee,
In this edition of the Delta Lake Newsletter, find out more about the latest and upcoming webinars, meetups, and publications. For this edition, we will also focus on the many sessions at Spark+AI Summit EU 2019 in Amsterdam.
Delta Lake 0.4.0 Released
by Denny Lee,
We are excited to announce the release of Delta Lake 0.4.0 which introduces Python APIs for manipulating and managing data in Delta tables.
Delta Lake 0.3.0 Released
by Denny Lee,
We are happy to announce the availability of Delta Lake 0.3.0! Features include: Scala/Java APIs for DML commands, Scala/Java APIs for query commit history, and Scala/Java APIs for vacuuming old files.
Delta Lake 0.2.0 Released
by Denny Lee,
We are happy to announce the availability of Delta Lake 0.2.0! It brings support for cloud storage (e.g. Amazon S3 and Azure Blob Storage) and improved concurrency.
Delta Lake 0.1.0 Released
by Denny Lee,
We are happy to announce the availability of Delta Lake 0.1.0! This is the initial version of the open source Delta Lake.