Delta Lake Blogs

Learn how to use Apache Sedona with Delta Lake

Delta Lake 3.3

by Allison Portis, Susan Pierce

Announcing Delta Lake 3.3 on Apache Spark 3.5, with features that improve the performance and interoperability of Delta Lake.

Understanding Open Table Formats

Learn about open table formats

Delta Lake Liquid Clustering

Learn how to use the Delta Lake Liquid Clustering feature

Delta Lake on Azure Data Lake Storage

Learn how to use Delta Lake on Azure Data Lake Storage

Delta Lake Upsert

Learn how to perform upserts with Delta Lake

Delta Lake on GCP

Learn how to use Delta Lake on GCP

Building the Medallion Architecture with Delta Lake

Using the Medallion Architecture with Delta Lake

Delta Lake Clone

Learn how to clone Delta tables

Delta Lake on S3

Learn how to use Delta Lake on S3

Delta Lake for ETL

Learn how to use Delta Lake for ETL workloads

Delta Lake 4.0 Preview

by Tathagata “TD” Das, Allison Portis, Scott Sandre, Susan Pierce, Carly Akerly

We are pleased to announce the preview release of Delta Lake 4.0 (release notes) on Apache Spark™ 4.0 Preview.

Delta Lake Optimize

Learn how to optimize your Delta Lake tables

Unlocking the Power of Delta Lake 3.0+: Introducing the New StarTree Connector with Delta Kernel

by Vibhuti Bhushan

In the rapidly evolving landscape of data management, staying up-to-date with the latest advancements is key to maintaining a competitive edge.

Unifying the open table formats with Delta Lake Universal Format (UniForm) and Apache XTable

by Jonathan Brito, Kyle Weller

Delta Lake Universal Format (UniForm) enables Delta tables to be read by any engine that supports Delta, Iceberg, and now, through code contributed by Apache XTable, Hudi.

Delta Kernel - Building Delta Lake connectors, made simple

by Nick Lanham, Tathagata “TD” Das

Delta Lake recently hit an impressive milestone of being downloaded more than 20M times per month!

Query Delta Lake natively using BigQuery

by Gaurav Saxena, Justin Levandoski

Users working with Delta Lake tables can now easily integrate their workloads with BigQuery, ensuring secure and more managed interoperability.

A Guide to Delta Lake Sessions at Data+AI Summit

by Carly Akerly

The Data+AI Summit returns to San Francisco from June 10-13, 2024.

Delta Lake without Spark

Learn how to use Delta Lake without Spark

Use Delta Lake from Jupyter Notebook

Learn how to use Delta Lake from a Jupyter Notebook

Scaling Graph Data Processing with Delta Lake: Lessons from a Real-World Use Case

by Yeshwanth Vijayakumar, Director of Engineering, Adobe

The Adobe Experience Platform includes a set of analytics, social, advertising, media optimization, targeting, Web experience management, journey orchestration, and content management products.

Delta Lake vs Data Lake - What's the Difference?

Understand the difference between Delta Lake and a data lake

Delta Lake 3.2

by Carly Akerly

We are pleased to announce the release of Delta Lake 3.2 (release notes) on Apache Spark 3.5, with features that improve the performance and interoperability of Delta Lake.

Efficient Delta Vacuum with File Inventory

by Arun Ravi M V (Grab)

Today, Delta Lake is rapidly making its mark as a highly popular hybrid data format, earning widespread adoption across various organizations.

Rivian expands the Delta Lake ecosystem with Delta-Go

by Chelsea Jones, Staff Data Engineer, Rivian; Rahul Madnawat, Software Engineer II, Rivian; Jason Shiverick, Director of AI Platforms, Rivian

Real-time data ingestion for high-volume transactions, now available in open source

Pros and cons of Hive-style partitioning

by Matthew Powers, Martin Bode

This post discusses the pros and cons of Hive-style partitioning.

Structured Spark Streaming with Delta Lake: A Comprehensive Guide

by Delta Lake

The webinar demonstrates how to embrace structured streaming seamlessly from data emission to your final Delta table destination.

High-Performance Querying on Massive Delta Lake Tables with Daft

by Clark Zinzow, Jay Chia

This post introduces the distributed + parallel Delta Lake reader in Daft.

Delta Lake - State of the Project - Part 2

by Tathagata "TD" Das, Susan Pierce, Carly Akerly

Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.

Delta Lake Announces Pandas Enhancement: Real Pandas to Optimize Data Lakehouse Performance

by Carly Akerly

The Delta Lake project is thrilled to announce its latest and most exciting collaboration with the Pandas community!

Delta Lake - State of the Project - Part 1

by Tathagata "TD" Das, Susan Pierce, Carly Akerly

Delta Lake, a project hosted under The Linux Foundation, has been growing by leaps and bounds. To celebrate the achievements of the project, we’re publishing a 2-part series on Delta Lake.

Delta Lake 3.1.0

by Carly Akerly

This post describes the exciting features in the Delta Lake 3.1.0 release

Delta Lake replaceWhere

Selectively overriding rows or partitions of a Delta Lake table with replaceWhere.

Delta Lake Performance

by Joe Harris

This post explains why Delta Lake is fast and describes improvements to Delta Lake performance over time.

Writing a Kafka Stream to Delta Lake with Spark Structured Streaming

by Bo Gao, Matthew Powers

This blog post explains how to write a Kafka stream to a Delta table with Spark Structured Streaming.

Using Delta Lake with AWS Glue

by Keerthi Josyula, Matthew Powers

This post shows how to register Delta tables in the AWS Glue Data Catalog with the AWS Glue Crawler.

New features in the Python deltalake 0.12.0 release

by Ion Koutsouris

This post explains the new features in the Python deltalake 0.12.0 release

Delta Lake 3.0.0

by Carly Akerly

This post describes the exciting features in the Delta Lake 3.0.0 release

Delta Lake vs. Parquet Comparison

This post compares the strengths and weaknesses of Delta Lake vs Parquet.

Delta Lake vs. ORC Comparison

This post compares the strengths and weaknesses of Delta Lake vs ORC.

Unlock Delta Lakes for PyTorch Training with DeltaTorch

by Daniel Liden, Michael Shtelma

This post demonstrates how to create PyTorch DataLoaders using Delta tables as data sources for training deep learning models.

Introducing Delta Lake Table Features

This post introduces Delta Lake Table Features, a discrete feature-based compatibility scheme that replaces the traditional integer protocol versioning for Delta Lake tables and clients.

Delta Lake Change Data Feed (CDF)

by Nick Karpov, Matthew Powers

This blog shows how to enable and use the Delta Lake Change Data Feed.

Delta Lake’s transaction log protocol and its implementations

This blog explains the Delta Lake transaction log protocol and its various implementations.

Delta Lake Deletion Vectors

This blog introduces the new Deletion Vectors table feature for Delta Lake tables, and explains how Deletion Vectors speed up operations that modify existing data in your lakehouse.

Using Ibis with PySpark on Delta Lake tables

by Marlene Mhangami, Matthew Powers

This post explains how to use Ibis to query Delta tables with PySpark

Delta Lake Z Order

This post explains how to use Delta Lake Z Order to make your queries run faster

Delta Lake 2.3.0 Released

by Allison Portis, Matthew Powers

This post explains some of the key features in the Delta Lake 2.3.0 release

Open source self-hosted Delta Sharing server

by Shingo OKAWA

This post gives basic instructions for the Kotosiro Delta Sharing server

How Delta Lake uses metadata to make certain aggregations much faster

by Matthew Powers, Scott Sandre

This post explains Delta Lake performance optimizations that make some aggregations execute quicker

How to use Delta Lake generated columns

How to create Delta Lake tables with generated columns and the benefits of this feature

Introducing Support for Delta Lake Tables in AWS Lambda

How to use deltalake in AWS Lambda with AWS SDK for pandas

How to create and append to Delta Lake tables with pandas

This post explains how to create and append to Delta Lake tables with pandas

Running ML Workflows with Delta Lake and Ray

by Jim Hibbard

This post explains how you can read Delta Lake with the Ray compute framework

How to Convert from CSV to Delta Lake

This post explains how to convert from a CSV data lake to Delta Lake, which offers much better features.

Getting started contributing to Delta Lake Spark

This post explains the full development loop with the Delta Lake Spark connector. You'll learn how to retrieve and navigate the codebase, make changes, and package and debug custom builds.

New features in the Python deltalake 0.7.0 release of delta-rs

by Will Jones, Matthew Powers

This post explains the new features in the deltalake 0.7.0 release

Delta Lake Merge

This post shows how to use MERGE with Delta tables.

Delta Lake Schema Evolution

This post shows how to enable schema evolution in Delta tables and when this is a good option.

Delta Lake Time Travel

This post shows how to time travel between different versions of a Delta table.

Delta Lake Small File Compaction with OPTIMIZE

This post shows how to compact small files in Delta tables with OPTIMIZE.

Adding and Deleting Partitions in Delta Lake tables

by Matthew Powers, Ryan Zhu

This post shows how to add partitions to and remove partitions from Delta Lake tables.

Remove old files with the Delta Lake Vacuum Command

by Matthew Powers, Nick Karpov

This blog post explains how to remove files marked for deletion from storage with the Delta Lake Vacuum command.

Reading Delta Lake Tables into Polars DataFrames

by Matthew Powers, Chitral Verma

This post shows how to read Delta Lake tables into Polars DataFrames.

Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR

by Vedant Jain, Denny Lee

In this blog, we’ll explore how connecting Delta Lake, Amazon SageMaker Studio, and Amazon EMR can simplify the end-to-end workflow required to support data engineering and data science projects.

Data Sharing across Government Agencies using Delta Sharing

by Li Yu, Mubashir Kazia, Jon D. Ceanfaglione, Prabha Rajendran, Purushotam Shrestha, Shawn A. Benjamin

This post shows how government agencies are sharing data with Delta Sharing.

How to Delete Rows from a Delta Lake Table

This post teaches you how to delete rows from a Delta Lake table and how the operation is implemented under the hood.

Delta Lake Constraints and Checks

This post shows how to add constraints to your Delta table to prevent certain types of values from being appended.

Delta Lake Schema Enforcement

This post teaches you about schema enforcement in Delta Lake and why it's better than what's offered by data lakes

Why PySpark append and overwrite write operations are safer in Delta Lake than Parquet tables

This post shows you why PySpark overwrite operations are safer with Delta Lake and how the different save mode operations are implemented under the hood.

How to Create Delta Lake tables

This post shows you how to create Delta Lake tables with Python, SQL, and PySpark.

How to Version Your Data with pandas and Delta Lake

This post shows you how to version your pandas datasets and the benefits you'll enjoy with versioned data.

Sharing a Delta Table’s Change Data Feed with Delta Sharing 0.5.0

by Will Girten

We are excited to announce the release of Delta Sharing 0.5.0.

How to Rollback a Delta Lake Table to a Previous Version with Restore

This post shows you how to roll back Delta Lake tables to previous versions with restore.

Converting from Parquet to Delta Lake

This post shows how to convert a Parquet table to a Delta Lake table.

Why we migrated to a Data Lakehouse on Delta Lake for T-Mobile Data Science and Analytics Team

by Robert Thompson, Geoff Freeman

In this post, we will discuss how and why we migrated from databases and data lakes to a data lakehouse on Delta Lake. Our lakehouse architecture allows reading and writing of data without blocking and scales out linearly....

How to drop columns from a Delta Lake table