Build Lakehouses with Delta Lake

Delta Lake is an open-source project that enables building a Lakehouse architecture on top of existing storage systems such as S3, ADLS, GCS, and HDFS.

Key features

ACID Transactions
Data lakes typically have multiple data pipelines reading and writing data concurrently. Without transactions, data engineers have to go through a tedious process to ensure data integrity. Delta Lake brings ACID transactions to your data lakes. It provides serializability, the strongest isolation level. Learn more at Diving into Delta Lake: Unpacking the Transaction Log.
Scalable Metadata Handling
In big data, even the metadata itself can be "big data." Delta Lake treats metadata just like data, leveraging Spark's distributed processing power to handle all its metadata. As a result, Delta Lake can handle petabyte-scale tables with billions of partitions and files with ease.
Time Travel (data versioning)
Delta Lake provides snapshots of data, enabling developers to access and revert to earlier versions of data for audits, rollbacks, or reproducing experiments. Learn more in Introducing Delta Lake Time Travel for Large Scale Data Lakes.
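To make the idea of versioned snapshots concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta Lake's API or implementation: the class `ToyVersionedTable` and its `commit` and `snapshot` methods are hypothetical names chosen for illustration. It assumes that every change is appended to an ordered log and that an earlier version can be rebuilt by replaying the log up to that point.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy in-memory table (hypothetical) that keeps every commit so old versions stay readable."""

    # Each commit is a list of ("add" | "remove", row) actions, in order.
    commits: list = field(default_factory=list)

    def commit(self, actions):
        """Append one commit to the log and return its version number."""
        self.commits.append(list(actions))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay commits 0..version to rebuild the rows visible at that version."""
        if version is None:
            version = len(self.commits) - 1
        if not 0 <= version < len(self.commits):
            raise ValueError(f"unknown version {version}")
        rows = []
        for actions in self.commits[: version + 1]:
            for op, row in actions:
                if op == "add":
                    rows.append(row)
                elif op == "remove":
                    rows.remove(row)
                else:
                    raise ValueError(f"unknown action {op!r}")
        return rows


table = ToyVersionedTable()
v0 = table.commit([("add", {"id": 1, "city": "Oslo"})])
v1 = table.commit([("add", {"id": 2, "city": "Lima"})])
v2 = table.commit([("remove", {"id": 1, "city": "Oslo"})])

latest = table.snapshot()      # rows as of the newest version
as_of_v1 = table.snapshot(v1)  # rows as they were before the delete
```

Because nothing in the log is overwritten, reading an older version is just a shorter replay, which is what makes audits and rollbacks possible in this model.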
Open Format
All data in Delta Lake is stored in Apache Parquet format, enabling Delta Lake to leverage the efficient compression and encoding schemes that are native to Parquet.
Unified Batch and Streaming Source and Sink
A table in Delta Lake is a batch table as well as a streaming source and sink. Streaming data ingest, batch historic backfill, and interactive queries all work out of the box.
Schema Enforcement
Delta Lake provides the ability to specify your schema and enforce it. This helps ensure that the data types are correct and required columns are present, preventing bad data from causing data corruption. For more information, refer to Diving Into Delta Lake: Schema Enforcement & Evolution.
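The following plain-Python sketch illustrates what enforcing a schema on write means in general terms. It is not Delta Lake code: `SCHEMA`, `SchemaMismatchError`, `validate_rows`, and `append_rows` are hypothetical names, and the all-or-nothing rejection of a batch is an assumption made for the illustration.

```python
# Hypothetical table schema: column name -> required Python type.
SCHEMA = {"id": int, "amount": float}


class SchemaMismatchError(ValueError):
    """Raised when a row does not match the declared schema."""


def validate_rows(rows, schema):
    """Check that every row has exactly the schema's columns with the exact types."""
    for index, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing or extra:
            raise SchemaMismatchError(
                f"row {index}: missing={sorted(missing)} extra={sorted(extra)}"
            )
        for name, expected in schema.items():
            if type(row[name]) is not expected:
                raise SchemaMismatchError(
                    f"row {index}: column {name!r} expected {expected.__name__}, "
                    f"got {type(row[name]).__name__}"
                )


def append_rows(table, rows, schema):
    """Validate the whole batch first so a bad row never leaves a partial write."""
    validate_rows(rows, schema)
    table.extend(rows)


table = []
append_rows(table, [{"id": 1, "amount": 9.5}], SCHEMA)

try:
    # The second row has the wrong type for "amount", so the whole batch is rejected.
    append_rows(table, [{"id": 2, "amount": 1.0}, {"id": 3, "amount": "n/a"}], SCHEMA)
    rejected = False
except SchemaMismatchError:
    rejected = True
```

Validating before writing is the design choice that matters here: the table is left untouched when any row in the batch is invalid.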
Schema Evolution
Big data is continuously changing. Delta Lake enables you to make changes to a table schema that can be applied automatically, without the need for cumbersome DDL. For more information, refer to Diving Into Delta Lake: Schema Enforcement & Evolution.
Audit History
The Delta Lake transaction log records details about every change made to data, providing a full audit trail of the changes.
Updates and Deletes
Delta Lake supports Scala, Java, Python, and SQL APIs to merge, update and delete datasets. This allows you to easily comply with GDPR and CCPA and also simplifies use cases like change data capture. For more information, refer to Announcing the Delta Lake 0.3.0 Release and Simple, Reliable Upserts and Deletes on Delta Lake Tables using Python APIs which includes code snippets for merge, update, and delete DML commands.
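As a rough illustration of what a merge (upsert) with deletes does to a dataset, here is a small plain-Python sketch. It does not use the Delta Lake APIs mentioned above; `merge_by_key` and its parameters are hypothetical, and it assumes each row is a dict with a unique key column.

```python
def merge_by_key(target, updates, key, deletes=()):
    """Return a new row list after applying updates, inserts, and deletes by key.

    - A row in `updates` whose key exists in `target` overwrites matching columns.
    - A row in `updates` with a new key is inserted.
    - Any key listed in `deletes` is removed.
    """
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        existing = merged.get(row[key], {})
        merged[row[key]] = {**existing, **row}
    for removed_key in deletes:
        merged.pop(removed_key, None)
    return list(merged.values())


customers = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Bo", "email": "bo@example.com"},
]
changes = [
    {"id": 2, "email": "bo@new.example.com"},             # update existing row
    {"id": 3, "name": "Cy", "email": "cy@example.com"},   # insert new row
]

# Deleting id 1 stands in for an erasure request, such as one made under GDPR.
result = merge_by_key(customers, changes, key="id", deletes=[1])
```

The function returns a new list rather than mutating `customers`, so the input rows remain available for comparison.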
100% Compatible with Apache Spark API
Developers can use Delta Lake with their existing data pipelines with minimal change, as it is fully compatible with Spark, a commonly used big data processing engine.
Delta Everywhere
Use the language, services, connectors, or database of your choice with Delta Lake, with connectors including Rust, Python, dbt, Hive, Presto, and more!
Instead of parquet...
dataframe
   .write
   .format("parquet")
   .save("/data")
… simply say delta
dataframe
   .write
   .format("delta")
   .save("/data")
Together, the features of Delta Lake improve both the manageability and performance of working with data in cloud storage objects, and enable a "lakehouse" paradigm that combines the key features of data warehouses and data lakes: standard DBMS management functions usable against low-cost object stores.

Organizations using and contributing to Delta Lake

Thousands of companies are processing exabytes of data per month with Delta Lake.

To add your organization here, email us at info@delta.io.

Join the Delta Lake Community

Communicate with fellow Delta Lake users and contributors, ask questions and share tips.

Project Governance

Delta Lake is an independent open-source project and is not controlled by any single company. To emphasize this, we joined the Delta Lake Project in 2019, which is a sub-project of the Linux Foundation Projects.

Within the project, we make decisions based on these rules.

Copyright © 2020 Delta Lake, a Series of LF Projects, LLC. For web site terms of use, trademark policy and other project policies please see https://lfprojects.org.