
Delta Lake Documentation

This page contains a list of documentation links for various Delta Lake projects.

Delta Lake

Delta Lake is a storage layer that brings data reliability through scalable ACID transactions to Apache Spark™, Flink, Hive, Presto, Trino, and other big-data engines.
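Under the Delta Lake transaction protocol, each commit is a file of newline-delimited JSON actions (such as add and remove) in the table's _delta_log directory. The file is named by a 20-digit zero-padded version number, and readers rebuild the current table by replaying commits in order. The following Python sketch shows that replay on a local directory. It is a deliberate simplification, not Delta Lake's implementation. It ignores checkpoints, metaData and protocol actions, and the extra fields that real add/remove actions carry. The helpers commit_file, write_commit, and live_files are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Set


def commit_file(log_dir: Path, version: int) -> Path:
    """Path of the commit file for a version (20-digit zero-padded name)."""
    return log_dir / f"{version:020d}.json"


def write_commit(log_dir: Path, version: int, actions: List[Dict]) -> None:
    """Write one commit as newline-delimited JSON actions.

    Mode "x" (exclusive create) fails if the version already exists, which
    mimics put-if-absent on a local filesystem; object stores may need
    extra coordination to get the same guarantee.
    """
    with open(commit_file(log_dir, version), "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(log_dir: Path) -> Set[str]:
    """Replay commits 0, 1, 2, ... and return the data files currently in the table."""
    files: Set[str] = set()
    version = 0
    while commit_file(log_dir, version).exists():
        for line in commit_file(log_dir, version).read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
        version += 1
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "_delta_log"
        log_dir.mkdir()
        write_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
        # Version 1 swaps part-0 for part-1 in a single commit (e.g. a compaction).
        write_commit(log_dir, 1, [
            {"remove": {"path": "part-0.parquet"}},
            {"add": {"path": "part-1.parquet"}},
        ])
        print(sorted(live_files(log_dir)))  # ['part-1.parquet']
```

Because both actions of version 1 live in one file that either exists or does not, a reader never sees a state where part-0 is gone but part-1 is missing. A second writer that tries to claim the same version number is rejected.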

Visit the Delta Lake Documentation site for the latest user guide and reference material.

For more information:

In addition, refer to the following links for the API documentation:

Delta Sharing

Delta Sharing is an open protocol for the secure, real-time exchange of large datasets, which lets organizations share data regardless of which computing platforms they use. It is a simple REST protocol that securely shares access to part of a cloud dataset and leverages modern cloud storage systems, such as S3, ADLS, or GCS, to reliably transfer data.
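In the Delta Sharing protocol, a recipient typically receives a profile file with three fields: shareCredentialsVersion, the server endpoint, and a bearerToken. The client then calls REST endpoints such as GET shares, GET shares/{share}/schemas, GET shares/{share}/schemas/{schema}/tables, and POST .../tables/{table}/query. The Python sketch below only builds these authenticated requests with the standard library and does not send them. The endpoint URL, token, and share/schema/table names are placeholders. The helpers load_profile and build_request are hypothetical and are not part of the official delta-sharing client.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

REQUIRED_KEYS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def load_profile(text: str) -> dict:
    """Parse a profile file's JSON and check that the required fields are present."""
    profile = json.loads(text)
    for key in REQUIRED_KEYS:
        if key not in profile:
            raise ValueError(f"profile is missing {key!r}")
    return profile


def build_request(profile: dict, *segments: str, method: str = "GET",
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a bearer-token request to the sharing server."""
    base = profile["endpoint"].rstrip("/")
    # Percent-encode each path segment so names with special characters stay intact.
    path = "/".join(quote(s, safe="") for s in segments)
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(f"{base}/{path}", data=data, method=method)
    req.add_header("Authorization", f"Bearer {profile['bearerToken']}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    profile = load_profile(json.dumps({
        "shareCredentialsVersion": 1,
        "endpoint": "https://sharing.example.com/delta-sharing/",  # placeholder
        "bearerToken": "<token>",                                  # placeholder
    }))
    list_shares = build_request(profile, "shares")
    query = build_request(profile, "shares", "sales", "schemas", "default",
                          "tables", "orders", "query", method="POST", body={})
    print(list_shares.get_method(), list_shares.full_url)
    print(query.get_method(), query.full_url)
```

Passing either request to urllib.request.urlopen would send it. A real client would then parse the newline-delimited JSON response and download the data files through the pre-signed cloud-storage URLs that the server returns.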

For more information:

Delta Lake Connectors

We are building connectors to bring Delta Lake to popular big-data engines outside Apache Spark (e.g., Apache Hive, Presto) and also to common reporting tools like Microsoft Power BI.

For more information:

  • Delta Standalone, formerly known as the Delta Standalone Reader (DSR), is a JVM library to read and write Delta Lake tables.
    • PrestoDB/Delta Connector: This connector allows reading Delta Lake tables in Presto. The connector uses Delta Standalone (DSR), provided by the Delta Lake project, to read the table metadata.
  • Hive Connector: This project is a library to make Hive read Delta Lake tables.
  • sql-delta-import: Imports data from a relational database or any other JDBC source into your Delta Lake. Import either an entire table or only a subset of columns, control the level of parallelism, and include any custom transformations.
  • Power BI: Read Delta Lake tables natively in Power BI.
  • Flink Delta Lake Connector: Official Delta Lake connector for Apache Flink.
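A common way for JDBC import tools to control parallelism is to split a numeric key range into contiguous chunks and read each chunk with its own query and connection. Spark's JDBC source exposes the same idea through its partitionColumn, lowerBound, upperBound, and numPartitions options. The Python sketch below shows only that generic technique. It is an assumption about the general approach, not sql-delta-import's actual code. The helpers split_ranges and range_queries and the table and column names are hypothetical.

```python
from typing import List, Tuple


def split_ranges(lower: int, upper: int, partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower, upper] into contiguous half-open chunks."""
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")
    total = upper - lower + 1
    partitions = min(partitions, total)        # never create empty chunks
    base, extra = divmod(total, partitions)    # the first `extra` chunks get one more key
    ranges = []
    start = lower
    for i in range(partitions):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def range_queries(table: str, column: str, lower: int, upper: int,
                  partitions: int) -> List[str]:
    """One SELECT per chunk; identifiers are assumed trusted (no escaping here)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in split_ranges(lower, upper, partitions)
    ]


if __name__ == "__main__":
    for sql in range_queries("orders", "id", 1, 10, 3):
        print(sql)
```

Each query covers a disjoint slice of the key space. Running them concurrently and appending the results to the target Delta table imports the whole range exactly once, provided the key column is not modified during the import.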

delta-rs (Delta Rust API)

delta-rs: This library provides low-level access to Delta tables in Rust and can be used with data processing frameworks such as DataFusion, Ballista, Polars, and Vega. It also provides bindings for higher-level languages such as Python.

For more information:

kafka-delta-ingest

The kafka-delta-ingest project aims to build a highly efficient daemon for streaming data through Apache Kafka into Delta Lake. The project is currently experimental and is evolving in tandem with the delta-rs bindings.