Delta Lake 0.7.0 Released

June 18, 2020
Denny Lee

We are excited to announce the release of Delta Lake 0.7.0 on Apache Spark 3.0. This is the first release on Spark 3.x and adds support for metastore-defined tables and SQL DDLs. The key features in this release are as follows.

  • Support for defining tables in the Hive metastore (#85) - You can now define Delta tables in the Hive metastore and use the table name in all SQL operations. Specifically, we have added support for: This integration uses Catalog APIs introduced in Spark 3.0. You must enable the Delta Catalog by setting additional configurations when starting your SparkSession. See the documentation for details.
  • Support for SQL DELETE, UPDATE and MERGE - With Spark 3.0, you can now run the SQL DML operations DELETE, UPDATE and MERGE on Delta tables. See the documentation for details.
  • Support for automatic and incremental Presto/Athena manifest generation (#453) - You can now use ALTER TABLE SET TBLPROPERTIES to enable automatic regeneration of the Presto/Athena manifest files after every operation on a Delta table. This regeneration is incremental: manifest files are updated only for the partitions that the operation changed. See the documentation for details.
  • Support for controlling the retention of the table history - You can now use ALTER TABLE SET TBLPROPERTIES to configure how long the table history and deleted files are retained in Delta tables. See the documentation for details.
  • Support for adding user-defined metadata in Delta table commits - You can now add user-defined metadata as strings in commits made to a Delta table by any operation. For DataFrame.write and DataFrame.writeStream operations, you can set the option userMetadata. For other operations, you can set the SparkSession configuration spark.databricks.delta.commitInfo.userMetadata. See the documentation for details.
  • Support for Azure Data Lake Storage Gen2 (#288) - Spark 3.0 supports the Hadoop 3.2 libraries, which enable support for Azure Data Lake Storage Gen2. See the documentation for details on how to configure Delta Lake with the correct versions of Spark and Hadoop libraries for Azure storage systems.
  • Improved support for streaming one-time triggers - With Spark 3.0, we now ensure that a one-time trigger (Trigger.Once) processes all outstanding data in a Delta table in a single micro-batch, even if rate limits are set with the DataStreamReader option maxFilesPerTrigger.
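Enabling the Delta Catalog for metastore-defined tables comes down to two session settings. The sketch below is a minimal PySpark version, assuming PySpark 3.0 on Scala 2.12 and that Spark can fetch io.delta:delta-core_2.12:0.7.0 from Maven. The helper name build_delta_session and the app name are hypothetical. PySpark is imported inside the function so the configuration constants can be inspected without a Spark installation.

```python
# Minimal sketch (assumption): PySpark 3.0 on Scala 2.12, Delta Lake 0.7.0 from Maven.
DELTA_PACKAGE = "io.delta:delta-core_2.12:0.7.0"

DELTA_CATALOG_CONF = {
    # Registers Delta's SQL extensions with the session.
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    # Replaces the built-in session catalog with the Delta-aware catalog.
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name="delta-070-demo", use_hive=True):
    """Hypothetical helper: create a SparkSession with the Delta Catalog enabled."""
    # Imported lazily so the constants above can be used without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    builder = builder.config("spark.jars.packages", DELTA_PACKAGE)
    for key, value in DELTA_CATALOG_CONF.items():
        builder = builder.config(key, value)
    if use_hive:
        # Persist table definitions in a Hive metastore (local Derby if no hive-site.xml).
        builder = builder.enableHiveSupport()
    return builder.getOrCreate()
```

The settings must be in place before the session starts, which is why they go on the builder rather than being set later with spark.conf.set.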
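With the Delta Catalog enabled, a table can be defined by name in the metastore and then modified with SQL DML. The sketch below only builds the statements so they can be reviewed. The table name events, its columns, and the source view updates are hypothetical, and the MERGE assumes updates has the same columns as events (required by UPDATE SET * and INSERT *).

```python
# Sketch: SQL DDL and DML on a metastore-defined Delta table.
# The table `events` and the source view `updates` are hypothetical names.
def delta_sql_walkthrough(table="events", source="updates"):
    """Return SQL statements that create, modify and merge into a Delta table."""
    return [
        # DDL: register the table in the metastore; later statements use its name.
        f"CREATE TABLE IF NOT EXISTS {table} (id BIGINT, status STRING) USING DELTA",
        f"INSERT INTO {table} VALUES (1, 'new'), (2, 'new')",
        # DML operations available through SQL on Spark 3.0.
        f"UPDATE {table} SET status = 'processed' WHERE id = 1",
        f"DELETE FROM {table} WHERE status = 'processed'",
        (
            f"MERGE INTO {table} AS t USING {source} AS s ON t.id = s.id "
            "WHEN MATCHED THEN UPDATE SET * "
            "WHEN NOT MATCHED THEN INSERT *"
        ),
    ]
```

In a session configured for Delta, each statement could be executed with spark.sql(stmt) once the updates view has been registered.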
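Both the manifest and the retention features are controlled through ALTER TABLE SET TBLPROPERTIES. The sketch below generates the statements, assuming the table properties delta.compatibility.symlinkFormatManifest.enabled, delta.logRetentionDuration and delta.deletedFileRetentionDuration. The duration strings are example values, not recommendations, and the function names are hypothetical.

```python
# Sketch: ALTER TABLE statements for automatic manifests and history retention.
# Property names as assumed above; durations are example values only.
def manifest_autogen_sql(table):
    """Regenerate Presto/Athena manifests automatically after each write."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        "('delta.compatibility.symlinkFormatManifest.enabled' = 'true')"
    )


def retention_sql(table, log_retention="interval 30 days",
                  deleted_file_retention="interval 7 days"):
    """Set how long table history and deleted data files are kept."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{log_retention}', "
        f"'delta.deletedFileRetentionDuration' = '{deleted_file_retention}')"
    )
```

Spark SQL accepts quoted property keys, so the dotted names can be passed as string literals.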
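User-defined commit metadata can be attached in the two ways described in the release notes: the userMetadata writer option for DataFrame writes, and the spark.databricks.delta.commitInfo.userMetadata session configuration for other operations. In this minimal sketch, the helper names, the table path, and the note text are hypothetical. The df and spark arguments are assumed to be a PySpark DataFrame and SparkSession.

```python
# Sketch: tagging Delta commits with user-defined metadata.
USER_METADATA_CONF = "spark.databricks.delta.commitInfo.userMetadata"


def append_with_note(df, path, note):
    """DataFrame.write: tag this single commit via the userMetadata option."""
    (df.write.format("delta")
        .mode("append")
        .option("userMetadata", note)
        .save(path))


def tag_session_commits(spark, note):
    """Other operations (e.g. SQL UPDATE or MERGE): tag via the session config."""
    spark.conf.set(USER_METADATA_CONF, note)
```

The option applies only to the one write it is set on. The session configuration applies to every subsequent commit in that session until it is changed or unset.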
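The one-time trigger behavior can be illustrated with a stream that copies one Delta table into another. This sketch assumes PySpark 3.0 and placeholder paths. The maxFilesPerTrigger value is arbitrary, and the function name drain_delta_once is hypothetical. According to the release notes, the rate limit still applies to ordinary triggers, but with trigger(once=True) all outstanding data is processed in a single micro-batch.

```python
# Sketch: drain all outstanding data from a Delta source in one micro-batch.
# Paths and the maxFilesPerTrigger value are placeholders.
def drain_delta_once(spark, source_path, sink_path, checkpoint_path,
                     max_files=100):
    """Copy new data from source to sink using a one-time trigger."""
    stream = (spark.readStream.format("delta")
              # Rate limit used by regular triggers; a one-time trigger in
              # Delta Lake 0.7.0 processes everything in a single batch.
              .option("maxFilesPerTrigger", max_files)
              .load(source_path))
    query = (stream.writeStream.format("delta")
             .option("checkpointLocation", checkpoint_path)
             .trigger(once=True)
             .start(sink_path))
    query.awaitTermination()  # blocks until the single micro-batch finishes
    return query
```

This pattern suits scheduled jobs that periodically catch up on new data and then exit, without keeping a streaming cluster running.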
Due to the significant internal changes, workloads on previous versions of Delta using the DeltaTable programmatic APIs may require additional changes to migrate to 0.7.0. See the Migration Guide for details.

Credits

Alan Jin, Alex Ott, Burak Yavuz, Jose Torres, Pranav Anand, QP Hou, Rahul Mahadev, Rob Kelly, Shixiong Zhu, Subhash Burramsetty, Tathagata Das, Wesley Hoffman, Yin Huai, Youngbin Kim, Zach Schuermann, Eric Chang, Herman van Hovell, Mahmoud Mahdi

Thank you for your contributions.

Visit the release notes to learn more about the release.

Join the Delta Lake Community

Communicate with fellow Delta Lake users and contributors, ask questions and share tips

Project Governance

Delta Lake is an independent open-source project and is not controlled by any single company. To emphasize this, we joined the Delta Lake Project in 2019, which is a sub-project of the Linux Foundation Projects.

Within the project, we make decisions based on these rules.

Copyright © 2020 Delta Lake, a Series of LF Projects, LLC. For web site terms of use, trademark policy and other project policies please see https://lfprojects.org.