Delta Lake 0.8.0 Released!

February 4, 2021
Denny Lee
We are excited to announce the release of Delta Lake 0.8.0. The key features in this release are as follows.

  • Unlimited MATCHED and NOT MATCHED clauses for merge operations in Scala, Java, and Python - Merge operations now support any number of whenMatched and whenNotMatched clauses. In addition, merge queries that unconditionally delete matched rows no longer throw errors on multiple matches. See the documentation for details, and sketch 1 below.
  • MERGE operation now supports schema evolution of nested columns - Schema evolution of nested columns now has the same semantics as that of top-level columns. For example, new nested columns can be automatically added to a StructType column. See Automatic schema evolution in Merge for details, and sketch 2 below.
  • MERGE INTO and UPDATE operations now resolve nested struct columns by name - The UPDATE and MERGE INTO commands now resolve nested struct columns by name. That is, when comparing or assigning columns of type StructType, the order of the nested columns does not matter (exactly as for top-level columns). To revert to resolving by position, set the Spark configuration "spark.databricks.delta.resolveMergeUpdateStructsByName.enabled" to "false"; see sketch 3 below.
  • Check constraints on Delta tables - Delta now supports CHECK constraints. When supplied, Delta automatically verifies that data added to a table satisfies the specified constraint expression. To add a CHECK constraint, use the ALTER TABLE ADD CONSTRAINT command. See the documentation for details, and sketch 4 below.
  • Start streaming a table from a specific version (#474) - When using Delta as a streaming source, you can use the options startingTimestamp or startingVersion to start processing the table from a given version onwards. You can also set startingVersion to latest to skip existing data in the table and stream only the new incoming data. See the documentation for details, and sketch 5 below.
  • Ability to perform parallel deletes with VACUUM (#395) - When using VACUUM, you can set the session configuration "spark.databricks.delta.vacuum.parallelDelete.enabled" to "true" to have Spark delete files in parallel (based on the number of shuffle partitions). See the documentation for details, and sketch 6 below.
  • Use Scala implicits to simplify read and write APIs - You can import io.delta.implicits._ to use the delta method with Spark read and write APIs such as spark.read.delta("/my/table/path"). See the documentation for details, and sketch 7 below.
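
Sketch 1 - multiple clauses in a single merge. A minimal Scala sketch, assuming an active SparkSession named spark; the table path /tmp/delta/events, the source path /tmp/updates, and the columns id and op are hypothetical placeholders.

```scala
import io.delta.tables.DeltaTable

// Assumed inputs: a Delta table at /tmp/delta/events and a source of
// changes with columns `id` and `op`.
val updates = spark.read.format("json").load("/tmp/updates")
val target = DeltaTable.forPath(spark, "/tmp/delta/events")

target.as("t")
  .merge(updates.as("s"), "t.id = s.id")
  // 0.8.0 lifts the clause limit; clauses are evaluated in order, so each
  // whenMatched/whenNotMatched except the last of its kind has a condition.
  .whenMatched("s.op = 'delete'").delete()
  .whenMatched("s.op = 'update'").updateAll()
  .whenNotMatched("s.op = 'insert'").insertAll()
  .whenNotMatched().insertAll()
  .execute()
```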
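
Sketch 2 - schema evolution of a nested column during merge. An illustrative sketch assuming hypothetical paths and a source whose address struct carries a field the target does not yet have; the autoMerge setting is the documented opt-in for schema evolution.

```scala
import io.delta.tables.DeltaTable

// Opt in to automatic schema evolution for merge operations.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

val updates = spark.read.format("json").load("/tmp/customer-updates")

DeltaTable.forPath(spark, "/tmp/delta/customers").as("t")
  .merge(updates.as("s"), "t.id = s.id")
  // If s.address carries a new nested field (e.g. zipcode), it is added
  // to the target's address StructType automatically.
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()
```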
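
Sketch 3 - toggling nested struct resolution. By-name resolution is the new default; the configuration below restores the pre-0.8.0 behavior.

```scala
// Match nested StructType fields by position instead of by name
// (reverts to the pre-0.8.0 semantics).
spark.conf.set("spark.databricks.delta.resolveMergeUpdateStructsByName.enabled", "false")
```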
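
Sketch 4 - adding a CHECK constraint. The table name events, the constraint name validDate, and the column date are hypothetical.

```scala
// Once the constraint is added, Delta rejects any write whose rows
// do not satisfy the expression.
spark.sql("ALTER TABLE events ADD CONSTRAINT validDate CHECK (date >= '2020-01-01')")
```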
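
Sketch 5 - streaming from a specific version. The path and version number are placeholders.

```scala
// Process the table starting at commit version 5 and onwards.
// Alternatives: .option("startingTimestamp", "2021-02-04"), or
// .option("startingVersion", "latest") to read only new data.
val stream = spark.readStream
  .format("delta")
  .option("startingVersion", "5")
  .load("/tmp/delta/events")
```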
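
Sketch 6 - parallel deletes with VACUUM. The path and the 168-hour retention window are placeholders.

```scala
import io.delta.tables.DeltaTable

// Let Spark delete eligible files in parallel, spread across the
// configured number of shuffle partitions.
spark.conf.set("spark.databricks.delta.vacuum.parallelDelete.enabled", "true")
DeltaTable.forPath(spark, "/tmp/delta/events").vacuum(168)  // retain 7 days of history
```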
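
Sketch 7 - the delta implicits. The paths are placeholders; the import adds a delta shorthand to Spark's read and write APIs.

```scala
import io.delta.implicits._

// Equivalent to spark.read.format("delta").load(...) and
// df.write.format("delta").save(...).
val df = spark.read.delta("/tmp/delta/events")
df.write.mode("overwrite").delta("/tmp/delta/events-copy")
```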

Credits

Adam Binford, Alan Jin, Alex liu, Ali Afroozeh, Andrew Fogarty, Burak Yavuz, David Lewis, Gengliang Wang, HyukjinKwon, Jacek Laskowski, Jose Torres, Kian Ghodoussi, Linhong Liu, Liwen Sun, Mahmoud Mahdi, Maryann Xue, Michael Armbrust, Mike Dias, Pranav Anand, Rahul Mahadev, Scott Sandre, Shixiong Zhu, Stephanie Bodoff, Tathagata Das, Wenchen Fan, Wesley Hoffman, Xiao Li, Yijia Cui, Yuanjian Li, Zach Schuermann, contrun, ekoifman, Yi Wu

Thank you for your contributions.

Visit the release notes to learn more about the release.
