Diving into Delta Lake Online Tech Talk Series
Delta Lake Internals Online Tech Talks
News
March 13, 2020
For our next series of Delta Lake online tech talks, we're excited to dive into the internals with our Diving into Delta Lake series. This will be a fun set of tech talks with live demos and Q&A. Check them out!
Diving into Delta Lake: Unpacking the Transaction Log
March 26, 2020 | 09:00 am PTWatch now!

- What is the Delta Lake Transaction Log
- What is the transaction log used for?
- How does the transaction log work?
- Reviewing the Delta Lake transaction log at the file level
- Dealing with multiple concurrent reads and writes
- How the Delta Lake transaction log solves other use cases including Time Travel and Data Lineage and Debugging
Speakers:
Burak Yavuz
Senior Software Engineer @Databricks
Burak Yavuz is a Software Engineer at Databricks. He has been contributing to Spark since Spark 1.1, and is the maintainer of Spark Packages. Burak received his BS in Mechanical Engineering at Bogazici University, Istanbul, and his MS in Management Science & Engineering at Stanford.
Denny Lee
Developer Advocate @Databricks
Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Masters of Biomedical Informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise Healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.
Diving into Delta Lake: Schema Enforcement and Evolution
April 2, 2020 | 09:00 am PTWatch Now!

This brings us to schema management. As business problems and requirements evolve over time, so too does the structure of your data. With Delta Lake, as the data changes, incorporating new dimensions is easy. Users have access to simple semantics to control the schema of their tables. These tools include schema enforcement, which prevents users from accidentally polluting their tables with mistakes or garbage data, as well as schema evolution, which enables them to automatically add new columns of rich data when those columns belong. In this webinar, we’ll dive into the use of these tools. In this webinar you will learn about:
- Understanding table schemas and schema enforcement
- How does schema enforcement work?
- How is schema enforcement useful?
- Preventing data dilution
- How does schema evolution work?
- How is schema evolution useful?
Speakers:
Andreas Neumann
Staff Software Engineer @Databricks
Andreas Neumann is a software engineer at Databricks, where he focuses on Structured Streaming and Delta Lake. He has previously built big data systems at Google, Cask Data, Yahoo! and IBM. Andreas holds a PhD in computer science from the University of Trier, Germany.
Denny Lee
Developer Advocate @Databricks
Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Masters of Biomedical Informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise Healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.
Diving into Delta Lake: DML Internals
April 16, 2020 | 9:00 am PTWatch now!

- A quick primer on the Delta Lake Transaction Log
- Understand the fundamentals when running DELETE, UPDATE, and MERGE
- Understand the actions performed when performing these tasks
- Understand the basics of partition pruning in Delta Lake
- How do streaming queries work within Delta Lake
Speakers:
Tathagata Das
Staff Software Engineer @Databricks
Tathagata Das is an Apache Spark committer and a member of the PMC. He’s the lead developer behind Spark Streaming and currently develops Structured Streaming. Previously, he was a grad student at the UC Berkeley at AMPLab, where he conducted research about data-center frameworks and networks with Scott Shenker and Ion Stoica.
Denny Lee
Developer Advocate @Databricks
Denny Lee is a Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments. He also has a Masters of Biomedical Informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise Healthcare customers. His current technical focuses include Distributed Systems, Apache Spark, Deep Learning, Machine Learning, and Genomics.