Delta Lake Video Gallery

Watch the latest videos and webinars for the open-source Delta Lake project.

Data Reliability for Data Lakes

About the keynote: Building a modern data lake requires dealing with a lot of complexity: querying historical and streaming data simultaneously (lambda architecture), validation to ensure data isn't too messy for data science and machine learning, reprocessing to handle failures, and ensuring ACID-compliant data updates. We created the Delta Lake project, open sourced […]

SmartSQL Queries powered by Delta Engine on Lakehouse

Welcome to the Data Collab Lab with Franco and Denny! This online meetup series brings together various data experts, and we collaborate to (hopefully) solve the problem! For this session, we will discuss SmartSQL Queries powered by Delta Engine on Lakehouse. As a data analyst, have you ever wanted to be able to simply […]

Tutorial: How Delta Lake Supercharges Data Lakes

Delta Lake’s transaction log brings high reliability, performance, and ACID-compliant transactions to data lakes. But exactly how does it accomplish this? Working through concrete examples, we will take a close look at how the transaction logs are managed and leveraged by Delta to supercharge data lakes. In this tech talk you will learn: - […]

Generating Surrogate Keys for your Data Lakehouse with Spark SQL and Delta Lake

For this tech chat, we will discuss a popular data warehousing fundamental: surrogate keys. As we have discussed in various other Delta Lake tech talks, the reliability that Delta Lake brings to data lakes has led to a resurgence of many data warehousing fundamentals, such as Change Data Capture, in data lakes. Surrogate […]

Delta Lake 0.7.0 + Spark 3.0 AMA

On August 18th, 2020, join Apache Spark and Delta Lake committers Burak Yavuz, Tathagata Das, and Denny Lee for an illuminating “Ask Me Anything” session. Whether you would like to know more about the history of Apache Spark or the current bleeding-edge use cases of Spark 3.0 and Delta Lake, this is the session […]

Simplifying Disaster Recovery with Delta Lake

There’s a need to develop a recovery process for Delta tables in a DR scenario. Cloud multi-region sync is asynchronous. This type of replication does not guarantee the chronological order of files at the target (DR) region. In some cases, we can expect large files to arrive later than small files. With Delta Lake, this […]

Real-Time Forecasting at Scale using Delta Lake and Delta Caching

GumGum receives around 30 billion programmatic inventory impressions, amounting to 25 TB of data, each day. An inventory impression is the real estate for showing potential ads on a publisher page. By generating near-real-time inventory forecasts based on campaign-specific targeting rules, GumGum enables account managers to set up successful future campaigns. This talk will highlight […]

Patterns and Operational Insights from the First Users of Delta Lake

Cyber threat detection and response requires demanding workloads over large volumes of log and telemetry data. A few years ago I came to Apple after building such a system at another FAANG company, and my boss asked me to do it again. I learned a lot from my prior experience using Apache Spark and […]

Machine Learning Data Lineage with MLflow and Delta Lake

Many organizations using machine learning are facing challenges storing and versioning their complex ML data as well as a large number of models generated from those data. To simplify this process, organizations tend to start building their customized ‘ML platforms.’ However, even such platforms are limited to only a few supported algorithms and they tend […]

Best Practices for Building Robust Data Platform with Apache Spark and Delta

This talk will focus on the journey of technical challenges, trade-offs, and ground-breaking achievements in building performant and scalable pipelines, drawn from our experience working with customers. The problems encountered are shared by many organizations, so the lessons learned and best practices are widely applicable. These include: - Operational tips and best practices with […]

Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow

As Atlassian continues to scale to more and more customers, the demand for our legendary support continues to grow. Atlassian needs to maintain a balance between the staffing levels needed to service this increasing support ticket volume and the budgetary constraints needed to keep the business healthy – automated ticket volume forecasting is at the centre […]

Building the Petcare Data Platform using Delta Lake and ‘Kyte’: Our Spark ETL Pipeline

At Mars Petcare (in a division known as Kinship Data & Analytics) we are building out the Petcare Data Platform – a cloud-based data lake solution. Leveraging Microsoft Azure, we were faced with important decisions around tools and design. We chose Delta Lake as a storage layer to build out our platform and bring […]

Join the Delta Lake Community

Communicate with fellow Delta Lake users and contributors, ask questions, and share tips.
Slack Channel | Google Group

Project Governance

Delta Lake is an independent open-source project that is not controlled by any single company. To emphasize this, we joined the Delta Lake Project, a sub-project of Linux Foundation Projects, in 2019.


Within the project, we make decisions based on these rules.

Copyright © 2020 Delta Lake, a Series of LF Projects, LLC. For web site terms of use, trademark policy and other project policies please see