Delta Lake Video Gallery

Watch the latest videos and webinars for the open-source Delta Lake project.
Parallelization of Structured Streaming Jobs Using Delta Lake

We’ll tackle the problem of running streaming jobs from another perspective using Databricks Delta Lake, while examining some of the issues we faced at Tubi while running regular Structured Streaming. We’ll give a quick overview of why we transitioned from Parquet data files to Delta and the problems it solved for us in running our […]

VIP Ask Me Anything (AMA) Session: Delta Lake

Delta Lake VIP AMA with Joe Widen, Franco Patano, Palla Lentz, Chris Hoshino-Fish, and Denny Lee

A Thorough Comparison of Delta Lake, Iceberg and Hudi

Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has emerged. Along with the Hive Metastore, these table formats aim to solve long-standing problems of traditional data lakes through their declared features, such as ACID transactions, schema evolution, upserts, time travel, and incremental consumption. This talk will share […]

Operationalizing Big Data Pipelines At Scale

Running a global, world-class business with data-driven decision making requires ingesting and processing diverse sets of data at tremendous scale. How does a company achieve this while ensuring quality and honoring its commitment as a responsible steward of data? This session will detail how Starbucks has embraced big data, building robust, high-quality pipelines for faster insights […]

Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake

Change Data Capture (CDC) is a typical use case in real-time data warehousing. It tracks the data change log (binlog) of a relational (OLTP) database and replays these change logs in a timely manner to external storage, such as Delta or Kudu, for real-time OLAP. To implement a robust CDC streaming pipeline, many factors must be considered, […]

Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with Delta Lake

Columbia is a data-driven enterprise, integrating data from all line-of-business systems to manage its wholesale and retail businesses. This includes integrating real-time and batch data to better manage purchase orders and generate accurate consumer demand forecasts. It also includes analyzing product reviews to increase customer satisfaction. In this presentation, we’ll walk through how we achieved a […]

Building Data Quality Audit Framework using Delta Lake at Cerner

Cerner needs to know what assets it owns, where they are located, and the status of those assets. A configuration management system is an inventory of IT assets and other IT items such as servers, network devices, storage arrays, and software licenses. There was a need to bring all the data sources into one place so that […]

Powering Interactive BI Analytics with Presto and Delta Lake

Presto, an open-source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has experienced unprecedented growth in the last few years […]

VIP Ask Me Anything (AMA) Session: Delta Lake 0.7.0 Early Preview

Delta Lake 0.7.0 Early Preview VIP AMA: with Burak Yavuz, Tathagata Das, and Denny Lee (Databricks)

How Starbucks is Achieving Enterprise Data and ML at Scale | Keynote Spark + AI Summit 2020

Starbucks makes sure that everything we do is through the lens of humanity, from our commitment to the highest-quality coffee in the world to the way we engage with our customers and communities to do business responsibly. A key aspect of ensuring those world-class customer experiences is data. This talk highlights the Enterprise […]

Realizing the Vision of the Data Lakehouse | Ali Ghodsi | Keynote Spark + AI Summit 2020

Data warehouses have a long history in decision support and business intelligence applications. But data warehouses were not well suited to dealing with the unstructured, semi-structured, and streaming data common in modern enterprises. This led organizations to build data lakes of raw data about a decade ago. But data lakes also lacked important capabilities. The need […]

Building a Better Delta Lake with Talend and Databricks

With the introduction of Delta Lake last year, a well-tested pattern of building out a bronze, silver, and gold data architecture has proven useful. This session will review how to use Talend Data Fabric to accelerate the development of a Delta Lake using highly productive, scalable, and enterprise-ready data flow tools. Covered in […]

Join the Delta Lake Community

Communicate with fellow Delta Lake users and contributors, ask questions, and share tips.
Slack Channel
Google Group


Project Governance

Delta Lake is an independent open-source project and is not controlled by any single company. To emphasize this, we joined the Delta Lake Project in 2019, which is a sub-project of the Linux Foundation Projects.


Within the project, we make decisions based on these rules.

Copyright © 2020 Delta Lake, a Series of LF Projects, LLC. For web site terms of use, trademark policy and other project policies please see https://lfprojects.org.