Delta Lake Video Gallery

Watch the latest videos and webinars for the open-source Delta Lake project.

Beyond Lambda: Introducing Delta Architecture

Online Tech Talk with Denny Lee, Developer Advocate @ Databricks

Lambda architecture is a popular technique where records are processed by a batch system and a streaming system in parallel. The results are then combined at query time to provide a complete answer. Strict latency requirements to process old and recently generated events made this architecture […]

Getting Data Ready for Data Science with Delta Lake and MLflow

Online Tech Talk with Denny Lee, Developer Advocate @ Databricks

One must take a holistic view of the entire data analytics realm when it comes to planning for data science initiatives. Data engineering is a key enabler of data science, helping furnish reliable, quality data in a timely fashion. Delta Lake, an open-source storage layer […]

The Genesis of Delta Lake: An Interview with Burak Yavuz

We're re-igniting the Spark Online Meetup! In this live meetup, Denny Lee (Engineer and Developer Advocate at Databricks) interviews Delta Lake engineer Burak Yavuz.

Reliability and Data Quality for Data Lakes and Apache Spark by Michael Armbrust

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

LoyaltyOne Simplifies and Scales Data & Analytics Pipelines With Delta Lake

Learn how LoyaltyOne uses Delta Lake to simplify and scale data and analytics pipelines.

Petabytes, Exabytes, and Beyond: Managing Delta Lakes for Interactive Queries at Scale

Data production continues to scale up, and the techniques for managing it need to scale too. Building pipelines that can process petabytes per day in turn creates data lakes with exabytes of historical data. At Databricks, we help our customers turn these data lakes into gold mines of valuable information using Apache Spark. This talk […]

Training: Building Reliable Data Lakes at Scale with Delta Lake

Most data practitioners grapple with data reliability issues—it's the bane of their existence. Data engineers, in particular, strive to design, deploy, and serve reliable data in a performant manner so that their organizations can make the most of their valuable corporate data assets. Delta Lake is an open-source storage layer that brings ACID transactions to […]

Designing ETL Pipelines with Structured Streaming and Delta Lake

Structured Streaming has proven to be the best platform for building distributed stream processing applications. Its unified SQL/Dataset/DataFrame APIs and Spark's built-in functions make it easy for developers to express complex computations. Delta Lake, on the other hand, is the best way to store structured data because it is an open-source storage layer that brings […]

ACID ORC, Iceberg, and Delta Lake

The reality of most large-scale data deployments includes storage decoupled from computation, pipelines operating directly on files, and metadata services with no locking mechanisms or transaction tracking. For this reason, attempts at achieving transactional behavior, snapshot isolation, safe schema evolution, or performant support for CRUD operations have always been marred by tradeoffs. This talk […]

Building an AI Powered Retail Experience with Delta Lake

Zalando SE is Europe's leading online fashion platform and connects customers, brands and partners. With millions of visitors each month, we have petabytes of purchase, click-stream, product and other data in our data lake. This data is crucial to powering insights on shopper behavior and driving an AI-first strategy to improve site engagement. Over 7 […]

Winning the Audience with AI: How Comcast Built an Agile Data and AI Platform at Scale

Comcast is the largest cable and internet provider in the US, reaching more than 30 million customers. Over the last couple of years, Comcast has transformed the customer experience using machine learning. For example, Comcast uses machine learning to power the X1 voice remote, which was used over 8B times in 2018 by our customers to […]

Building Data Pipelines Using Structured Streaming and Delta Lake

Given the rise of IoT and other real-time sources and businesses’ desire to draw fast insights, there is a growing imperative for data professionals to build streaming data pipelines. Given the plethora of different tools and frameworks in the big data community, it is challenging to correctly architect pipelines that achieve the desired performance […]

Join the Delta Lake Community

Communicate with fellow Delta Lake users and contributors, ask questions, and share tips.

Slack Channel | Google Group


Project Governance

Delta Lake is an independent open-source project and is not controlled by any single company. To emphasize this, we joined the Delta Lake Project in 2019, which is a sub-project of the Linux Foundation Projects.


Within the project, we make decisions based on these rules.

Copyright © 2020 Delta Lake, a Series of LF Projects, LLC. For web site terms of use, trademark policy and other project policies please see https://lfprojects.org.