Delta Lake Newsletter, 2019-10-03 Edition
October 3, 2019
In this edition of the Delta Lake Newsletter, find out more about the latest and upcoming webinars, meetups, and publications. We will also focus on the many sessions at Spark+AI Summit EU 2019 in Amsterdam. Please share this newsletter with anyone who would like to know more about Delta Lake!
With the release of Delta Lake 0.4.0, we have also published the blog Simple, Reliable Upserts and Deletes on Delta Lake Tables using Python APIs. A key feature of this release is its Python APIs; review the blog to see how you can get started with Delta Lake in minutes!
- Diving Into Delta Lake: Schema Enforcement & Evolution: Understand the details behind how schema enforcement and schema evolution work with this Diving Into Delta Lake blog. This is a follow-up to the popular Diving Into Delta Lake: Unpacking The Transaction Log.
- Brand Safety with Structured Streaming, Delta Lake, and Databricks: To ensure the safety of a company’s brand when serving advertisements, Eyeview uses Delta Lake to support its streaming and batch scenarios while also solving issues surrounding underutilized resources and concurrent reads.
- Engineering population scale Genome-Wide Association Studies with Apache Spark™, Delta Lake, and MLflow: For population-scale Genome-Wide Association Studies with Apache Spark, Delta Lake is the underlying storage layer that supports streaming and batch workloads with ACID transactions.
- Scaling Bioinformatics Methods with Apache Spark™: Parallelizing SAIGE Across Hundreds of Cores: To parallelize Genome-Wide Association Studies workflows such as SAIGE to hundreds of cores, the new Pipe Transformer tool integrates command-line tools with Apache Spark and Delta Lake.
- Monitor Medical Device Data with Machine Learning using Delta Lake, Keras and MLflow: A follow-up to the Automated Monitoring of Medical Device Data with Data Science webinar, where you can learn to build a streaming pipeline for EKG data using Structured Streaming and Delta Lake.
- Productionizing Machine Learning with Delta Lake: Learn how data scientists can productionize their machine learning workflows by having reliable data with Delta Lake and MLflow.
Webinars
Our most recent webinar was with Tathagata Das on his insightful session Building Data Pipelines Using Delta Lake and Structured Streaming. Previous webinars that have a lot of great Delta Lake information include:
- Simplify and Scale Data Engineering Pipelines with Delta Lake
- Delta Architecture, a step beyond Lambda Architecture
- Making Apache Spark™ Better with Delta Lake
- Getting Data Ready for Data Science
- Delta Lake - Open Source Reliability for Data Lakes
- Simplifying Streaming Analytics using Delta Lake and Apache Spark™
Spark+AI Summit Delta Lake Sessions
At the Spark+AI Summit EU 2019 in Amsterdam, there are a lot of great Delta Lake sessions!
- Building Data Pipelines for Apache Spark™ with Delta Lake Training Session
- New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, and Koalas Keynote with Michael Armbrust and Brooke Wenig
- Ask Me Anything (AMA): Delta Lake
- Building Reliable Data Lakes at Scale with Delta Lake (Tutorials)
- Building an AI-Powered Retail Experience with Delta Lake, Spark, and Databricks
- Simplify and Scale Data Engineering Pipelines with Delta Lake
- Powering Asurion’s Connected Home Platform with Spark Structured Streaming, Delta Lake, and MLflow
- Petabytes, Exabytes, and Beyond: Managing Delta Lakes for Interactive Queries at Scale
- Power Your Delta Lake with Streaming Transactional Changes
- Driver Location Intelligence at Scale using Apache Spark, Delta Lake, and MLflow on Databricks
- Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Architect Things Right
- Building Data Intensive Analytic Application on Top of Delta Lakes
- Data Reproducibility, Audits, Immediate Rollbacks, and Other Applications of Time Travel with Delta Lake
- Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Meetups and Events
Last week, we had a great Delta Lake session in Seattle, WA (United States) featuring Michael Armbrust presenting at Delta Lake: Open Source Reliability for Data Lake with Apache Spark™. Featured here is Michael Armbrust speaking about Delta Lake and Judy Nash (organizer of the event at Salesforce Bellevue Offices) and Michael taking a picture with the Apache Spark cake! In the past month, we’ve had the fol
- 2019-08-22: Mahdi Askari presenting Making Apache Spark™ Better with Delta Lake in Melbourne, Australia.
- 2019-08-27: (Bonus) Denny Lee presenting Simplifying the Machine Learning Lifecycle with MLflow and Koalas in Seattle, USA
- 2019-09-03: Daniel Arrizza presenting Making Apache Spark™ Better with Delta Lake in Montréal, Canada
- 2019-09-04: Tathagata Das presenting Making Data Lakes More Reliable with Apache Spark in Portland, United States
- 2019-09-05: Mladen Kovacevic presenting Making Apache Spark™ Better with Delta Lake in Toronto, Canada.
- 2019-09-05: Quentin Ambard presenting Delta Lake: Bring data reliability and performance to your data lakes in Paris, France.
- 2019-09-12: Sajith Appukuttan presenting Delta Lake: Open Source Reliability w/ Apache Spark™ in Vancouver, Canada. Bonus, Bilal Obeidat presenting Customer (Residual) Lifetime Value CLV on Databricks.
- 2019-09-18: Boudewijn Braams presenting Parquet Optimisations and Building Spark Data Pipelines in London, England.
- 2019-09-19: Reza Soltani Rezvandeh from Antares presenting Making Apache Spark™ Better with Delta Lake
- 2019-09-24: Vincent Jolivet presenting Delta Lake: Open Source Reliability for Data Lake with Apache Spark™ in Lisbon, Portugal
- 2019-09-26: Xiao Li presenting New Developments in Open Source Ecosystem: Apache Spark 3.0, Koalas, Delta Lake in Hangzhou, China.
- 2019-10-03: Daniel Arrizza presenting Productionizing Machine Learning with Delta Lake, Koalas, and MLflow in Toronto, Canada
- 2019-10-08: Daniel Arrizza presenting Making Apache Spark™ Better with Delta Lake in Montréal, Canada
- 2019-10-17: Xiao Li presenting Delta Lake: Open Source Reliability for Data Lake with Apache Spark™ at QCon in Shanghai (10/17/2019-10/19/2019)
- 2019-10-24: [Bonus Session]: Matei Zaharia presenting Simplifying Production Machine Learning with MLflow in Toronto, Canada.
We are also planning more meetups, and if you’re interested in presenting or hosting one, please contact us! If you have any questions about how to run a meetup, please do not hesitate to ping us via the Delta Lake Slack #events channel.
If you have any questions or feedback, please do not hesitate to reach out on the #deltalake-oss Slack channel. Join the Delta Lake Channel (Register | Login) and join the Delta Users Email Distribution List today!
Thanks!
Denny Lee, Developer Advocate