Take a walk through the daily struggles of a data engineer in this presentation as we cover what is truly needed to create robust end-to-end Big Data solutions.
We will discuss a popular online analytical processing (OLAP) fundamental - slowly changing dimensions (SCD) - specifically Type-2. As we have discussed in various other Delta Lake tech talks, the reliability that Delta Lake brings to data lakes has led to a resurgence of many data warehousing fundamentals, such as Change Data Capture in […]
Cloud computing has fundamentally changed how companies operate - users are no longer subject to the restrictions of on-prem hardware deployments, such as physical limits on resources and onerous environment upgrade processes. With this convenience and flexibility come challenges in properly monitoring how your users utilize these conveniently available resources. Failure to do […]
We are happy to have Matei Zaharia join this month’s Data and AI Talk. Matei Zaharia is an assistant professor at Stanford CS, where he works on computer systems and machine learning as part of Stanford DAWN. He is also co-founder and Chief Technologist of Databricks, the data and AI platform startup. During his Ph.D., […]
Join us for an online tech talk on Delta Lake. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end. Abstract: The General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) both aim to guarantee strong protection for individuals regarding their personal data and […]
Join us for an online tech talk on Delta Lake. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end. While it is common to use Delta Lake as a sink for change data captured from traditional data sources, customers are increasingly asking how to use Delta […]
Join us for an online tech talk on Delta Lake. Tech talks include a technical presentation with slides and a demo, with time for Q&A at the end. Predictive Maintenance (PdM) is different from other routine or time-based maintenance approaches as it combines various sensor readings and sophisticated analytics on thousands of logged events in […]
In earlier sessions of the Delta Lake Internals tech talk series, we described how the Delta Lake transaction log works. In this session, we will dive deeper into how commits, snapshot isolation, and partitions and files change when performing deletes, updates, merges, and structured streaming. In this webinar you will learn about: - A quick primer […]
Online Tech Talk hosted by Denny Lee, Developer Advocate @ Databricks with Andreas Neumann, Staff Software Engineer @ Databricks Link to Slides - https://github.com/dennyglee/databric... Link to Notebook - https://github.com/dennyglee/databric... Link to Diving into Delta Lake Part 1: https://www.youtube.com/watch?v=F91G4... Link to Online Meetups Playlist: https://dbricks.co/youtube-meetups Abstract: Data, like our experiences, is always evolving and accumulating. To […]
Online Tech Talk hosted by Denny Lee, Developer Advocate @ Databricks with Burak Yavuz, Software Engineer @ Databricks Link to Notebook: https://github.com/dennyglee/databric... The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and […]
Online Tech Talk with Denny Lee, Developer Advocate @ Databricks A common data engineering pipeline architecture uses tables that correspond to different quality levels, progressively adding structure to the data: data ingestion (“Bronze” tables), transformation/feature engineering (“Silver” tables), and machine learning training or prediction (“Gold” tables). Combined, we refer to these tables as a “multi-hop” […]
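The Bronze, Silver, and Gold hops described above can be illustrated with a minimal plain-Python sketch. It does not use Delta Lake or Spark, and the record fields (`device`, `temp_c`) and function names are hypothetical, chosen only to show how each hop adds structure to the data.

```python
import json
from collections import defaultdict


def to_bronze(raw_lines):
    """Bronze: keep every record exactly as ingested, with no parsing."""
    return [{"raw": line} for line in raw_lines]


def to_silver(bronze):
    """Silver: parse each raw record and drop rows that are malformed."""
    silver = []
    for row in bronze:
        try:
            rec = json.loads(row["raw"])
            silver.append(
                {"device": str(rec["device"]), "temp_c": float(rec["temp_c"])}
            )
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or missing/invalid fields: skip the row.
            continue
    return silver


def to_gold(silver):
    """Gold: aggregate to one feature per device (mean temperature)."""
    readings = defaultdict(list)
    for row in silver:
        readings[row["device"]].append(row["temp_c"])
    return {device: sum(vals) / len(vals) for device, vals in readings.items()}


# Hypothetical input: two valid readings and two malformed records.
raw = [
    '{"device": "a", "temp_c": 20.0}',
    '{"device": "a", "temp_c": 22.0}',
    "not json",
    '{"device": "b"}',
]
bronze = to_bronze(raw)
silver = to_silver(bronze)
gold = to_gold(silver)
```

In this sketch the Bronze table retains malformed input so it can be reprocessed later, while only validated rows reach Silver and Gold.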
Online Tech Talk with Denny Lee, Developer Advocate @ Databricks Lambda architecture is a popular technique where records are processed by a batch system and streaming system in parallel. The results are then combined at query time to provide a complete answer. Strict latency requirements for processing old and recently generated events made this architecture […]
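As a rough illustration of the Lambda pattern described above, the following plain-Python sketch builds a batch view and a speed view over the same events and combines them at query time. The event shape (`key`, `ts`), the `cutoff` boundary, and all function names are assumptions made for this example, not part of any specific system.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer: count events per key for everything before the cutoff."""
    return Counter(e["key"] for e in events if e["ts"] < cutoff)


def speed_view(events, cutoff):
    """Speed layer: count events per key that arrived at or after the cutoff."""
    return Counter(e["key"] for e in events if e["ts"] >= cutoff)


def query(key, batch, speed):
    """Serving layer: merge both views at query time for a complete answer."""
    return batch[key] + speed[key]


# Hypothetical events; `cutoff` marks where the last batch run ended.
events = [
    {"key": "x", "ts": 1},
    {"key": "x", "ts": 5},
    {"key": "y", "ts": 7},
]
cutoff = 4
batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
total_x = query("x", batch, speed)
```

The sketch also hints at the maintenance cost of this design: the counting logic exists twice, once per layer, and both copies must stay consistent.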