Delta Lake Blogs

Why we migrated to a Data Lakehouse on Delta Lake for T-Mobile Data Science and Analytics Team

September 14, 2022 by Robert Thompson, Geoff Freeman

In this post, we discuss how and why we migrated from databases and data lakes to a data lakehouse on Delta Lake. Our lakehouse architecture allows data to be read and written without blocking, and it scales out linearly. Business partners can easily adopt advanced analytics and derive new insights. These insights promote innovation across disparate workstreams and solidify the decentralized approach to analytics taken by T-Mobile.

Multi-cluster writes to Delta Lake Storage in S3

May 18, 2022 by Scott Sandre, Denny Lee, Mariusz Kryński (Samba TV)

While Delta Lake has supported concurrent reads from multiple clusters since its inception, multi-cluster writes were limited specifically on Amazon S3. This limitation did not apply to Azure ADLS Gen2 or Google GCS; it arose because S3 currently lacks “put-if-absent” consistency guarantees. Thus, to guarantee ACID transactions on S3, all concurrent writes had to originate from the same Apache Spark™ driver. Lifting this restriction was one of the community's most frequent requests, and we are excited to announce that Delta Lake 1.2 (release notes, blog) now supports writing data from multiple clusters to S3 while maintaining the transactionality of the writes.
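
To see why a “put-if-absent” guarantee matters for concurrent writers, consider the following minimal sketch. It is not Delta Lake's implementation: `InMemoryStore`, `commit`, and the cluster names are hypothetical, and the toy store only stands in for cloud object storage. It assumes that each commit is recorded as one numbered file in the transaction log, and shows that an atomic create lets exactly one of two competing writers claim a given version, whereas a plain put would let the last writer silently overwrite the first.

```python
import threading


class InMemoryStore:
    """Toy object store (hypothetical); a stand-in for cloud storage."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        # Plain put: silently overwrites, so the last writer wins.
        with self._lock:
            self._objects[key] = value

    def put_if_absent(self, key, value):
        # Atomic create: succeeds only if the key does not exist yet.
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._objects.get(key)


def commit(store, version, writer):
    """Try to claim the commit file for `version`; return True on success."""
    key = f"_delta_log/{version:020d}.json"
    return store.put_if_absent(key, writer)


store = InMemoryStore()
results = {}


def attempt(writer):
    results[writer] = commit(store, 0, writer)


# Two writers (think: two clusters) race to commit the same version.
threads = [
    threading.Thread(target=attempt, args=(name,))
    for name in ("cluster-a", "cluster-b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one writer wins; the other must re-read the log and retry
# against the next version.
winners = [name for name, ok in results.items() if ok]
print("winner:", winners)
```

With only the overwriting `put`, both writers would report success and one commit would be lost. Without an atomic create in the storage layer, that mutual exclusion has to come from somewhere else, such as routing all writes through a single driver.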