Salesforce Engineering: Global Synchronousness and Ordering in Delta Lake
At Salesforce, we maintain a platform to capture customer activity --- various kinds of sales events such as emails, meetings, and videos. These events are either consumed by downstream products in real time or stored in our data lake, which we built using Delta Lake. This data lake supports the creation of up-to-date dashboards by downstream components like Einstein Analytics and the training of machine learning models for our customers who are using Sales Cloud Einstein to intelligently convert their leads and create new opportunities.
One of the great features provided by Delta Lake is ACID Transactions. This feature is critical to maintain data integrity when multiple independent write streams are modifying the same delta table. Running this in the real world, we observe frequent Conflicting Commits errors which fail our pipeline. We realize that, while ACID Transactions maintain data integrity, there is no mechanism to resolve writing conflicts. In this blog, we share a solution to ensure global synchronousness and ordering of multiple process streams that perform concurrent writes to the shared Delta Lake. With this mechanism, we greatly improved our pipeline stability by eliminating Conflicting Commits errors and maintaining data integrity.