
Delta 4.2.0 Released

By Scott Haines, Zheng Hu, and Alex Jiang

The Delta Lake 4 journey has marked a shift from the file system to the catalog. Each release has deepened support for catalog-managed tables and extended that design philosophy across the Delta ecosystem. Delta Lake 4.2 advances on two fronts: Kernel expands outward with a new Flink connector, streaming improvements, and broader data type support. Catalog-managed tables also mature with atomic operations, schema evolution from SQL, and synchronous UniForm.

Expanding the Delta Ecosystem with Delta Kernel

Delta Kernel is the native, engine-agnostic API for integrating with Delta tables: every engine that builds on Kernel gets the same consistent behavior and semantics when interacting with Delta tables.

In 4.2, Delta Kernel powers a brand-new Apache Flink connector with catalog-managed table support from day one. This connector replaces the legacy Flink connector that was deprecated in 4.0 alongside Delta Standalone.

The new connector supports transactionally consistent writes coordinated through the catalog, with exactly-once semantics backed by a Flink Sink Writer and Committer. The connector is experimental today, and we’ll be marching toward stability in subsequent releases.

Let’s see what the new connector looks like in practice. Here, Flink SQL creates a new Unity Catalog-managed table for clickstream data and writes to it:

-- Create the clickstream landing table as a Unity Catalog managed table
CREATE TEMPORARY TABLE clickstream_raw (
  event_date STRING,
  event_type STRING,
  user_id STRING
) WITH (
  'connector' = 'delta',
  'table_name' = 'clickstream_raw',
  'unitycatalog.name' = 'prod',
  'unitycatalog.endpoint' = '<endpoint>',
  'unitycatalog.token' = '<token>',
  'partitions' = 'event_date',
  'uid' = 'clickstream-ingest'
);

-- Stream events into the table via Flink SQL
INSERT INTO clickstream_raw VALUES
  ('2026-04-20', 'click',    'user_1'),
  ('2026-04-20', 'purchase', 'user_2'),
  ('2026-04-22', 'click',    'user_4');
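Once the Flink job commits, those rows are visible to any engine that reads through the catalog. A quick sanity check from Spark SQL — using the three-level table name that appears later in this post — might look like:

-- Count events per day to confirm the Flink writes landed
SELECT event_date, COUNT(*) AS events
FROM prod.consumer.clickstream_raw
GROUP BY event_date
ORDER BY event_date;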

Schema Evolution

Schema evolution also gets smoother. INSERT INTO … BY NAME now supports automatic schema evolution when autoMerge is enabled, adding new columns to the table schema as part of the commit. And a new table property (delta.stats.skipping.forceOptimizeStatsCollection) forces per-file stats collection during query planning, so data skipping works on newly evolved columns immediately, with no OPTIMIZE required.

For example, let’s say a clickstream has been missing device_type — the kind of surface (mobile, web, tablet) an event was recorded from. The upstream producer has already started emitting the new field into prod.consumer.clickstream_raw, and we want to fold it into the main table without a preliminary schema change.

SET spark.databricks.delta.schema.autoMerge.enabled = true;

INSERT INTO prod.consumer.clickstream BY NAME
SELECT event_date, event_type, user_id, device_type
FROM prod.consumer.clickstream_raw
WHERE event_date = '2026-04-23';

device_type is added to the table schema as part of the commit. Existing rows carry NULL for the new column, and every downstream reader continues to work without intervention. For SQL-first teams, this removes one of the last reasons to drop into a DataFrame notebook just to get a pipeline unstuck.
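The stats-collection property mentioned above can be enabled per table. A sketch, using the property name from this release (the 'true' value is an assumption about how the flag is set):

-- Force per-file stats collection at planning time so data skipping
-- covers the newly added device_type column without running OPTIMIZE
ALTER TABLE prod.consumer.clickstream
SET TBLPROPERTIES ('delta.stats.skipping.forceOptimizeStatsCollection' = 'true');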

Data Type Support

Delta Kernel adds support for geospatial, collation, and variant types:

  • Geospatial support lands in Kernel for the first time: native reading and writing of geometry and geography columns, along with bounding-box data skipping to accelerate spatial queries at the protocol level.
  • Collation — locale-aware and case-insensitive string comparison at the column level — gains protocol-level support in Kernel, bringing Kernel-powered connectors in line with Spark’s existing capabilities.
  • Variant lets you store semi-structured data as a single column and query into it at read time. Variant shredding — which decomposes frequently accessed fields into separate columns for faster reads while preserving the full Variant payload — graduates from preview. The Spark connector also gains full schema conversion for Variant columns.
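Collation, for instance, can be declared per column at table creation. A minimal sketch using Spark SQL's COLLATE syntax — the users table here is hypothetical:

-- user_id comparisons become case-insensitive via a column-level collation
CREATE TABLE prod.consumer.users (
  user_id STRING COLLATE UTF8_LCASE,
  display_name STRING
) USING DELTA;

-- Matches 'User_1', 'USER_1', and 'user_1' alike
SELECT * FROM prod.consumer.users WHERE user_id = 'user_1';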

To put this in context, here’s how a clickstream pipeline can push event-specific properties into a single Variant payload:

CREATE TABLE prod.consumer.clickstream_v2 (
  event_date DATE,
  event_type STRING,
  user_id STRING,
  device_type STRING,
  properties VARIANT
)
USING DELTA
PARTITIONED BY (event_date);

INSERT INTO prod.consumer.clickstream_v2 BY NAME
SELECT event_date, event_type, user_id, device_type,
       parse_json(raw_properties) AS properties
FROM prod.consumer.clickstream_raw WHERE event_date = '2026-04-24';
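Reading back out of the Variant column is just as direct. A sketch assuming Spark SQL's variant_get function and a page_url field in the payload (the field name is illustrative):

-- Extract a typed field from the Variant payload at read time
SELECT user_id,
       variant_get(properties, '$.page_url', 'STRING') AS page_url
FROM prod.consumer.clickstream_v2
WHERE event_type = 'click';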

Strengthening Catalog-Managed Tables

Designing Delta Lake around the catalog creates a shared foundation between the two leading open table formats in the Lakehouse ecosystem: Apache Iceberg and Delta Lake. A centralized catalog can vend credentials, enforce governance, and connect multiple compute engines — bringing the two formats closer together with every release. 4.2 continues that work, and catalog-managed tables continue to mature.

Atomic RTAS and Dynamic Partition Overwrite

One of the most significant reliability upgrades in 4.2 is fully atomic execution of REPLACE TABLE AS SELECT (RTAS) and Dynamic Partition Overwrite (DPO) on catalog-managed tables. Previously, these operations lacked strict atomicity in certain managed environments, leaving the door open for partial failures to corrupt table state. In 4.2, both execute as single atomic commits — if an operation fails midway, the table state remains completely untouched. Readers never see a half-applied state.
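In SQL terms, both operations look the same as before; what changes is the commit behavior underneath. A sketch against the clickstream tables from earlier — the compacted table name is illustrative, and the partitionOverwriteMode setting is Spark's standard dynamic-overwrite switch:

-- RTAS: atomically replace the table definition and its data in one commit
REPLACE TABLE prod.consumer.clickstream_compacted
USING DELTA
AS SELECT * FROM prod.consumer.clickstream WHERE event_type = 'purchase';

-- DPO: overwrite only the partitions present in the incoming data
SET spark.sql.sources.partitionOverwriteMode = dynamic;
INSERT OVERWRITE prod.consumer.clickstream
SELECT * FROM prod.consumer.clickstream_raw WHERE event_date = '2026-04-25';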

Synchronous UniForm

Catalog-managed tables also unlock a long-awaited improvement in Delta UniForm: Iceberg metadata generation moves from asynchronous post-commit hooks to synchronous generation at commit time. Iceberg readers see the latest commit immediately, not after an async hook eventually fires.
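Enabling UniForm itself is unchanged; it uses the existing table property:

-- Generate Iceberg metadata for this Delta table; with 4.2 on a
-- catalog-managed table, generation happens synchronously at commit time
ALTER TABLE prod.consumer.clickstream
SET TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg');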

Additionally, support for the legacy Hive Metastore (HMS) in UniForm is deprecated. HMS has no concept of catalog-managed tables, and synchronous metadata generation requires a catalog that can broker commits.

Conclusion

Delta 4.2 advances on two fronts. Kernel powers more of the ecosystem with a new Flink connector, broader streaming support, and Variant, Collation, and geospatial capabilities. At the same time, catalog-managed tables get stronger — atomic RTAS and DPO, synchronous UniForm, and schema evolution from SQL. Together, they move Delta Lake closer to a future where the catalog centralizes and Kernel connects.

For the complete list of changes, fixes, and contributor acknowledgments, see the Delta 4.2.0 release notes on GitHub.
