[SPARK-55690] Schema evolution in DSv2 AppendData, OverwriteByExpression, OverwritePartitionsDynamic by johanl-db · Pull Request #54488 · apache/spark

johanl-db · 2026-02-25T17:01:40Z

What changes were proposed in this pull request?

Adds support for schema evolution during INSERT operations (AppendData, OverwriteByExpression, OverwritePartitionsDynamic).

When the table reports capability AUTOMATIC_SCHEMA_EVOLUTION, a new analyzer rule ResolveInsertSchemaEvolution collects new columns and nested fields that are present in the source query but not in the table schema, and adds them to the target table by calling catalog.alterTable().

Identifying new columns/fields respects the resolution semantics of INSERT operations: matching fields by-name vs by-position.
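
To make the by-name vs by-position distinction concrete, here is a minimal sketch of how new columns and nested fields could be collected. It is not the code in this PR: `DType`, `Field`, `StructT` and `newFields` are hypothetical stand-ins for Spark's schema types so that the example runs without Spark on the classpath, and matching is simplified to exact names.

```scala
// Hypothetical, simplified stand-ins for Spark's StructType/StructField.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class StructT(fields: Vector[Field]) extends DType
final case class Field(name: String, dtype: DType)

object NewFieldSketch {
  /** Returns the paths of fields present in `source` but missing from `target`. */
  def newFields(
      target: StructT,
      source: StructT,
      byName: Boolean,
      path: Vector[String] = Vector.empty): Vector[Vector[String]] = {
    // Pair each source field with its target counterpart, if there is one.
    val pairs: Vector[(Field, Option[Field])] =
      if (byName) {
        val targetByName = target.fields.map(f => f.name -> f).toMap
        source.fields.map(s => s -> targetByName.get(s.name))
      } else {
        // By position: the i-th source field maps to the i-th target field.
        source.fields.zipWithIndex.map { case (s, i) => s -> target.fields.lift(i) }
      }

    pairs.flatMap {
      // No counterpart in the target: the whole field is new.
      case (s, None) => Vector(path :+ s.name)
      // Both sides are structs: recurse to find new nested fields.
      case (Field(_, sStruct: StructT), Some(Field(tName, tStruct: StructT))) =>
        newFields(tStruct, sStruct, byName, path :+ tName)
      // The field already exists in the target: nothing to add.
      case _ => Vector.empty
    }
  }
}
```

Under these assumptions, a by-name comparison reports any source column whose name is absent from the target, while a by-position comparison only reports source columns beyond the target's last position; in both modes, struct fields that exist on both sides are compared recursively.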

This builds on previous work from @szehon-ho, in particular #51698.
The first two commits move that earlier code around so it can be reused; the core of the implementation is in the third commit.

Why are the changes needed?

The WITH SCHEMA EVOLUTION syntax for SQL inserts was added recently in #53732. This PR implements the schema evolution behind that syntax.

Does this PR introduce any user-facing change?

Yes. When the WITH SCHEMA EVOLUTION clause is specified in SQL INSERT operations, new columns and nested fields in the source data are added to the target table, provided the data source supports schema evolution (capability AUTOMATIC_SCHEMA_EVOLUTION):

CREATE TABLE target (id INT);
INSERT INTO target VALUES (1);
INSERT WITH SCHEMA EVOLUTION INTO target SELECT 2 AS id, "two" AS value;

SELECT * FROM target;
| id | value |
|----|-------|
| 1  |  null |
| 2  | "two" |

How was this patch tested?

Added basic tests in DataSourceV2SQLSuite.
Integrated with Delta and ran Delta's extensive schema evolution test harness against this implementation.
See delta-io/delta#6140. A number of failures are expected for tests that would need to be updated on the Delta side (a different error class is returned, negative tests check that something specifically doesn't work when a fix is disabled, etc.).


szehon-ho

Thanks, I think this is a great PR! Test coverage can be improved for various cases, but functionally it's a good change.
For example:

INSERT OVERWRITE with PartitionOverwriteMode.DYNAMIC + schema evolution
Case-insensitive column name matching
Static partition overwrite with schema evolution
Table without AUTOMATIC_SCHEMA_EVOLUTION capability should no-op

etc
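
Two of the cases listed above (case-insensitive column matching and the no-op for tables without the capability) can be stated as a small, testable sketch. This only illustrates the expected behavior, not the rule implemented in this PR: `Table`, `columnsToAdd` and the string-based capability set are hypothetical simplifications, and only top-level columns are considered.

```scala
import java.util.Locale

object EvolutionGateSketch {
  // Hypothetical stand-in for a table: its capabilities and top-level column names.
  final case class Table(capabilities: Set[String], columns: Vector[String])

  val AutomaticSchemaEvolution = "AUTOMATIC_SCHEMA_EVOLUTION"

  /** Source columns that would be added to `table`; empty if the table cannot evolve. */
  def columnsToAdd(
      table: Table,
      sourceColumns: Vector[String],
      caseSensitive: Boolean): Vector[String] = {
    if (!table.capabilities.contains(AutomaticSchemaEvolution)) {
      // Without the capability, schema evolution is a no-op.
      Vector.empty
    } else {
      // Normalize names only when matching is case-insensitive.
      def norm(s: String): String = if (caseSensitive) s else s.toLowerCase(Locale.ROOT)
      val existing = table.columns.map(norm).toSet
      sourceColumns.filterNot(c => existing.contains(norm(c)))
    }
  }
}
```

With case-insensitive matching, a source column `ID` is treated as the existing `id` and is not reported as new; with case-sensitive matching it is.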

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSchemaEvolution.scala

szehon-ho

This looks good to me!

suggestion: add tests like:

type evolution
2 level structs
non-partitioned table
constraints

Also, do we run the same tests with the DataFrame API? (I think we only test with SQL?)

szehon-ho · 2026-02-27T00:13:21Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSchemaEvolution.scala

+    updates ++ adds
+  }
+
+  def toFieldMap(fields: Array[StructField]): Map[String, StructField] = {


nit: private

szehon-ho · 2026-02-27T00:15:06Z

sql/core/src/test/scala/org/apache/spark/sql/connector/InsertIntoTests.scala

+      sql(s"CREATE TABLE $t1 (id bigint, data string) USING $v2Format")
+      checkError(
+        exception = intercept[AnalysisException] {
+          doInsert(t1, Seq((2L, "b", true)).toDF("id", "data", "active"))


This is missing the byName case.

szehon-ho

lgtm! cc @aokolnychyi @cloud-fan

johanl-db added 4 commits February 25, 2026 16:06

Move ResolveMergeIntoSchemaEvolution.scala -> ResolveMergeIntoSchemaE…

b3963d0

…volution.scala

Move schemaChanges from MergeIntoTable to ResolveSchemaEvolution

05b32cb

Schema evolution for DSv2 INSERT

7be9d2a

Merge branch 'master' into dsv2-schema-evolution-insert

6bc2956

szehon-ho reviewed Feb 26, 2026


johanl-db mentioned this pull request Feb 26, 2026

[Spark][Prototype] Schema evolution in DSv2 INSERT delta-io/delta#6140

Open

Add tests, address comments

a3042e3

johanl-db changed the title ~~[WIP][SPARK-55690] Schema evolution in DSv2 AppendData, OverwriteByExpression, OverwritePartitionsDynamic~~ [SPARK-55690] Schema evolution in DSv2 AppendData, OverwriteByExpression, OverwritePartitionsDynamic Feb 26, 2026

johanl-db requested a review from szehon-ho February 26, 2026 14:09

szehon-ho reviewed Feb 27, 2026


Add tests

aeae2b4

szehon-ho approved these changes Feb 28, 2026

johanl-db wants to merge 6 commits into apache:master from johanl-db:dsv2-schema-evolution-insert

2 participants
