[SPARK-55716][SQL] Support NOT NULL constraint enforcement for V1 file source table inserts #54517
yaooqinn wants to merge 1 commit into apache:master
Conversation
cc @dongjoon-hyun @cloud-fan @gengliangwang, is the fix direction correct? Is this a genuine bug or a design choice? I haven't found any public discussion in this area.
Hi, @yaooqinn
To me, this PR seems to introduce a new feature rather than fix a bug. cc @aokolnychyi, @peter-toth, too.
@dongjoon-hyun
IIUC,
Anyway, let's wait for @gengliangwang's and @cloud-fan's comments. Maybe I lost track of the code changes.
@dongjoon-hyun FYI, NOT NULL constraints for table output rely on it, too.
Got it. So, from Apache Spark 3.5.0 via the following? Then, cc @aokolnychyi, too.
This is probably too breaking a change, and many users may already treat this bug as a feature. If users need strict NOT NULL enforcement, they should migrate to DS v2.
+1 with @cloud-fan. At least, the default behavior should not be changed in 4.2.
Yeah, I agree with you that this is too breaking. However, many of our clients and developers here have raised doubts about this strange behavior of Spark, and there are no clear docs for it.
Do you mean using built-in sources with the v2 code path, or suggesting users migrate to third-party formats?
I think using plain file source tables is not recommended now; people should switch to lakehouse table formats.
Since DSv2 migration is an independent topic from this DSv1 bug, I made a PR for DSv2 specifically, @cloud-fan, @gengliangwang, @yaooqinn.
Hi @dongjoon-hyun, built-in file sources with the DSv2 code path have the same issue, based on my unit tests.
@cloud-fan Instead of making NOT NULL enforcement the default behavior, would you mind if we gave it a shot and provided an alternative option for users who want built-in Parquet and Delta Lake to behave the same on NOT NULL constraints? For streaming sources, we have such an option for toggling asNullable.
What changes were proposed in this pull request?
V1 file-based DataSource writes (Parquet, ORC, JSON, etc.) silently accept null values into NOT NULL columns. This PR adds opt-in NOT NULL constraint enforcement controlled by `spark.sql.fileSource.insert.enforceNotNull`.

Changes:
- `CreateDataSourceTableCommand`: Preserves user-specified nullability by recursively merging nullability flags from the user schema into the resolved `dataSource.schema`. Previously it stored `dataSource.schema` directly, which is all-nullable because `DataSource.resolveRelation()` calls `dataSchema.asNullable` (SPARK-13738).
- `PreprocessTableInsertion`: Restores nullability flags from the catalog schema before null checks, ensuring `AssertNotNull` is injected when needed. Gated behind `spark.sql.fileSource.insert.enforceNotNull`.

New config:
- `spark.sql.fileSource.insert.enforceNotNull` (default `false`): when set to `true`, enables NOT NULL constraint enforcement for V1 file-based tables, consistent with the behavior for other data sources and V2 catalog tables.
- `SparkGetColumnsOperation`: Fixed `IS_NULLABLE` to respect `column.nullable` instead of always returning `"YES"`.

Why are the changes needed?
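One way to see the root cause: `asNullable` conceptually marks every field nullable, recursively, so once the resolved schema is stored in the catalog the NOT NULL flags are unrecoverable. A toy sketch of that effect (plain Python with a made-up dict-based schema, not Spark's actual implementation):

```python
# Toy illustration of what `asNullable` does conceptually. The dict-based
# "schema" is invented for this example, not Spark's StructType: it maps
# a field name to (dtype, nullable), where dtype may itself be a nested
# dict schema.
def as_nullable(schema):
    out = {}
    for name, (dtype, _nullable) in schema.items():
        if isinstance(dtype, dict):   # recurse into nested structs
            dtype = as_nullable(dtype)
        out[name] = (dtype, True)     # every field becomes nullable
    return out

user_schema = {
    "id": ("long", False),                         # declared NOT NULL
    "info": ({"email": ("string", False)}, True),  # nested NOT NULL field
}
stored = as_nullable(user_schema)
print(stored["id"][1])                # True: the NOT NULL flag is gone
print(stored["info"][0]["email"][1])  # True: the nested flag is gone too
```

Since the catalog only ever sees `stored`, later passes have no way to tell which fields the user declared NOT NULL, which is exactly why the merge step in `CreateDataSourceTableCommand` is needed.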
`DataSource.resolveRelation()` calls `dataSchema.asNullable` (added in SPARK-13738 for read safety), which strips all NOT NULL constraints recursively. `CreateDataSourceTableCommand` then stores this all-nullable schema in the catalog, permanently losing the NOT NULL information. As a result, `PreprocessTableInsertion` never injects `AssertNotNull` for V1 file source tables.

Note:
`InsertableRelation` (e.g., `SimpleInsertSource`) does NOT have this problem because it preserves the original schema (SPARK-24583).

Does this PR introduce any user-facing change?
No change in default behavior. Users can opt in to NOT NULL enforcement for V1 file source tables by setting `spark.sql.fileSource.insert.enforceNotNull` to `true`.

How was this patch tested?
- `InsertSuite`, covering top-level, nested struct, array, and map null constraint enforcement.
- `SparkMetadataOperationSuite`.

Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with GitHub Copilot.
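For illustration, the recursive nullability merge described for `CreateDataSourceTableCommand` might look like the following toy sketch. The `Field`/`Struct` types are hypothetical stand-ins for Spark's `StructField`/`StructType`, and the function name is invented; this is a sketch of the idea, not the PR's code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:
    name: str
    dtype: object          # a primitive type name, or a nested Struct
    nullable: bool = True

@dataclass
class Struct:
    fields: List[Field] = field(default_factory=list)

def merge_nullability(resolved: Struct, user: Struct) -> Struct:
    """Copy nullability flags from the user-declared schema onto the
    resolved (all-nullable) schema, recursing into nested structs."""
    user_by_name = {f.name: f for f in user.fields}
    merged = []
    for f in resolved.fields:
        u = user_by_name.get(f.name)
        if u is None:
            merged.append(f)  # no user declaration: keep resolved field
            continue
        dtype = f.dtype
        if isinstance(dtype, Struct) and isinstance(u.dtype, Struct):
            dtype = merge_nullability(dtype, u.dtype)
        merged.append(Field(f.name, dtype, u.nullable))
    return Struct(merged)

user = Struct([Field("id", "long", nullable=False),
               Field("info", Struct([Field("email", "string", nullable=False)]))])
resolved = Struct([Field("id", "long", True),
                   Field("info", Struct([Field("email", "string", True)]), True)])
merged = merge_nullability(resolved, user)
print(merged.fields[0].nullable)                  # False: NOT NULL preserved
print(merged.fields[1].dtype.fields[0].nullable)  # False: nested flag preserved
```

With the merged schema stored in the catalog, and the config enabled, `PreprocessTableInsertion` can then see the original NOT NULL flags and inject the runtime null checks; an `INSERT` of a null into a NOT NULL column would fail instead of being silently written.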