[GH-3044] Fix aggregate function registration in sessions cloned from FunctionRegistry.builtin by jiayuasu · Pull Request #3050 · apache/sedona

jiayuasu · 2026-06-12T08:16:21Z

Did you read the Contributor Guide?

Yes, I have read the Contributor Rules and Contributor Development Guide

Is this PR related to a ticket?

Yes, and the PR name follows the format [GH-XXX] my subject. Closes In Sedona 1.9.0 aggregate functions fail on the second call within a Spark-Connect session #3044

What changes were proposed in this PR?

Sedona registers aggregate functions in two places: the real UDAF via udf.register in the session registry, and an entry in the global FunctionRegistry.builtin that throws UnsupportedOperationException on invocation (added in #2420 for permanent VIEW support). Every new SparkSession clones FunctionRegistry.builtin into its session registry, so any session created after the first SedonaContext.create() in the JVM starts out holding the throwing placeholder under the aggregate names.

The functionExists() guard added in #2642 mistook that cloned placeholder for a real registration and skipped udf.register, so the real UDAF was never installed and the session failed with Aggregate function ST_Envelope_Aggr cannot be used as a regular function. Spark Connect creates a new server-side session per client session, which is why the reported repro fails on the second session (1.8.1 re-registered unconditionally, which silently overwrote the cloned placeholder).

Two commits:

Re-register over the placeholder. registerAggregateFunction now probes the existing entry by invoking its builder: the placeholder throws, a real entry builds an expression. The real UDAF is re-registered only when the entry is not invocable, so the Simply the warning messages of SedonaContext.create() #2640 behavior (no re-registration noise on repeated create()) is preserved. Also extends the scalar skip check to cover Sedona expression classes under org.apache.spark.sql.sedona_sql, which the org.apache.sedona prefix never matched.
Back the builtin entry with the real builder. FunctionRegistry.builtin now receives the actual ScalaAggregator builder instead of a throwing placeholder, so cloned registries resolve Sedona aggregates by construction (even in sessions that never ran SedonaContext.create). The required Spark API is package-private and moved between versions, so it is loaded reflectively: UserDefinedAggregator.scalaAggregator on Spark 3.4/3.5, the ScalaAggregator companion apply on Spark 4.x. If neither is found, the previous placeholder behavior is kept and the probe from commit 1 still applies.

The first commit is intentionally minimal and self-contained so it can be cherry-picked to a 1.9.1 patch release on its own.

How was this patch tested?

Two new tests in aggregateFunctionTestScala:

ST_Envelope_Aggr, ST_Envelope_Agg, and ST_Union_Aggr in a newSession() after SedonaContext.create — reproduces In Sedona 1.9.0 aggregate functions fail on the second call within a Spark-Connect session #3044 without Spark Connect; fails on current master with the exact exception from the report.
ST_Envelope_Aggr in a newSession() that never ran SedonaContext.create — verifies the builtin entry is the real builder.

Full suite run locally on Spark 3.4 / Scala 2.12 / JDK 11 and Spark 4.0 / Scala 2.13 / JDK 17 (30/30 passing), covering both reflection paths.

Did this PR include necessary documentation updates?

No, this PR does not affect any public API so no need to change the documentation.

…der cloned into new sessions A new SparkSession clones FunctionRegistry.builtin, which contains Sedona's non-invocable placeholder for aggregate functions. The functionExists guard added for apacheGH-2640 mistook that placeholder for a real registration and skipped udf.register, so every session created after the first one (e.g. each Spark Connect session) failed with 'Aggregate function X cannot be used as a regular function'. Probe the registered builder instead and re-register when it is not invocable. Also extend the scalar skip check to cover Sedona expression classes under org.apache.spark.sql.sedona_sql, which the org.apache.sedona prefix missed.

…he real builder Register the actual ScalaAggregator builder in FunctionRegistry.builtin instead of a placeholder that throws on invocation, so sessions that clone the builtin registry resolve Sedona aggregates without depending on re-registration, and views referencing them resolve for real. The required Spark API is package-private and moved between versions, so it is loaded reflectively: UserDefinedAggregator.scalaAggregator on Spark 3.4/3.5, the ScalaAggregator companion on Spark 4.x. If neither is found the previous placeholder behavior is kept, where the apacheGH-3044 probe still applies.

Copilot

Pull request overview

Fixes Sedona SQL aggregate function resolution/registration for SparkSessions that are created by cloning FunctionRegistry.builtin (notably impacting Spark Connect), ensuring Sedona aggregate UDAFs are correctly invocable in later sessions and can also be resolved purely from the cloned built-in registry.

Changes:

Update aggregate registration to detect and overwrite non-invocable placeholder entries in a session’s function registry.
Register Sedona aggregates into FunctionRegistry.builtin using a real aggregate builder when possible (via Spark-version-specific reflective access), falling back to the prior placeholder only when necessary.
Add regression tests covering aggregate usage in a newSession() both with and without calling SedonaContext.create() on that session.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
spark/common/src/main/scala/org/apache/sedona/sql/UDF/AbstractCatalog.scala	Fixes session/builtin aggregate registration by probing invocability and (when possible) registering real built-in aggregate builders via reflection across Spark versions.
spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala	Adds regression tests reproducing #3044 and validating aggregate resolution in cloned/new sessions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

All expressions registered through AbstractCatalog live under org.apache.spark.sql.sedona_sql; the previous wording claimed classes exist under org.apache.sedona as well, which is not the case for the registered set.

jiayuasu added 2 commits June 12, 2026 00:51

jiayuasu requested a review from Copilot June 12, 2026 08:20

Copilot started reviewing on behalf of jiayuasu June 12, 2026 08:20 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

jiayuasu added this to the sedona-1.9.1 milestone Jun 12, 2026

jiayuasu mentioned this pull request Jun 13, 2026

In Sedona 1.9.0 aggregate functions fail on the second call within a Spark-Connect session #3044

Closed

jiayuasu added the bug label Jun 13, 2026

jiayuasu merged commit 02f54ab into apache:master Jun 13, 2026
42 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GH-3044] Fix aggregate function registration in sessions cloned from FunctionRegistry.builtin#3050

[GH-3044] Fix aggregate function registration in sessions cloned from FunctionRegistry.builtin#3050
jiayuasu merged 3 commits into
apache:masterfrom
jiayuasu:fix/aggregate-registration-new-session

jiayuasu commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jiayuasu commented Jun 12, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

How was this patch tested?

Did this PR include necessary documentation updates?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants