[GH-3044] Fix aggregate function registration in sessions cloned from FunctionRegistry.builtin#3050
Merged
jiayuasu merged 3 commits intoJun 13, 2026
Conversation
…der cloned into new sessions A new SparkSession clones FunctionRegistry.builtin, which contains Sedona's non-invocable placeholder for aggregate functions. The functionExists guard added for apacheGH-2640 mistook that placeholder for a real registration and skipped udf.register, so every session created after the first one (e.g. each Spark Connect session) failed with 'Aggregate function X cannot be used as a regular function'. Probe the registered builder instead and re-register when it is not invocable. Also extend the scalar skip check to cover Sedona expression classes under org.apache.spark.sql.sedona_sql, which the org.apache.sedona prefix missed.
…he real builder Register the actual ScalaAggregator builder in FunctionRegistry.builtin instead of a placeholder that throws on invocation, so sessions that clone the builtin registry resolve Sedona aggregates without depending on re-registration, and views referencing them resolve for real. The required Spark API is package-private and moved between versions, so it is loaded reflectively: UserDefinedAggregator.scalaAggregator on Spark 3.4/3.5, the ScalaAggregator companion on Spark 4.x. If neither is found the previous placeholder behavior is kept, where the apacheGH-3044 probe still applies.
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes Sedona SQL aggregate function resolution/registration for SparkSessions that are created by cloning FunctionRegistry.builtin (notably impacting Spark Connect), ensuring Sedona aggregate UDAFs are correctly invocable in later sessions and can also be resolved purely from the cloned built-in registry.
Changes:
- Update aggregate registration to detect and overwrite non-invocable placeholder entries in a session’s function registry.
- Register Sedona aggregates into
FunctionRegistry.builtinusing a real aggregate builder when possible (via Spark-version-specific reflective access), falling back to the prior placeholder only when necessary. - Add regression tests covering aggregate usage in a
newSession()both with and without callingSedonaContext.create()on that session.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| spark/common/src/main/scala/org/apache/sedona/sql/UDF/AbstractCatalog.scala | Fixes session/builtin aggregate registration by probing invocability and (when possible) registering real built-in aggregate builders via reflection across Spark versions. |
| spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala | Adds regression tests reproducing #3044 and validating aggregate resolution in cloned/new sessions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
All expressions registered through AbstractCatalog live under org.apache.spark.sql.sedona_sql; the previous wording claimed classes exist under org.apache.sedona as well, which is not the case for the registered set.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes In Sedona 1.9.0 aggregate functions fail on the second call within a Spark-Connect session #3044What changes were proposed in this PR?
Sedona registers aggregate functions in two places: the real UDAF via
udf.registerin the session registry, and an entry in the globalFunctionRegistry.builtinthat throwsUnsupportedOperationExceptionon invocation (added in #2420 for permanent VIEW support). Every new SparkSession clonesFunctionRegistry.builtininto its session registry, so any session created after the firstSedonaContext.create()in the JVM starts out holding the throwing placeholder under the aggregate names.The
functionExists()guard added in #2642 mistook that cloned placeholder for a real registration and skippedudf.register, so the real UDAF was never installed and the session failed withAggregate function ST_Envelope_Aggr cannot be used as a regular function. Spark Connect creates a new server-side session per client session, which is why the reported repro fails on the second session (1.8.1 re-registered unconditionally, which silently overwrote the cloned placeholder).Two commits:
registerAggregateFunctionnow probes the existing entry by invoking its builder: the placeholder throws, a real entry builds an expression. The real UDAF is re-registered only when the entry is not invocable, so the Simply the warning messages of SedonaContext.create() #2640 behavior (no re-registration noise on repeatedcreate()) is preserved. Also extends the scalar skip check to cover Sedona expression classes underorg.apache.spark.sql.sedona_sql, which theorg.apache.sedonaprefix never matched.FunctionRegistry.builtinnow receives the actualScalaAggregatorbuilder instead of a throwing placeholder, so cloned registries resolve Sedona aggregates by construction (even in sessions that never ranSedonaContext.create). The required Spark API is package-private and moved between versions, so it is loaded reflectively:UserDefinedAggregator.scalaAggregatoron Spark 3.4/3.5, theScalaAggregatorcompanion apply on Spark 4.x. If neither is found, the previous placeholder behavior is kept and the probe from commit 1 still applies.The first commit is intentionally minimal and self-contained so it can be cherry-picked to a 1.9.1 patch release on its own.
How was this patch tested?
Two new tests in
aggregateFunctionTestScala:ST_Envelope_Aggr,ST_Envelope_Agg, andST_Union_Aggrin anewSession()afterSedonaContext.create— reproduces In Sedona 1.9.0 aggregate functions fail on the second call within a Spark-Connect session #3044 without Spark Connect; fails on current master with the exact exception from the report.ST_Envelope_Aggrin anewSession()that never ranSedonaContext.create— verifies the builtin entry is the real builder.Full suite run locally on Spark 3.4 / Scala 2.12 / JDK 11 and Spark 4.0 / Scala 2.13 / JDK 17 (30/30 passing), covering both reflection paths.
Did this PR include necessary documentation updates?