Skip to content

[GH-3044] Fix aggregate function registration in sessions cloned from FunctionRegistry.builtin#3050

Merged
jiayuasu merged 3 commits into
apache:masterfrom
jiayuasu:fix/aggregate-registration-new-session
Jun 13, 2026
Merged

[GH-3044] Fix aggregate function registration in sessions cloned from FunctionRegistry.builtin#3050
jiayuasu merged 3 commits into
apache:masterfrom
jiayuasu:fix/aggregate-registration-new-session

Conversation

@jiayuasu

Copy link
Copy Markdown
Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Sedona registers aggregate functions in two places: the real UDAF via udf.register in the session registry, and an entry in the global FunctionRegistry.builtin that throws UnsupportedOperationException on invocation (added in #2420 for permanent VIEW support). Every new SparkSession clones FunctionRegistry.builtin into its session registry, so any session created after the first SedonaContext.create() in the JVM starts out holding the throwing placeholder under the aggregate names.

The functionExists() guard added in #2642 mistook that cloned placeholder for a real registration and skipped udf.register, so the real UDAF was never installed and the session failed with Aggregate function ST_Envelope_Aggr cannot be used as a regular function. Spark Connect creates a new server-side session per client session, which is why the reported repro fails on the second session (1.8.1 re-registered unconditionally, which silently overwrote the cloned placeholder).

Two commits:

  1. Re-register over the placeholder. registerAggregateFunction now probes the existing entry by invoking its builder: the placeholder throws, a real entry builds an expression. The real UDAF is re-registered only when the entry is not invocable, so the Simply the warning messages of SedonaContext.create() #2640 behavior (no re-registration noise on repeated create()) is preserved. Also extends the scalar skip check to cover Sedona expression classes under org.apache.spark.sql.sedona_sql, which the org.apache.sedona prefix never matched.
  2. Back the builtin entry with the real builder. FunctionRegistry.builtin now receives the actual ScalaAggregator builder instead of a throwing placeholder, so cloned registries resolve Sedona aggregates by construction (even in sessions that never ran SedonaContext.create). The required Spark API is package-private and moved between versions, so it is loaded reflectively: UserDefinedAggregator.scalaAggregator on Spark 3.4/3.5, the ScalaAggregator companion apply on Spark 4.x. If neither is found, the previous placeholder behavior is kept and the probe from commit 1 still applies.

The first commit is intentionally minimal and self-contained so it can be cherry-picked to a 1.9.1 patch release on its own.

How was this patch tested?

Two new tests in aggregateFunctionTestScala:

Full suite run locally on Spark 3.4 / Scala 2.12 / JDK 11 and Spark 4.0 / Scala 2.13 / JDK 17 (30/30 passing), covering both reflection paths.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

jiayuasu added 2 commits June 12, 2026 00:51
…der cloned into new sessions

A new SparkSession clones FunctionRegistry.builtin, which contains Sedona's
non-invocable placeholder for aggregate functions. The functionExists guard
added for apacheGH-2640 mistook that placeholder for a real registration and
skipped udf.register, so every session created after the first one (e.g.
each Spark Connect session) failed with 'Aggregate function X cannot be
used as a regular function'. Probe the registered builder instead and
re-register when it is not invocable.

Also extend the scalar skip check to cover Sedona expression classes under
org.apache.spark.sql.sedona_sql, which the org.apache.sedona prefix missed.
…he real builder

Register the actual ScalaAggregator builder in FunctionRegistry.builtin
instead of a placeholder that throws on invocation, so sessions that clone
the builtin registry resolve Sedona aggregates without depending on
re-registration, and views referencing them resolve for real. The required
Spark API is package-private and moved between versions, so it is loaded
reflectively: UserDefinedAggregator.scalaAggregator on Spark 3.4/3.5, the
ScalaAggregator companion on Spark 4.x. If neither is found the previous
placeholder behavior is kept, where the apacheGH-3044 probe still applies.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Sedona SQL aggregate function resolution/registration for SparkSessions that are created by cloning FunctionRegistry.builtin (notably impacting Spark Connect), ensuring Sedona aggregate UDAFs are correctly invocable in later sessions and can also be resolved purely from the cloned built-in registry.

Changes:

  • Update aggregate registration to detect and overwrite non-invocable placeholder entries in a session’s function registry.
  • Register Sedona aggregates into FunctionRegistry.builtin using a real aggregate builder when possible (via Spark-version-specific reflective access), falling back to the prior placeholder only when necessary.
  • Add regression tests covering aggregate usage in a newSession() both with and without calling SedonaContext.create() on that session.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
spark/common/src/main/scala/org/apache/sedona/sql/UDF/AbstractCatalog.scala Fixes session/builtin aggregate registration by probing invocability and (when possible) registering real built-in aggregate builders via reflection across Spark versions.
spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala Adds regression tests reproducing #3044 and validating aggregate resolution in cloned/new sessions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

All expressions registered through AbstractCatalog live under
org.apache.spark.sql.sedona_sql; the previous wording claimed classes
exist under org.apache.sedona as well, which is not the case for the
registered set.
@jiayuasu jiayuasu added this to the sedona-1.9.1 milestone Jun 12, 2026
@jiayuasu jiayuasu added the bug label Jun 13, 2026
@jiayuasu jiayuasu merged commit 02f54ab into apache:master Jun 13, 2026
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

In Sedona 1.9.0 aggregate functions fail on the second call within a Spark-Connect session

2 participants