reverse_tunnel: proactively replace draining connections instead of GOAWAY by aakugan · Pull Request #45519 · envoyproxy/envoy

aakugan · 2026-06-09T10:41:09Z

Commit Message

reverse_tunnel: proactively replace draining connections instead of GOAWAY

Description

Reworks the drain-aware HTTP connection manager used by reverse tunnels so that draining a connection no longer creates a window where the cluster has nowhere to route.

Why this change:
The HCM drains an HTTP/2 connection in two phases: shutdownNotice() (a graceful GOAWAY that stops new streams), followed by goAway() after drain_timeout. The problem for reverse tunnels is that the moment a GOAWAY is observed, the responder's upstream nghttp2/oghttp2 adapter marks the connection Draining and refuses new streams on it. If no replacement tunnel exists yet, requests fail with 503s until a fresh tunnel is independently established.
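The gap described above can be illustrated with a toy model. This is a minimal sketch and not Envoy code: `TunnelPool`, `Tunnel`, `onGoAway()`, `pick()` and `route()` are hypothetical names invented for illustration, and the 503 is simply what the model returns when no non-draining tunnel is available.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: every name below is hypothetical, not an Envoy API.
enum class TunnelState { Active, Draining };

struct Tunnel {
  std::string name;
  TunnelState state{TunnelState::Active};
};

class TunnelPool {
public:
  void add(std::string name) { tunnels_.push_back(Tunnel{std::move(name)}); }

  // Models the upstream adapter observing a GOAWAY: the tunnel stops
  // accepting new streams immediately.
  void onGoAway(const std::string& name) {
    for (auto& t : tunnels_) {
      if (t.name == name) {
        t.state = TunnelState::Draining;
      }
    }
  }

  // Returns the tunnel a new request would use, or nullopt if none is usable.
  std::optional<std::string> pick() const {
    for (const auto& t : tunnels_) {
      if (t.state == TunnelState::Active) {
        return t.name;
      }
    }
    return std::nullopt;
  }

private:
  std::vector<Tunnel> tunnels_;
};

// Status a new request would get in this model.
int route(const TunnelPool& pool) { return pool.pick().has_value() ? 200 : 503; }
```

In this model, sending the GOAWAY before a second tunnel is added makes `route()` return 503 until the replacement arrives; adding the replacement first keeps it at 200 throughout.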

What we do now:

DrainAwareServerConnection wraps the server codec and swallows the phase-1 shutdownNotice() so no premature GOAWAY is emitted. The only GOAWAY ever put on the wire is the single, guarded final goAway().
There are two drain triggers, both handled:
1. Connection-level drain (max_connection_duration, max_requests, ...): the HCM drives it. We swallow shutdownNotice() and instead call reestablishConnection() to bring up a REPLACEMENT reverse tunnel while the old connection keeps serving during drain_timeout. The HCM's own goAway() then closes the old connection, by which point the cluster already has the new tunnel to route to.
2. Listener-level drain (hot-restart, /drain_listeners, healthcheck fail): the HCM only checks drainClose() while encoding a response, so it never notices a drain on an idle connection. We poll drainClose() on a 100ms timer and, on detection, schedule the final goAway() after drain_timeout (no reconnect as the listener is dying), giving any in-flight request the same graceful window.
A bool guard ensures the final GOAWAY is sent at most once across both paths.
Net effect: earlier, a GOAWAY made both client and server stop using the connection immediately, opening a connectivity gap. Now the server brings up a new tunnel first and only sends the GOAWAY once a replacement is ready, so the cluster transparently shifts to the new connection.
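The control flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `FakeCodec` and `DrainAwareWrapper` are hypothetical stand-ins, the reconnect is modeled as a plain callback, and the 100ms timer and drain_timeout are reduced to a poll method whose return value tells the caller to arm the final-GOAWAY timer.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the wrapped server codec; records what would
// reach the wire.
class FakeCodec {
public:
  void shutdownNotice() { wire_.push_back("GOAWAY(graceful)"); }
  void goAway() { wire_.push_back("GOAWAY(final)"); }
  const std::vector<std::string>& wire() const { return wire_; }

private:
  std::vector<std::string> wire_;
};

// Hypothetical wrapper modeling the behavior described in the PR text.
class DrainAwareWrapper {
public:
  DrainAwareWrapper(FakeCodec* inner, std::function<void()> reestablish)
      : inner_(inner), reestablish_(std::move(reestablish)) {
    assert(inner_ != nullptr);
  }

  // Connection-level drain, phase 1: emit nothing on the wire and request a
  // replacement tunnel instead.
  void shutdownNotice() { reestablish_(); }

  // Final GOAWAY, sent at most once across both drain paths.
  void goAway() {
    if (goaway_sent_) {
      return;
    }
    goaway_sent_ = true;
    inner_->goAway();
  }

  // Listener-level drain: meant to be called from a periodic poll. Returns
  // true exactly once, when the caller should schedule goAway() after
  // drain_timeout. No reconnect is requested on this path.
  bool onDrainPoll(bool listener_draining) {
    if (!listener_draining || drain_scheduled_ || goaway_sent_) {
      return false;
    }
    drain_scheduled_ = true;
    return true;
  }

private:
  FakeCodec* inner_;
  std::function<void()> reestablish_;
  bool goaway_sent_{false};
  bool drain_scheduled_{false};
};
```

With this sketch, calling `shutdownNotice()` leaves `FakeCodec::wire()` empty and fires the reconnect callback, while repeated `goAway()` calls record a single final GOAWAY.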

Future

We still need a way to pass the draining state to the cluster without causing the adapter to reject new stream creation. That would let us notify client consumers in advance so that they stop preferring this tunnel.

Testing

Unit and integration tests.

KBaichoo · 2026-06-11T16:46:16Z

/assign @botengyao

aakugan requested review from agrawroh, botengyao and yanavlasov as code owners June 9, 2026 10:41

aakugan marked this pull request as draft June 9, 2026 10:41

reverse_tunnel: proactively replace draining connections instead of G…

423aa92

…OAWAY Signed-off-by: aakugan <aakashganapathy2@gmail.com>

aakugan force-pushed the fix/use-shutdown-notice branch from 9ee5bed to 423aa92 Compare June 9, 2026 13:12

aakugan marked this pull request as ready for review June 9, 2026 14:13

repokitteh-read-only Bot assigned botengyao Jun 11, 2026

aakugan wants to merge 1 commit into envoyproxy:main from aakugan:fix/use-shutdown-notice
