
🐛 fix(timeout): reclaim abandoned fiber.Ctx via ScheduleReclaim latch (#4359)#4400

Merged
ReneWerner87 merged 3 commits into main from claude/middleware-component-bug-794OK on Jun 4, 2026

@ReneWerner87
Member

Summary

Deterministic fix for the timeout context leak reported in #4359 (and the related goroutine-survival aspect of #4347). Replaces the existing TODO in middleware/timeout/timeout.go that documented exactly this design.

Closes #4359.

Approach

Adds an opt-in reclamation latch on *DefaultCtx:

  • ScheduleReclaim(handlerDone <-chan struct{}, cancel context.CancelFunc) arms a single goroutine that waits for BOTH:
    • handlerDone close (the handler goroutine has fully exited, including panic-path), AND
    • the request handler released the context (signalled from ReleaseCtx / releaseDefaultCtx).
  • Only then does it call ForceRelease, returning the ctx to the pool. Race-free by construction: every concurrent holder (handler goroutine, requestHandler running ErrorHandler) is guaranteed done.
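The two-signal latch described above can be modeled in a few lines of Go. This is a minimal, self-contained sketch, not fiber's code: toyCtx, toyLatch, scheduleReclaim, signalReleased and forceRelease are hypothetical stand-ins for *DefaultCtx and its methods, and forceRelease here only clears the abandoned flag and closes a channel instead of returning the ctx to a pool. It shows the property the PR relies on: reclamation happens only after both signals arrive, in either order.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// toyLatch is a hypothetical stand-in for the PR's internal latch:
// releasedCh is closed exactly once when the request handler releases the ctx.
type toyLatch struct {
	once       sync.Once
	releasedCh chan struct{}
}

// toyCtx is a hypothetical, heavily simplified stand-in for *DefaultCtx.
type toyCtx struct {
	abandoned atomic.Bool
	reclaim   *toyLatch     // nil unless scheduleReclaim armed it
	pooled    chan struct{} // closed by forceRelease so callers can observe reclamation
}

func newToyCtx() *toyCtx { return &toyCtx{pooled: make(chan struct{})} }

// scheduleReclaim abandons the ctx and starts one goroutine that waits for BOTH
// the handler goroutine to exit and the request handler to release the ctx.
func (c *toyCtx) scheduleReclaim(handlerDone <-chan struct{}, cancel context.CancelFunc) {
	c.abandoned.Store(true)
	l := &toyLatch{releasedCh: make(chan struct{})}
	c.reclaim = l
	go func() {
		<-handlerDone // handler goroutine has fully exited (normal or panic path)
		if cancel != nil {
			cancel()
		}
		<-l.releasedCh // request handler has released the ctx as well
		c.forceRelease()
	}()
}

// signalReleased is called from the release path. It is a nil-check no-op when
// no latch is armed, and sync.Once makes repeated calls safe.
func (c *toyCtx) signalReleased() {
	if l := c.reclaim; l != nil {
		l.once.Do(func() { close(l.releasedCh) })
	}
}

// forceRelease stands in for ForceRelease; a real implementation would reset
// the ctx and put it back into its pool.
func (c *toyCtx) forceRelease() {
	c.abandoned.Store(false)
	close(c.pooled)
}

// isReclaimed reports, without blocking, whether forceRelease has run.
func (c *toyCtx) isReclaimed() bool {
	select {
	case <-c.pooled:
		return true
	default:
		return false
	}
}

func main() {
	c := newToyCtx()
	handlerDone := make(chan struct{})
	reqCtx, cancel := context.WithCancel(context.Background())
	c.scheduleReclaim(handlerDone, cancel)

	c.signalReleased() // the request handler releases first...
	c.signalReleased() // ...and a second call is harmless
	fmt.Println("reclaimed while handler runs:", c.isReclaimed())

	close(handlerDone) // the slow handler finally exits
	<-c.pooled
	fmt.Println("reclaimed after handler exit:", c.isReclaimed(), reqCtx.Err())
}
```

Because the reclaimer blocks on handlerDone first, the first check can never observe a reclaimed ctx, no matter how early the release signal arrives.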

The timeout middleware closes handlerDone in a defer covering both the normal and panic exit paths, then calls ScheduleReclaim via a narrow type assertion. Custom Ctx implementations fall back to the prior behavior (no auto-reclaim).

Why not runtime.SetFinalizer (PR #4390)

Documented in detail at #4390 (comment). Short version:

  • The finalizer would only reclaim the ~720 B fiber DefaultCtx shell, not the ~10–20 KiB fasthttp RequestCtx graph that fasthttp itself permanently abandons after TimeoutErrorWithResponse. About 90% of the leak would survive.
  • The finalizer pins each ctx for an extra GC cycle, adds heap pressure under bursty timeouts, and degrades silently under tuned GOGC values.
  • The pattern (SetFinalizer back into a sync.Pool) has no precedent in the stdlib or in production libraries, and the Go team now steers new code away from SetFinalizer even for safety-net uses (Go 1.24 added runtime.AddCleanup as the less error-prone alternative).
  • The deterministic latch here adds zero overhead on the non-abandoned hot path; on the timeout path it costs two atomic field reads plus one reclamation goroutine, which replaces the existing cleanup goroutine.

Changes introduced

  • Benchmarks: hot path is unchanged (reclaim *reclaimLatch is nil unless armed; reads are nil-checks). Timeout path replaces the existing cleanup goroutine with the reclamation goroutine, same goroutine count.
  • Changelog/What's New: Timed-out requests no longer leak their fiber.Ctx. The Abandon() contract is unchanged for existing callers (SSE); a new opt-in ScheduleReclaim arms automatic safe reclamation.
  • API Longevity: Adds one method (ScheduleReclaim) on *DefaultCtx and one private signal hook (signalReleased) called from ReleaseCtx/releaseDefaultCtx. No breaking change.
  • Examples: timeout middleware demonstrates the intended usage.
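The hot-path and reuse claims above rest on two details that are easy to model: the latch pointer is nil unless armed, so signaling costs a single nil check, and release() clears it before the ctx goes back to the pool. In this hypothetical sketch, pooledCtx, latch, acquire and release are illustrative names backed by a plain sync.Pool, not fiber's actual pooling code.

```go
package main

import (
	"fmt"
	"sync"
)

// latch is a hypothetical stand-in for the PR's internal reclaim latch.
type latch struct {
	once       sync.Once
	releasedCh chan struct{}
}

// pooledCtx is a hypothetical stand-in for *DefaultCtx.
type pooledCtx struct {
	path    string
	reclaim *latch // nil on the hot path; set only when a timeout arms reclamation
}

var ctxPool = sync.Pool{New: func() any { return new(pooledCtx) }}

func acquire(path string) *pooledCtx {
	c := ctxPool.Get().(*pooledCtx)
	c.path = path
	return c
}

// release clears per-request state, including the latch, before pooling, so a
// reused ctx never carries a stale latch from an earlier timed-out request.
func release(c *pooledCtx) {
	c.path = ""
	c.reclaim = nil
	ctxPool.Put(c)
}

// signalReleased costs a single nil check when no latch is armed.
func (c *pooledCtx) signalReleased() {
	if l := c.reclaim; l != nil {
		l.once.Do(func() { close(l.releasedCh) })
	}
}

func main() {
	c := acquire("/slow")
	c.reclaim = &latch{releasedCh: make(chan struct{})} // as if a timeout armed it
	c.signalReleased()
	release(c) // stands in for the reclaimer returning the ctx to the pool

	next := acquire("/fast") // may or may not be the same object
	fmt.Println("stale latch on reused ctx:", next.reclaim != nil)
	next.signalReleased() // hot path: nil check only
	release(next)
}
```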

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Code consistency (non-breaking change which improves code reliability and robustness)

Checklist

  • Conducted a self-review of the code and provided comments for complex or critical parts.
  • Added or updated unit tests to validate the effectiveness of the changes.
  • Ensured that new and existing unit tests pass locally with -race.
  • Aimed for optimal performance with minimal allocations in the new code.

Test plan

  • TestTimeout_AbandonedCtxReclaimed: confirms the ctx is NOT reclaimed while the handler is still running, and IS reclaimed (via IsAbandoned() flipping to false on the same pointer) once it exits.
  • TestTimeout_PanicAfterTimeoutReclaimed: same assertion for the panic exit path.
  • TestTimeout_AbandonWithoutReclaimNotPooled: guards the SSE-style Abandon() contract: an abandoned-but-unarmed ctx stays unpooled.
  • All existing timeout tests pass unchanged.

Refs: #4359, #4347, #4390.

🤖 Generated with Claude Code

The timeout middleware abandoned the fiber.Ctx on every timeout and never
called ForceRelease, so timed-out requests leaked their context permanently
even after the handler eventually completed. Under sustained timeouts this is
an unbounded memory leak (the code's own TODO acknowledged it).

Add an opt-in reclamation latch on *DefaultCtx. ScheduleReclaim arms a single
goroutine that returns the context to the pool via ForceRelease only after BOTH
the handler goroutine finishes AND the request handler releases the context
(signalled from ReleaseCtx/releaseDefaultCtx). This keeps reclamation race-free
and adds zero overhead to the non-abandoned hot path. SSE's use of Abandon()
without arming the latch is unchanged: such contexts stay unpooled as before.

The timeout middleware now closes a handlerDone channel when its handler
goroutine exits (covering normal and panic paths) and arms reclamation via a
narrow type assertion, falling back to the previous behavior for custom Ctx
implementations.

Tests (run with -race) verify the context is not reclaimed while the handler is
still running, is reclaimed once it finishes (normal and panic paths), and that
an abandoned-but-unarmed context is left unpooled.
Copilot AI review requested due to automatic review settings June 4, 2026 09:05
ReneWerner87 requested a review from a team as a code owner June 4, 2026 09:05
ReneWerner87 requested review from efectn, gaby and sixcolors June 4, 2026 09:05
@ReneWerner87
Member Author

@claude please pick up this PR and drive it to merge.

Context: this branch was created earlier as the deterministic alternative to PR #4390's runtime.SetFinalizer approach. The full rationale for choosing this design over the finalizer one is documented at #4390 (comment).

Next steps for you:

  1. Rebase onto current main (HEAD eb5b1746); merge-base is ee98695f, conflict check is clean.
  2. Re-run the full test suite locally with -race (go test -race ./...) and the timeout-focused benchmarks.
  3. Address any CI feedback, codecov gates, and lint findings as they come in.
  4. Respond to maintainer review (cc @ReneWerner87).
  5. If reviewers ask about the design trade-off vs. PR #4390 (🐛 fix: return abandoned timeout contexts to pool via GC finalizer), point them to the linked comment thread; the multi-perspective analysis is already there.

Verify before requesting review:

  • The signalReleased hook is wired through both ReleaseCtx (the public one) and releaseDefaultCtx (the internal one called by defaultRequestHandler's defer). Both paths must signal.
  • The reclaim *reclaimLatch field is reset to nil in release() so a pooled ctx that gets reused does not carry a stale latch.
  • The SSE middleware path (middleware/sse/sse.go:39 calls Abandon() without ScheduleReclaim) still leaves contexts unpooled, matching prior behavior. TestTimeout_AbandonWithoutReclaimNotPooled guards this.
  • The interface assertion in timeout.go uses an anonymous interface (not *DefaultCtx directly) so custom Ctx implementations that adopt ScheduleReclaim later get the benefit automatically.
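The anonymous-interface assertion in the last item can be illustrated with hypothetical types. Ctx, plainCtx, reclaimableCtx and handleTimeout are not fiber's API, and commit f20d73b later in this thread switched timeout.go to a concrete *fiber.DefaultCtx assertion. The sketch only shows the Go mechanics: any type whose method set includes ScheduleReclaim matches, and everything else takes the Abandon-only fallback.

```go
package main

import (
	"context"
	"fmt"
)

// Ctx is a hypothetical, minimal context interface.
type Ctx interface {
	Abandon()
}

// plainCtx models a custom Ctx without reclamation support.
type plainCtx struct{ abandoned bool }

func (c *plainCtx) Abandon() { c.abandoned = true }

// reclaimableCtx models a Ctx that also offers ScheduleReclaim.
type reclaimableCtx struct {
	plainCtx
	armed bool
}

// ScheduleReclaim only records that it was armed; a real implementation would
// start a reclaimer waiting on handlerDone and the release signal.
func (c *reclaimableCtx) ScheduleReclaim(handlerDone <-chan struct{}, cancel context.CancelFunc) {
	c.Abandon()
	c.armed = true
}

// handleTimeout arms reclamation when available, else falls back to Abandon.
func handleTimeout(c Ctx, handlerDone <-chan struct{}, cancel context.CancelFunc) string {
	if r, ok := c.(interface {
		ScheduleReclaim(<-chan struct{}, context.CancelFunc)
	}); ok {
		r.ScheduleReclaim(handlerDone, cancel)
		return "reclaim scheduled"
	}
	c.Abandon() // prior behavior: abandoned and never pooled
	return "abandoned only"
}

func main() {
	done := make(chan struct{})
	_, cancel := context.WithCancel(context.Background())
	defer cancel()
	fmt.Println(handleTimeout(&reclaimableCtx{}, done, cancel))
	fmt.Println(handleTimeout(&plainCtx{}, done, cancel))
}
```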

If anything in the design needs revisiting, raise it here rather than silently changing the contract.

ReneWerner87 added this to v3 Jun 4, 2026
ReneWerner87 added this to the v3 milestone Jun 4, 2026
@codecov

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.33%. Comparing base (ee98695) to head (3381473).
⚠️ Report is 55 commits behind head on main.

Files with missing lines Patch % Lines
middleware/timeout/timeout.go 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4400      +/-   ##
==========================================
- Coverage   91.33%   91.33%   -0.01%     
==========================================
  Files         132      132              
  Lines       13110    13161      +51     
==========================================
+ Hits        11974    12020      +46     
- Misses        717      724       +7     
+ Partials      419      417       -2     
Flag Coverage Δ
unittests 91.33% <92.59%> (-0.01%) ⬇️


Contributor

gemini-code-assist Bot left a comment


Code Review

This pull request introduces a race-free reclamation mechanism for abandoned contexts to resolve memory leaks in timed-out requests. It implements a reclaimLatch and ScheduleReclaim method on DefaultCtx to coordinate safe pool reclamation once both the handler goroutine and the request handler have finished. The timeout middleware is updated to use this mechanism, and comprehensive tests are added. The review feedback highlights a potential leak when custom contexts wrap *DefaultCtx, recommending an interface check instead of a concrete type assertion to safely invoke signalReleased().


Comment thread ctx_interface.go
@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026


No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between f20d73b and 3381473.

📒 Files selected for processing (1)
  • ctx_reclaim_test.go

Walkthrough

Adds a reclaim latch and DefaultCtx.ScheduleReclaim to arm abandon + wait for handler completion, wires signalReleased() into ReleaseCtx/releaseDefaultCtx, and integrates with timeout middleware to safely ForceRelease abandoned default contexts after the handler finishes; includes tests covering timing, panic, cancel, and idempotency.

Changes

Abandoned Context Reclamation for Timeout Middleware

Layer / File(s) Summary
Reclamation latch infrastructure
ctx.go
Internal reclaimLatch type using sync.Once and a releasedCh channel; adds reclaim field and clears it in DefaultCtx.release().
ScheduleReclaim public API
ctx.go
DefaultCtx.ScheduleReclaim(handlerDone, cancel) calls Abandon(), arms the latch, and starts a goroutine that waits for handler completion, optionally invokes cancel, waits for the handler's release signal, then calls ForceRelease(). Adds signalReleased().
Context release coordination
ctx_interface.go
ReleaseCtx and releaseDefaultCtx call signalReleased() for abandoned DefaultCtx instances so pending reclaimers are notified; pooling remains deferred to the reclaimer.
Timeout middleware integration
middleware/timeout/timeout.go
Introduces handlerDone closed after handler goroutine exits; handleTimeout accepts handlerDone and, for *fiber.DefaultCtx, calls ScheduleReclaim(handlerDone, cancel) and returns the timeout error. Non-default contexts still use abandon+restore fallback.
Reclamation test coverage
middleware/timeout/timeout_test.go, ctx_reclaim_test.go
Adds tests ensuring abandoned contexts remain out of the pool while handler runs and are reclaimed after handler completion (including panic), verifies signalReleased idempotency and no-op behavior, and exercises cancel and nil-cancel reclamation flows.

Sequence Diagram(s)

sequenceDiagram
  participant Handler
  participant TimeoutMiddleware
  participant DefaultCtx
  participant Reclaimer
  TimeoutMiddleware->>DefaultCtx: Abandon() + ScheduleReclaim(handlerDone, cancel)
  TimeoutMiddleware->>Handler: return timeout error
  Reclaimer->>Handler: wait for handlerDone close
  Reclaimer->>TimeoutMiddleware: optional cancel() (if provided)
  Handler->>DefaultCtx: ReleaseCtx -> signalReleased()
  Reclaimer->>DefaultCtx: wait releasedCh then ForceRelease()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • gofiber/fiber#4009: Modifies timeout middleware and abandoned-context lifecycle in ctx.go/ctx_interface.go with reclamation coordination.

Suggested labels

codex

Suggested reviewers

  • gaby
  • sixcolors
  • efectn

Poem

🐰 I found a ctx stuck in timeout's snare,

Abandoned and waiting in mid-air.
A latch was planted, patient and keen,
Wait for the handler, then tidy the scene.
Hooray — contexts hop back, clean and serene!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: fixing a timeout context leak via a new ScheduleReclaim latch mechanism.
Description check ✅ Passed The description comprehensively covers the problem, approach, design trade-offs, changes introduced, test plan, and checklist completion with clear references to linked issues.
Linked Issues check ✅ Passed The PR fully addresses the P0 objective from #4359: implementing deterministic reclamation of abandoned fiber.Ctx objects after handler completion, eliminating the unbounded memory leak when requests time out.
Out of Scope Changes check ✅ Passed All changes are tightly scoped to the timeout context leak fix: core reclamation mechanism (ctx.go, ctx_interface.go), timeout middleware integration, and comprehensive tests with no incidental refactoring.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"



Contributor

coderabbitai Bot left a comment


Actionable comments posted: 1

Inline comments:
In `@ctx.go`:
- Line 776: Spelling fix: update the comment that references
ReleaseCtx/releaseDefaultCtx by changing the British spelling "signalled" to
the US spelling "signaled" so the linter stops flagging it; locate the comment
near the handler release notice (mentions ReleaseCtx/releaseDefaultCtx) and
replace the word only, leaving the rest of the comment intact.

📥 Commits

Reviewing files that changed from the base of the PR and between eb5b174 and 4ef11ed.

📒 Files selected for processing (4)
  • ctx.go
  • ctx_interface.go
  • middleware/timeout/timeout.go
  • middleware/timeout/timeout_test.go

Comment thread ctx.go Outdated
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses the timeout middleware’s abandoned fiber.Ctx leak by adding a safe, opt-in reclamation mechanism to *fiber.DefaultCtx and wiring the timeout path to schedule pool reclamation once it is race-free to do so.

Changes:

  • Add a DefaultCtx “reclaim latch” (ScheduleReclaim + signalReleased) to coordinate safe ForceRelease() after both the handler goroutine exits and Fiber releases the ctx.
  • Update timeout middleware to close a handlerDone signal and schedule reclamation on timeout instead of leaking abandoned contexts.
  • Add unit tests validating reclamation behavior (normal + panic paths) and preserving the “abandon without reclaim stays unpooled” contract.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
middleware/timeout/timeout.go Adds handlerDone and schedules ctx reclamation on timeout to avoid leaking abandoned contexts.
middleware/timeout/timeout_test.go Adds tests ensuring abandoned timed-out contexts are reclaimed only after handler exit; keeps SSE-style abandon semantics.
ctx.go Introduces reclaimLatch, ScheduleReclaim, and signalReleased on DefaultCtx, plus clears latch on release.
ctx_interface.go Ensures ReleaseCtx / releaseDefaultCtx signal the reclaimer when releasing an abandoned *DefaultCtx.

Comment thread middleware/timeout/timeout.go Outdated
Comment thread ctx.go
@ReneWerner87
Member Author

@claude pls solve the review hints and linting issues

@ReneWerner87
Member Author

@claude[agent] pls solve the review hints and linting issues

- Change type assertion in timeout.go from interface to concrete *fiber.DefaultCtx
  to prevent accidental matches with custom Ctx implementations that lack the
  signalReleased wiring, which would cause goroutine leaks
- Make ScheduleReclaim call Abandon() internally to prevent goroutine leaks if
  callers forget to call Abandon first
- Update timeout middleware to rely on ScheduleReclaim's internal Abandon call
  for DefaultCtx path, but still call Abandon explicitly for custom context fallback
- Fix misspelling: "signalled" → "signaled"

Agent-Logs-Url: https://github.com/gofiber/fiber/sessions/478a8456-02af-4103-97a0-e848a364daee

Co-authored-by: ReneWerner87 <7063188+ReneWerner87@users.noreply.github.com>
@Claude
Contributor

Claude AI commented Jun 4, 2026

@claude pls solve the review hints and linting issues

Fixed both review issues in commit f20d73b:

  1. Changed type assertion to concrete *fiber.DefaultCtx to prevent custom implementations from accidentally matching
  2. Made ScheduleReclaim call Abandon() internally to prevent goroutine leaks

All tests pass with -race and linting is clean.

@claude[agent] pls solve the review hints and linting issues

Completed. See commit f20d73b which addresses both review issues and fixes the linting error.

@ReneWerner87
Member Author

Status check after f20d73b

Review threads

All four review threads are now resolved:

  • ctx.go:776 (CodeRabbit: spelling signalled -> signaled) - fixed in f20d73b.
  • middleware/timeout/timeout.go:143 (Copilot: prefer concrete *fiber.DefaultCtx over interface assertion) - fixed in f20d73b.
  • ctx.go:780-787 (Copilot: ScheduleReclaim should not require callers to call Abandon first) - fixed in f20d73b: ScheduleReclaim now calls c.Abandon() internally, and the timeout middleware no longer calls Abandon itself on the DefaultCtx path.
  • ctx_interface.go:132-134 (Gemini: prefer interface assertion for signalReleased) - intentionally kept as concrete *DefaultCtx. Reasoning posted inline: signalReleased is unexported, so an interface assertion c.(interface{ signalReleased() }) can only match types inside package fiber; external custom Ctx implementations that embed *DefaultCtx cannot satisfy the interface from outside the package anyway. Keeping both the arming side (timeout.go) and the signaling side (ctx_interface.go) on the same concrete assertion avoids a mismatch where arming fires but signaling does not.

CI

All required checks pass on f20d73b9:

  • Linter, Spell check, govulncheck, CodeQL, Analyze (go/actions), CodeRabbit: pass.
  • Unit (1.25.x, 1.26.x on ubuntu / macos / windows, 6 jobs): pass.
  • Benchmark (Compare, repeated): pass.
  • codecov/project: pass.

codecov/patch

codecov/patch reports 37% on the patch and is the only red signal. Verified locally that the new functions are actually covered:

ctx.go: ScheduleReclaim   100.0%
ctx.go: signalReleased    100.0%
ctx.go: ForceRelease      100.0%
ctx.go: Abandon           100.0%
ctx.go: IsAbandoned       100.0%

The mismatch is purely a coverage-collection issue, not a missing-test issue. .github/workflows/test.yml runs go test ./... -coverprofile=coverage.txt without -coverpkg, so when middleware/timeout tests run, the coverage profile only records statements in middleware/timeout, not in the root fiber package, even though the timeout tests exercise ScheduleReclaim, signalReleased, and the new branches in ReleaseCtx / releaseDefaultCtx end-to-end.

Two ways to get the patch metric green:

  1. Add a small direct unit test in package fiber (e.g. ctx_reclaim_test.go) that calls ScheduleReclaim and signalReleased on a real *DefaultCtx obtained from app.AcquireCtx. This is the lowest-effort path and is mechanical given the existing middleware tests.
  2. Change test.yml to pass -coverpkg=./... so cross-package coverage attribution works repo-wide.

Recommend (1) since it is scoped to this PR; (2) is a separate infrastructure change.

Issue #4359 acceptance check

Re-reading #4359 against the current PR:

  • (1) Implement a finalizer goroutine pool or sync.WaitGroup-based cleanup that eventually calls ForceRelease after the handler goroutine completes.
    Status: Implemented via ScheduleReclaim + reclaimLatch (handler-done channel plus released-channel from ReleaseCtx / releaseDefaultCtx, fused with sync.Once).
  • (2) Add a secondary timeout for the cleanup goroutine itself (e.g. 2x the original timeout).
    Status: Not implemented. Documented design choice in ctx.go: "If handlerDone never closes - a handler that never returns - the context is intentionally never reclaimed, because the handler still owns it." This matches the issue's own note ("Note this part is partly inherent to the goroutine-per-request timeout pattern") and #4347's folded-in concern.
  • (3) Consider a bounded pool of reclaimer goroutines rather than spawning one per timeout.
    Status: Not implemented. Still one goroutine per timed-out request. The goroutine's lifetime is bounded by handler lifetime, so steady-state goroutine count is bounded by handler latency, not by timeout count.

Items (2) and (3) are deliberate non-goals of this PR. They address the never-returning-handler edge case, which the issue itself acknowledges is "partly inherent to the goroutine-per-request timeout pattern". Suggest tracking them in a follow-up issue if we want to chase a bounded-pool / hard-cap design: Go offers no way to preempt a running handler goroutine (runtime.Goexit only terminates the calling goroutine), but capping the in-flight reclaimer count and shedding load when saturated is possible.
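The bounded alternative mentioned above could look roughly like the following. This is a hypothetical sketch, not code from the PR or from fiber: reclaimerLimiter and tryStart are illustrative names, and the limiter is a plain buffered-channel semaphore. What happens to a ctx whose reclaimer is shed (left unpooled, or new requests rejected) is a policy choice the sketch deliberately leaves to the caller.

```go
package main

import (
	"fmt"
	"sync"
)

// reclaimerLimiter is a hypothetical buffered-channel semaphore that caps
// how many reclaimer goroutines may be in flight at once.
type reclaimerLimiter struct{ slots chan struct{} }

func newReclaimerLimiter(n int) *reclaimerLimiter {
	return &reclaimerLimiter{slots: make(chan struct{}, n)}
}

// tryStart runs reclaim in a new goroutine if a slot is free and reports whether
// it did. It never blocks: on false the caller must apply its shedding policy.
func (l *reclaimerLimiter) tryStart(reclaim func()) bool {
	select {
	case l.slots <- struct{}{}:
		go func() {
			defer func() { <-l.slots }() // free the slot when the reclaimer exits
			reclaim()
		}()
		return true
	default:
		return false
	}
}

func main() {
	lim := newReclaimerLimiter(2)
	gate := make(chan struct{}) // stands in for "slow handlers still running"
	var wg sync.WaitGroup

	for i := 0; i < 3; i++ {
		wg.Add(1)
		started := lim.tryStart(func() {
			defer wg.Done()
			<-gate
		})
		if !started {
			wg.Done() // nothing was started for this request
			fmt.Println("request", i, "shed: reclaimer limit reached")
		}
	}
	close(gate)
	wg.Wait()
}
```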

Net

If we accept that (2)/(3) are a follow-up, the only remaining gate is the codecov/patch signal, which is purely a coverage-collection artifact. Once a direct unit test in package fiber is added (or -coverpkg is set in CI), this PR is ready for human review and merge.

The reclaim path was already exercised end-to-end by the timeout middleware
tests, but the root-package coverage profile (test.yml runs `go test ./...`
without -coverpkg) only credits coverage of statements in the package being
tested, so codecov/patch saw 37% on the changes in ctx.go and ctx_interface.go.

Add direct unit tests in package fiber that drive ScheduleReclaim, the
release/handler-done ordering both ways, the cancel hook, the nil-cancel
branch, signalReleased's no-op and sync.Once semantics, and both ReleaseCtx
and releaseDefaultCtx release entry points. All run under -race.

Coverage after this change (root package run): ScheduleReclaim, signalReleased,
ForceRelease, Abandon, IsAbandoned, ReleaseCtx, releaseDefaultCtx all 100%.
ReneWerner87 merged commit 2db96a7 into main Jun 4, 2026
20 checks passed
ReneWerner87 deleted the claude/middleware-component-bug-794OK branch June 4, 2026 14:39
github-project-automation Bot moved this to Done in v3 Jun 4, 2026

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

🐛 bug: timeout middleware leaks fiber.Ctx on timeout (unbounded memory leak)

4 participants