fix(task-graph,storage): cache restart-resume + SharedInMemory sync barrier #552

Merged: sroussey merged 2 commits into main from claude/beautiful-mayer-ECvVg on Jun 8, 2026

sroussey (Collaborator) commented Jun 8, 2026

Summary

Two targeted fixes against origin/main for regressions that landed in recent refactors and silently degraded restart-resume and cross-tab sync semantics. Each fix ships with a regression test under packages/test/.

Fix 1 — Critical: private cache key broke crash-restart-resume

RunPrivateCacheRepo rows are keyed __run:${runId}::${taskId}. Task.id defaults to uuid4() when config.id is not supplied (see packages/task-graph/src/task/Task.ts), so after a crash the same graph mints fresh UUIDs on construction and every prior run-private row is orphaned. The entire point of the private cache tier (restart survival) then silently fails.
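
To make the failure mode concrete, here is a minimal sketch of the orphaning. Only the key shape is taken from the description above; `runPrivateKey`, `durableRows`, and the ids are hypothetical stand-ins, not the repository's code.

```typescript
import { randomUUID } from "node:crypto";

// Key shape quoted in the PR description.
function runPrivateKey(runId: string, taskId: string): string {
  return `__run:${runId}::${taskId}`;
}

// Hypothetical stand-in for a store that survives the crash.
const durableRows = new Map<string, string>();
const runId = "run-1";

// Before the crash: one task with an autogenerated id, one with a pinned id.
durableRows.set(runPrivateKey(runId, randomUUID()), "partial output A");
durableRows.set(runPrivateKey(runId, "fetch-users"), "partial output B");

// After the restart the graph is rebuilt, so the unpinned task gets a new UUID
// and its old row can no longer be found. The pinned task finds its row again.
const autogenHit = durableRows.has(runPrivateKey(runId, randomUUID()));
const pinnedHit = durableRows.has(runPrivateKey(runId, "fetch-users"));

console.log({ autogenHit, pinnedHit });
```

The row for the unpinned task is still in the store, but nothing can compute its key again, which is why the miss is silent.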

Changes:

  • New Task.hasDeterministicId() (declared on ITaskState) — returns false for canonical v4-UUID-shaped ids, true for any other non-empty string id (i.e. the caller pinned one).
  • CacheCoordinator.cacheIdentityKey keys private entries by task.id only when hasDeterministicId(); otherwise it keys by task.type so a restart with new autogenerated ids still finds prior rows for the same task type within the same runId.
  • RunPrivateCacheRepo.noteFallbackKey() fires a single console.warn per process the first time the fallback is engaged ("pin task.id to enable crash-resume"); pinned-id graphs stay silent. Gated by a static boolean so happy-path operation does not log.
  • Updated the RunPrivateCacheRepo doc comment to spell out: durable restart-resume requires deterministic task ids; without one the cache is best-effort intra-process only.
  • Regression test packages/test/src/test/task-graph-cache/RunPrivateCacheKeyFallback.test.ts:
    • Seeds a prior partial run's row under the type-keyed fallback and verifies a fresh graph (with autogenerated ids) hits cache.
    • Confirms pinned ids still key by id and never trigger the fallback warn.
    • Pins the warn-once behavior across multiple graphs / runs.
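
The key-selection rule described in the changes above might look roughly like the following sketch. The function names echo the PR, but the bodies, the `TaskLike` shape, the regular expression, the injectable `warn` parameter, and the exact warning text are assumptions made for illustration. Treating an empty id as non-deterministic is also an assumption, since the description only covers non-empty ids.

```typescript
// Canonical v4 UUID shape: version nibble 4, variant nibble 8, 9, a, or b.
const UUID_V4 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Hypothetical minimal task shape for this sketch.
interface TaskLike {
  id: string;
  type: string;
}

// False for v4-UUID-shaped ids (assumed autogenerated), true for any other
// non-empty id (assumed pinned by the caller).
function hasDeterministicId(task: TaskLike): boolean {
  return task.id.length > 0 && !UUID_V4.test(task.id);
}

// Process-wide flag so the fallback warning is emitted at most once.
let fallbackWarned = false;

function noteFallbackKey(warn: (msg: string) => void = console.warn): void {
  if (fallbackWarned) return;
  fallbackWarned = true;
  warn("run-private cache is keyed by task type; pin task.id to enable crash-resume");
}

// Key by id when it is stable across restarts, otherwise fall back to type.
function cacheIdentityKey(
  task: TaskLike,
  warn?: (msg: string) => void,
): string {
  if (hasDeterministicId(task)) return task.id;
  noteFallbackKey(warn);
  return task.type;
}
```

In this sketch, two tasks of the same type with autogenerated ids would map to the same fallback key within a run. How the actual implementation handles that case is not stated in the description, which is one more reason to pin ids where exact resume matters.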

Fix 2 — High: SharedInMemoryTabularStorage.setupDatabase lost its tab-sync barrier

Commit 64591e3 dropped the await on syncFromOtherTabs() inside setupDatabase, turning startup into fire-and-forget. Any caller that immediately reads/writes after setupDatabase() resolves now races other tabs' replayed rows.

Changes:

  • syncFromOtherTabs() now installs a syncSettled promise that resolves either on a SYNC_RESPONSE apply or on the SYNC_TIMEOUT fallback (so a peerless tab never deadlocks).
  • setupDatabase({ awaitTabSync?: boolean }) defaults to true (restoring previous behaviour) and awaits the sync barrier before applying migrations. Latency-sensitive callers can opt out with { awaitTabSync: false }.
  • Regression test packages/test/src/test/storage-tabular/SharedInMemoryTabularStorageSyncBarrier.test.ts:
    • Pre-populates a peer tab via BroadcastChannel, then asserts the first get on a freshly initialized tab sees the peer's row.
    • Confirms the awaitTabSync: false opt-out path resolves promptly and still functions.
    • Confirms the no-peer-on-channel timeout path resolves (no hang).

Test plan

  • bun run build:types (turbo, 36 packages) — all green
  • bunx vitest run packages/test/src/test/task-graph-cache/ — 15 files / 58 tests pass
  • bunx vitest run packages/test/src/test/storage-tabular/ — 19 files / 1075 tests pass (13 pre-existing skips)
  • bunx vitest run packages/test/src/test/task-graph/ packages/test/src/test/task-graph-output-cache/ — 51 files / 637 tests pass
  • bunx vitest run packages/test/src/test/task/ — 72 files / 1117 tests pass (24 pre-existing skips)
  • Both new regression tests pass on this branch and would fail on origin/main without the fixes.

Generated by Claude Code

claude added 2 commits June 8, 2026 08:31
…nerated ids

Restart-resume of run-private cache entries relies on a stable cache identity
per task. When config.id is not supplied, Task.constructor mints a fresh v4
UUID per process, so a crash + restart would mint new ids and orphan every
prior row. This adds Task.hasDeterministicId() (false for v4-UUID-shaped ids)
and makes CacheCoordinator key private entries by task type as a best-effort
fallback when the id is autogenerated. RunPrivateCacheRepo logs a single warn
per process the first time the fallback engages, pointing operators at pinning
config.id to enable exact crash-resume.

Adds a regression test that seeds a prior run's row under the type-keyed
fallback, then verifies a fresh graph with autogen ids reuses it; also pins
the warn-once behavior and confirms pinned-id tasks bypass the fallback.

…sync barrier

64591e3 dropped the await on syncFromOtherTabs() in setupDatabase, making the
initial cross-tab sync fire-and-forget. Callers that immediately put/get after
setupDatabase() resolves now race other tabs' replayed rows. Restores the
await as the default and gates it on a new awaitTabSync option (default true)
so latency-sensitive startup paths can still opt out explicitly. The internal
sync state now exposes a settle promise that resolves on either a SYNC_RESPONSE
apply or the SYNC_TIMEOUT fallback, so no caller can deadlock waiting for a
peer that never answers.

Adds a regression test that pre-populates a peer tab, then asserts the first
get on a freshly initialized tab sees the peer's row. Covers the opt-out and
no-peer-timeout paths.

github-actions Bot commented Jun 8, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
|---|---|---|---|
| 🔵 | Lines | 61.99% | 24862 / 40102 |
| 🔵 | Statements | 61.84% | 25718 / 41586 |
| 🔵 | Functions | 62.94% | 4699 / 7465 |
| 🔵 | Branches | 50.72% | 12202 / 24053 |
File Coverage: No changed files found.
Generated in workflow #2522 for commit 6bfe6ab by the Vitest Coverage Report Action

sroussey merged commit cbf82ab into main on Jun 8, 2026
13 checks passed