feat(gateway): own and freeze the operator API contract#952
Conversation
Establish the operator API contract module as the single import authority for the operator-facing surface. The pinned version is build-time only and never negotiated over the wire; downstream consumers pin it. The barrel grows as the contract consolidates the lifecycle, identity, approval, and response shapes.
Define one OperatorIdentity in the contract and collapse the two byte-identical duplicate declarations (the approval actor and the engine requester identity) onto it as aliases, so there is a single structural definer. The distinct session-context and session-record identity shapes are left untouched, and the approval and requester unions keep discriminating on kind. Behavior is unchanged.
Project canonical run state into an operator-safe status carrying only the run id, repo reference, surface, phase, web status, start time, and a derived staleness flag — internal coordination fields are excluded by construction. The projection omits a run entirely when its repo is redaction-denylisted, so the repo reference cannot leak through a stored run's status. The lifecycle types are re-exported through the contract barrel.
Make the contract the sole definer of the permission reply type; the coordinator re-exports it so existing import sites are unchanged. Add the operator-facing decision-state set with an exhaustive mapping from the internal decision outcome, and a decision-input type that requires a transport-bound actor so a web decision cannot bypass the fail-closed approval gate with a free-form decider. Behavior is unchanged.
Publish named types for the operator HTTP responses and runtime validators that return a Result with fixed, no-oracle error reasons that never echo input. The session-info route now uses the named response type; its JSON output is unchanged. The exported surface stays plain TypeScript and Result — no Effect Schema crosses the contract boundary.
…ntract Embed the repo-redaction and authorization obligations as normative contract clauses, including a fail-closed stub that throws by default so a repo-data response path cannot ship without an explicit redaction check. Record in the gateway guide that this module owns the operator-surface types, that the sole approval gate settles every transport, that operator identity is constructed server-side, and that the contract version is never negotiated over the wire.
Make the run-status redaction signal a required predicate so a caller cannot forget to check the denylist and surface a redacted repo. Give the unknown-phase path a fail-closed fallback, drop the repo reference from the redaction stub's error, and reject non-finite or non-integer numeric fields in the validators. Replace issue-number comments with descriptive text and add the missing no-oracle, barrel, and scanner-hardening tests.
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: PASS
This is a clean, behavior-preserving type consolidation. The contract correctly centralizes the operator-facing surface, the security obligations are encoded as load-bearing type/runtime constraints (not just prose), and verification is strong: gateway suite (1940 tests) passes, tsc --noEmit is clean, and no duplicate structural declarations of the collapsed types remain outside operator-contract/.
Highlights I verified directly:
- R5 (field exclusion):
OperatorRunStatusomitsholder_id/thread_id/detailsby construction;toOperatorRunStatusprojects only the safe fields fromRunState(field names matchcoordination/types.ts). Confirmed by key-absence tests. - R5/R6 (redaction leak):
isRepoDenylistedis a required predicate checked before any field read, returningnullfor a denylisted repo. The "forgot to check" state is unrepresentable. The unknown-phase fail-closed fallback to'failed'(vs. dropping the JSON key) is a nice defense-in-depth touch. - R7 (approval bypass):
toOperatorDecisionStateis exhaustive over the exact 5DecisionOutcomevariants with aneverguard;DecisionInputforces a transport-boundactor. The structuraldecidedBy: stringscan has self-tests for both required/optional and comment-skip cases. - No-oracle validators: fixed error strings; the parse tests assert no token/secret/input substrings leak across all error paths.
- Behavior preservation:
WebOperatorActor/WebOperatorIdentitybecome aliases ofOperatorIdentity;PermissionReplyis re-exported throughcoordinator.tsso all 9 import sites and theApprovalActorunion resolve unchanged.
Blocking issues
None
Non-blocking concerns
packages/gateway/src/operator-contract/approval.test.ts:180-188— theaccessSync(absPath, R_OK)existence guard beforereadFileSyncis what CodeQL flagged as a TOCTOU race (alert #56). In a test it's a false positive, but theaccessSyncis also redundant:readFileSyncalready throwsENOENT/EACCESon a renamed/unreadable file. You could drop theaccessSyncand wrapreadFileSyncin the sametry/catchto both satisfy CodeQL and simplify — purely optional.parse.tsvalidators ignore extra fields by design (documented). Reasonable for permissive response parsing, but worth keeping in mind that a future strict-mode need (e.g. rejecting unexpected fields on a sensitive endpoint) would require a separate guard — not a v1.0.0 concern.- The
assertRedactionAppliedstub throwingREDACTION_GATE_NOT_IMPLEMENTEDis unreachable today (no repo-data endpoint calls it). That's intentional per the deferred scope; just flagging that its enforcement value is latent until Units 4–6 land.
Missing tests
None for the shipped surface. Coverage is thorough: phase mapping, field exclusion, redaction omission (including membership-based predicate and no-partial-leak under EXECUTING), stale boundary (inclusive at nowMs - staleThresholdMs), unparseable-heartbeat fail-safe, decision-state exhaustiveness, and no-oracle validators. The deferred redaction-gate behavioral tests are correctly left as it.todo reminders tied to the gate implementation.
Risk assessment (LOW):
Type-level, behavior-preserving consolidation. Blast radius is contained to type ownership — runtime control flow (approval gate, route behavior, coordination internals) is unchanged, and the full suite + type-check confirm it. Regression likelihood low; security exposure reduced (single definer, fail-closed obligations). The only deferred risk is that the redaction/version/identity obligations remain prose+stub until consuming endpoints enforce them, which is the stated scope.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 27802330134 |
| Cache | hit |
| Session | ses_122354e7bfferS1fNqE3JenAUM |
Collapse the existence check and file read into a single read so a renamed scan target surfaces from the read error itself, removing the file-system race the separate access check introduced.
fro-bot
left a comment
There was a problem hiding this comment.
Verdict: PASS
Re-review after the prior approval was dismissed. Since then, exactly one commit landed — ac93d2e test(gateway): read R7 scan targets without a check-then-read race — which directly resolves the only non-blocking item I flagged last round (the redundant accessSync TOCTOU in the R7 structural scan, CodeQL alert #56). The contract surface itself is unchanged from the previously-approved state.
What I verified on the new commit:
packages/gateway/src/operator-contract/approval.test.tsnow drops theaccessSyncimport and reads each scan target with a singlereadFileSyncwrapped intry/catch. A renamed/deleted target still fails loudly with the same actionable message, so coverage cannot silently drop — but the check-then-read window is gone. Clean fix.approval.test.tspasses (15/15). No other files changed; all prior verification still holds.
The original PASS rationale stands: behavior-preserving type consolidation with security obligations encoded as load-bearing type/runtime constraints, single structural definers, and thorough tests (gateway suite green, tsc --noEmit clean previously).
Blocking issues
None
Non-blocking concerns
parse.tsvalidators ignore extra fields by design (documented). Fine for permissive response parsing; a future sensitive endpoint needing strict rejection would add a separate guard — not a v1.0.0 concern.assertRedactionApplied(throwsREDACTION_GATE_NOT_IMPLEMENTED) has no call sites yet, so its enforcement value is latent until the repo-data endpoints in Units 4–6 land. Intentional per the deferred scope.
Missing tests
None for the shipped surface. Coverage remains thorough: phase mapping, field exclusion (R5), redaction omission incl. membership predicate and no-partial-leak under EXECUTING (R5/R6), inclusive stale boundary, unparseable-heartbeat fail-safe, decision-state exhaustiveness with never guard (R7), and no-oracle validators. The R7 scan retains its self-tests (required/optional decidedBy, comment-skip). Deferred redaction-gate behaviors remain it.todo reminders.
Risk assessment (LOW):
Type-level, behavior-preserving consolidation; the new commit only touches a test file to remove a filesystem race. Blast radius confined to type ownership — runtime control flow (approval gate, route behavior, coordination internals) unchanged. Regression likelihood low; security exposure reduced. The only deferred risk is that redaction/version/identity obligations stay prose+stub until consuming endpoints enforce them, which is the stated scope.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 27802561126 |
| Cache | hit |
| Session | ses_122354e7bfferS1fNqE3JenAUM |
Closes #949.
What this adds
The gateway now owns a single, versioned source of truth for the operator API surface, under
packages/gateway/src/operator-contract/. Until now the operator-facing shapes were scattered and internal — run-state lifecycle in the runtime coordination layer, approval-decision semantics in the gateway approvals code, operator identity duplicated across two modules — with no canonical, exported, versioned contract for a client to validate against.This consolidates them behind one barrel (
OPERATOR_CONTRACT_VERSION = '1.0.0'):RunPhase/Surfaceand adds anOperatorRunStatusprojection that carries only operator-safe fields. Internal coordination fields (holder, thread, details) are excluded by construction, and the projection omits a run entirely when its repo is redaction-denylisted, so a redacted repo's reference cannot leak through a stored run's status. The denylist check is a required predicate the caller must supply — a caller cannot forget it and silently surface a redacted repo.OperatorIdentity; the two byte-identical duplicates become aliases of it, so there is a single structural definer.The exported surface is plain TypeScript plus result-returning validators — no Effect Schema crosses the contract boundary. The dashboard's mock operator client becomes a non-canonical downstream fixture that conforms to this contract rather than defining it.
Scope
Consolidation only. The net-new command/mission shapes and the real redaction-gate implementation are intentionally deferred to the units that build the endpoints needing them; this PR freezes what exists and binds the obligations so those units can't skip them. The contract changes are type-level and behavior-preserving — existing approval and route behavior is unchanged.
Verification
Type-check, lint, and the full gateway suite (1940 tests) pass. The consolidation cuts were verified behavior-preserving by keeping the existing approval/route suites green before and after. New tests cover the redaction-aware omission (including a denylist-predicate membership case), the decision-state mapping exhaustiveness, the no-oracle validators, and the fail-closed redaction stub.