Skip to content

feat(gateway): own and freeze the operator API contract#952

Merged
marcusrbrown merged 9 commits into
mainfrom
feat/operator-api-contract
Jun 19, 2026
Merged

feat(gateway): own and freeze the operator API contract#952
marcusrbrown merged 9 commits into
mainfrom
feat/operator-api-contract

Conversation

@marcusrbrown

Copy link
Copy Markdown
Collaborator

Closes #949.

What this adds

The gateway now owns a single, versioned source of truth for the operator API surface, under packages/gateway/src/operator-contract/. Until now the operator-facing shapes were scattered and internal — run-state lifecycle in the runtime coordination layer, approval-decision semantics in the gateway approvals code, operator identity duplicated across two modules — with no canonical, exported, versioned contract for a client to validate against.

This consolidates them behind one barrel (OPERATOR_CONTRACT_VERSION = '1.0.0'):

  • Lifecycle — re-exports RunPhase/Surface and adds an OperatorRunStatus projection that carries only operator-safe fields. Internal coordination fields (holder, thread, details) are excluded by construction, and the projection omits a run entirely when its repo is redaction-denylisted, so a redacted repo's reference cannot leak through a stored run's status. The denylist check is a required predicate the caller must supply — a caller cannot forget it and silently surface a redacted repo.
  • Identity — one canonical OperatorIdentity; the two byte-identical duplicates become aliases of it, so there is a single structural definer.
  • Approvals — the contract is the sole definer of the permission reply type (the coordinator re-exports it, so existing imports are unchanged), plus an operator decision-state set with an exhaustive mapping from the internal outcome, and a decision-input type that requires a transport-bound actor so a web decision cannot bypass the fail-closed approval gate.
  • Responses — named types for the operator HTTP responses with runtime validators that return a result and never echo input in their error reasons.
  • Obligations — the repo-redaction and authorization obligations are embedded as normative clauses, including a fail-closed stub that throws by default so a repo-data response path cannot ship without an explicit redaction check.

The exported surface is plain TypeScript plus result-returning validators — no Effect Schema crosses the contract boundary. The dashboard's mock operator client becomes a non-canonical downstream fixture that conforms to this contract rather than defining it.

Scope

Consolidation only. The net-new command/mission shapes and the real redaction-gate implementation are intentionally deferred to the units that build the endpoints needing them; this PR freezes what exists and binds the obligations so those units can't skip them. The contract changes are type-level and behavior-preserving — existing approval and route behavior is unchanged.

Verification

Type-check, lint, and the full gateway suite (1940 tests) pass. The consolidation cuts were verified behavior-preserving by keeping the existing approval/route suites green before and after. New tests cover the redaction-aware omission (including a denylist-predicate membership case), the decision-state mapping exhaustiveness, the no-oracle validators, and the fail-closed redaction stub.

Establish the operator API contract module as the single import authority for the
operator-facing surface. The pinned version is build-time only and never negotiated
over the wire; downstream consumers pin it. The barrel grows as the contract
consolidates the lifecycle, identity, approval, and response shapes.
Define one OperatorIdentity in the contract and collapse the two byte-identical
duplicate declarations (the approval actor and the engine requester identity)
onto it as aliases, so there is a single structural definer. The distinct
session-context and session-record identity shapes are left untouched, and the
approval and requester unions keep discriminating on kind. Behavior is unchanged.
Project canonical run state into an operator-safe status carrying only the run
id, repo reference, surface, phase, web status, start time, and a derived
staleness flag — internal coordination fields are excluded by construction. The
projection omits a run entirely when its repo is redaction-denylisted, so the
repo reference cannot leak through a stored run's status. The lifecycle types
are re-exported through the contract barrel.
Make the contract the sole definer of the permission reply type; the coordinator
re-exports it so existing import sites are unchanged. Add the operator-facing
decision-state set with an exhaustive mapping from the internal decision outcome,
and a decision-input type that requires a transport-bound actor so a web decision
cannot bypass the fail-closed approval gate with a free-form decider. Behavior is
unchanged.
Publish named types for the operator HTTP responses and runtime validators that
return a Result with fixed, no-oracle error reasons that never echo input. The
session-info route now uses the named response type; its JSON output is
unchanged. The exported surface stays plain TypeScript and Result — no Effect
Schema crosses the contract boundary.
…ntract

Embed the repo-redaction and authorization obligations as normative contract
clauses, including a fail-closed stub that throws by default so a repo-data
response path cannot ship without an explicit redaction check. Record in the
gateway guide that this module owns the operator-surface types, that the sole
approval gate settles every transport, that operator identity is constructed
server-side, and that the contract version is never negotiated over the wire.
Make the run-status redaction signal a required predicate so a caller cannot
forget to check the denylist and surface a redacted repo. Give the unknown-phase
path a fail-closed fallback, drop the repo reference from the redaction stub's
error, and reject non-finite or non-integer numeric fields in the validators.
Replace issue-number comments with descriptive text and add the missing
no-oracle, barrel, and scanner-hardening tests.
@marcusrbrown marcusrbrown requested a review from fro-bot as a code owner June 19, 2026 02:49
Comment thread packages/gateway/src/operator-contract/approval.test.ts Fixed
fro-bot
fro-bot previously approved these changes Jun 19, 2026

@fro-bot fro-bot left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: PASS

This is a clean, behavior-preserving type consolidation. The contract correctly centralizes the operator-facing surface, the security obligations are encoded as load-bearing type/runtime constraints (not just prose), and verification is strong: gateway suite (1940 tests) passes, tsc --noEmit is clean, and no duplicate structural declarations of the collapsed types remain outside operator-contract/.

Highlights I verified directly:

  • R5 (field exclusion): OperatorRunStatus omits holder_id/thread_id/details by construction; toOperatorRunStatus projects only the safe fields from RunState (field names match coordination/types.ts). Confirmed by key-absence tests.
  • R5/R6 (redaction leak): isRepoDenylisted is a required predicate checked before any field read, returning null for a denylisted repo. The "forgot to check" state is unrepresentable. The unknown-phase fail-closed fallback to 'failed' (vs. dropping the JSON key) is a nice defense-in-depth touch.
  • R7 (approval bypass): toOperatorDecisionState is exhaustive over the exact 5 DecisionOutcome variants with a never guard; DecisionInput forces a transport-bound actor. The structural decidedBy: string scan has self-tests for both required/optional and comment-skip cases.
  • No-oracle validators: fixed error strings; the parse tests assert no token/secret/input substrings leak across all error paths.
  • Behavior preservation: WebOperatorActor/WebOperatorIdentity become aliases of OperatorIdentity; PermissionReply is re-exported through coordinator.ts so all 9 import sites and the ApprovalActor union resolve unchanged.

Blocking issues

None

Non-blocking concerns

  • packages/gateway/src/operator-contract/approval.test.ts:180-188 — the accessSync(absPath, R_OK) existence guard before readFileSync is what CodeQL flagged as a TOCTOU race (alert #56). In a test it's a false positive, but the accessSync is also redundant: readFileSync already throws ENOENT/EACCES on a renamed/unreadable file. You could drop the accessSync and wrap readFileSync in the same try/catch to both satisfy CodeQL and simplify — purely optional.
  • parse.ts validators ignore extra fields by design (documented). Reasonable for permissive response parsing, but worth keeping in mind that a future strict-mode need (e.g. rejecting unexpected fields on a sensitive endpoint) would require a separate guard — not a v1.0.0 concern.
  • The assertRedactionApplied stub throwing REDACTION_GATE_NOT_IMPLEMENTED is unreachable today (no repo-data endpoint calls it). That's intentional per the deferred scope; just flagging that its enforcement value is latent until Units 4–6 land.

Missing tests

None for the shipped surface. Coverage is thorough: phase mapping, field exclusion, redaction omission (including membership-based predicate and no-partial-leak under EXECUTING), stale boundary (inclusive at nowMs - staleThresholdMs), unparseable-heartbeat fail-safe, decision-state exhaustiveness, and no-oracle validators. The deferred redaction-gate behavioral tests are correctly left as it.todo reminders tied to the gate implementation.

Risk assessment (LOW):

Type-level, behavior-preserving consolidation. Blast radius is contained to type ownership — runtime control flow (approval gate, route behavior, coordination internals) is unchanged, and the full suite + type-check confirm it. Regression likelihood low; security exposure reduced (single definer, fail-closed obligations). The only deferred risk is that the redaction/version/identity obligations remain prose+stub until consuming endpoints enforce them, which is the stated scope.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 27802330134
Cache hit
Session ses_122354e7bfferS1fNqE3JenAUM

Collapse the existence check and file read into a single read so a renamed scan
target surfaces from the read error itself, removing the file-system race the
separate access check introduced.

@fro-bot fro-bot left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: PASS

Re-review after the prior approval was dismissed. Since then, exactly one commit landed — ac93d2e test(gateway): read R7 scan targets without a check-then-read race — which directly resolves the only non-blocking item I flagged last round (the redundant accessSync TOCTOU in the R7 structural scan, CodeQL alert #56). The contract surface itself is unchanged from the previously-approved state.

What I verified on the new commit:

  • packages/gateway/src/operator-contract/approval.test.ts now drops the accessSync import and reads each scan target with a single readFileSync wrapped in try/catch. A renamed/deleted target still fails loudly with the same actionable message, so coverage cannot silently drop — but the check-then-read window is gone. Clean fix.
  • approval.test.ts passes (15/15). No other files changed; all prior verification still holds.

The original PASS rationale stands: behavior-preserving type consolidation with security obligations encoded as load-bearing type/runtime constraints, single structural definers, and thorough tests (gateway suite green, tsc --noEmit clean previously).

Blocking issues

None

Non-blocking concerns

  • parse.ts validators ignore extra fields by design (documented). Fine for permissive response parsing; a future sensitive endpoint needing strict rejection would add a separate guard — not a v1.0.0 concern.
  • assertRedactionApplied (throws REDACTION_GATE_NOT_IMPLEMENTED) has no call sites yet, so its enforcement value is latent until the repo-data endpoints in Units 4–6 land. Intentional per the deferred scope.

Missing tests

None for the shipped surface. Coverage remains thorough: phase mapping, field exclusion (R5), redaction omission incl. membership predicate and no-partial-leak under EXECUTING (R5/R6), inclusive stale boundary, unparseable-heartbeat fail-safe, decision-state exhaustiveness with never guard (R7), and no-oracle validators. The R7 scan retains its self-tests (required/optional decidedBy, comment-skip). Deferred redaction-gate behaviors remain it.todo reminders.

Risk assessment (LOW):

Type-level, behavior-preserving consolidation; the new commit only touches a test file to remove a filesystem race. Blast radius confined to type ownership — runtime control flow (approval gate, route behavior, coordination internals) unchanged. Regression likelihood low; security exposure reduced. The only deferred risk is that redaction/version/identity obligations stay prose+stub until consuming endpoints enforce them, which is the stated scope.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 27802561126
Cache hit
Session ses_122354e7bfferS1fNqE3JenAUM

@marcusrbrown marcusrbrown merged commit 92b621e into main Jun 19, 2026
12 checks passed
@marcusrbrown marcusrbrown deleted the feat/operator-api-contract branch June 19, 2026 03:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Own and freeze the canonical operator API contract (S1 surface)

3 participants