
Validate Anthropic extended-thinking request invariants (#253) #256

Merged
jpr5 merged 3 commits into main from blitz/aimock-253-thinking/integration on Jun 8, 2026

jpr5 commented Jun 8, 2026

Summary

Fixes #253. When Anthropic extended thinking is enabled, the real API returns a 400 if a tool-loop continuation request omits the prior assistant turn's leading thinking block (and its signature). aimock previously accepted such requests and replayed the next fixture, a false green that masked real claude-sdk-python bugs (dropped thinking blocks, last-only retention, un-replayed redacted_thinking).
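
For reference, a minimal sketch of the request shape this invariant is about, written as TypeScript object literals. The types, tool name, and signature string are illustrative assumptions (not aimock's or the Anthropic SDK's definitions), and required fields such as model and max_tokens are omitted. The valid request replays the prior assistant turn with its leading thinking block and signature intact; the second variant strips it, which is the kind of request the real API rejects with a 400.

```typescript
// Illustrative types only; not aimock's or the Anthropic SDK's real definitions.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

type ContinuationRequest = {
  thinking: { type: "enabled"; budget_tokens: number };
  messages: Message[];
};

// The assistant turn that issued the tool call, replayed verbatim with thinking first.
const priorAssistantTurn: Message = {
  role: "assistant",
  content: [
    { type: "thinking", thinking: "I should look up the weather.", signature: "sig-from-previous-response" },
    { type: "tool_use", id: "toolu_01", name: "get_weather", input: { city: "Paris" } },
  ],
};

// Builds the continuation request; the trailing tool_result answers toolu_01.
function buildContinuation(assistantTurn: Message): ContinuationRequest {
  return {
    thinking: { type: "enabled", budget_tokens: 1024 },
    messages: [
      { role: "user", content: "What's the weather in Paris?" },
      assistantTurn,
      { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_01", content: "18C, cloudy" }] },
    ],
  };
}

// Simulates the client bug class described above: thinking blocks stripped before resending.
function dropThinking(turn: Message): Message {
  if (typeof turn.content === "string") return turn;
  return { ...turn, content: turn.content.filter((b) => b.type !== "thinking" && b.type !== "redacted_thinking") };
}

// Type of the first content block, or undefined for string content.
function leadingBlockType(turn: Message): string | undefined {
  return typeof turn.content === "string" ? undefined : turn.content[0]?.type;
}

const valid = buildContinuation(priorAssistantTurn); // leads with thinking: accepted
const invalid = buildContinuation(dropThinking(priorAssistantTurn)); // leads with tool_use: real API 400s
```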

What changed

  • validateThinkingInvariants (messages.ts): when thinking is enabled, an in-scope tool-loop continuation turn (a tool_use answered by the next turn's tool_result) must (a) lead with a thinking/redacted_thinking block, (b) carry a non-empty signature on a leading thinking block, (c) carry non-empty data on a leading redacted_thinking block. Text-only/end_turn turns are exempt (no false 400s). Strict mode → real Anthropic 400 error shape; non-strict → warn + replay (existing suites stay green). Untrusted-JSON hardening: null/non-object message and content-block entries are guarded in both the validator and claudeToCompletionRequest (no 500 on malformed input — the real API returns 400).
  • Round-trip-safe emission: emitted thinking blocks carry a non-empty placeholder signature (signature_delta + non-streaming assembled block) so aimock's own record→replay stays green under strict mode, while the streaming content_block_start signature stays "" per the real Anthropic wire shape. Applied consistently across all three response shapes — text, content+tool, and tool-only (ToolCallResponse gains an optional reasoning field).
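
The scoping rule in the first bullet can be sketched as a pure function over the raw, untrusted request body. This is a sketch, not aimock's validateThinkingInvariants: the Violation shape and kind names are hypothetical, and matching tool_use ids against the next turn's tool_result ids is an assumption about how "answered by" is decided. As described above, only the leading block of each in-scope assistant turn is checked.

```typescript
// Hypothetical sketch; names and violation kinds are illustrative, not aimock's API.
type ViolationKind = "missing_leading_thinking" | "empty_signature" | "empty_redacted_data";

interface Violation {
  messageIndex: number;
  kind: ViolationKind;
}

// Guards untrusted JSON: null, arrays, and primitives are not treated as objects.
function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function nonEmptyString(v: unknown): boolean {
  return typeof v === "string" && v.length > 0;
}

// Returns one violation per in-scope assistant turn whose leading block is invalid.
function validateThinkingInvariants(body: unknown): Violation[] {
  if (!isObject(body)) return [];
  const thinking = body.thinking;
  if (!isObject(thinking) || thinking.type !== "enabled") return []; // thinking off: nothing to check
  const messages: unknown[] = Array.isArray(body.messages) ? body.messages : [];
  const violations: Violation[] = [];

  for (let i = 0; i + 1 < messages.length; i++) {
    const turn = messages[i];
    const next = messages[i + 1];
    if (!isObject(turn) || turn.role !== "assistant" || !isObject(next)) continue;
    const blocks: unknown[] = Array.isArray(turn.content) ? turn.content : [];
    const nextBlocks: unknown[] = Array.isArray(next.content) ? next.content : [];

    // In scope only if a tool_use in this turn is answered by a tool_result in the next turn.
    const toolUseIds = new Set(blocks.filter(isObject).filter((b) => b.type === "tool_use").map((b) => b.id));
    const answered = nextBlocks.some((b) => isObject(b) && b.type === "tool_result" && toolUseIds.has(b.tool_use_id));
    if (!answered) continue; // text-only / end_turn turns are exempt

    // A null or non-object first entry counts as a missing leading thinking block.
    const lead = blocks[0];
    if (!isObject(lead) || (lead.type !== "thinking" && lead.type !== "redacted_thinking")) {
      violations.push({ messageIndex: i, kind: "missing_leading_thinking" });
    } else if (lead.type === "thinking" && !nonEmptyString(lead.signature)) {
      violations.push({ messageIndex: i, kind: "empty_signature" });
    } else if (lead.type === "redacted_thinking" && !nonEmptyString(lead.data)) {
      violations.push({ messageIndex: i, kind: "empty_redacted_data" });
    }
  }
  return violations;
}
```

Returning a list instead of throwing leaves the strict versus non-strict decision to the caller, and malformed input simply yields no match rather than an exception.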

Test plan

  • 30+ unit cases for validateThinkingInvariants (scope, three violation kinds, multi-turn ordering, malformed-input guards), strict-on/off + X-AIMock-Strict override integration, and round-trip tests proving aimock's emitted thinking turns (all shapes) replay cleanly under strict.
  • pnpm test 3377 passed; pnpm test:drift 80 passed (Anthropic drift contract aligned); prettier + eslint + tsc --noEmit + build all clean.
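
The strict/non-strict split and the X-AIMock-Strict override exercised by the integration tests can be sketched as two small functions. The 400 body follows Anthropic's documented error envelope (type "error" wrapping an invalid_request_error); resolveStrict, handleViolations, and the accepted header values ("true"/"false") are hypothetical, since the PR does not say how the header is parsed.

```typescript
// Hypothetical sketch of the strict vs. non-strict decision; names are illustrative.
interface AnthropicErrorBody {
  type: "error";
  error: { type: "invalid_request_error"; message: string };
}

type Outcome =
  | { action: "reject"; status: 400; body: AnthropicErrorBody }
  | { action: "replay"; warnings: string[] };

// Assumption: a per-request header of "true"/"false" overrides the server-wide setting.
// Header names are looked up lowercased, as Node's http module delivers them.
function resolveStrict(serverStrict: boolean, headers: Record<string, string | undefined>): boolean {
  const raw = headers["x-aimock-strict"]?.trim().toLowerCase();
  if (raw === "true") return true;
  if (raw === "false") return false;
  return serverStrict; // absent or unrecognized: fall back to the server setting
}

// Strict: fail like the real API. Non-strict: surface warnings and replay the fixture anyway.
function handleViolations(problems: string[], strict: boolean): Outcome {
  if (problems.length === 0) return { action: "replay", warnings: [] };
  if (!strict) return { action: "replay", warnings: problems };
  return {
    action: "reject",
    status: 400,
    body: { type: "error", error: { type: "invalid_request_error", message: problems[0] } },
  };
}
```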

Follow-ups (out of scope for this PR)

  • Recorder (stream-collapse.ts) captures only thinking_delta text, not the upstream signature_delta value or redacted_thinking data — recording a real-Anthropic thinking turn loses signature/redacted data.
  • Validator scopes invariants to the leading content block by design; multi-thinking-block-per-turn signature enforcement is not covered.
  • Pre-existing fidelity nits: message_start emits full output_tokens; claudeStopReason passes unmapped finish reasons through; content+tool builder emits an empty text block when content is "".
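
The first follow-up amounts to folding signature_delta (and the data carried by a redacted_thinking block) into the collapsed block instead of keeping only thinking_delta text. A minimal sketch over simplified Anthropic-style stream events follows. It is not stream-collapse.ts; it assumes a redacted_thinking block arrives whole on its content_block_start, and appending signature deltas rather than overwriting them is a defensive choice of this sketch.

```typescript
// Hypothetical sketch, not aimock's stream-collapse.ts. Event shapes are simplified.
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: string; [key: string]: unknown } }
  | { type: "content_block_delta"; index: number; delta: { type: string; [key: string]: unknown } }
  | { type: "content_block_stop"; index: number };

type CollapsedBlock =
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "text"; text: string };

const str = (v: unknown): string => (typeof v === "string" ? v : "");

function collapseBlocks(events: StreamEvent[]): CollapsedBlock[] {
  const blocks = new Map<number, CollapsedBlock>();
  for (const ev of events) {
    if (ev.type === "content_block_start") {
      const cb = ev.content_block;
      if (cb.type === "thinking") blocks.set(ev.index, { type: "thinking", thinking: "", signature: str(cb.signature) });
      // Keep the opaque redacted_thinking data instead of dropping it.
      else if (cb.type === "redacted_thinking") blocks.set(ev.index, { type: "redacted_thinking", data: str(cb.data) });
      else if (cb.type === "text") blocks.set(ev.index, { type: "text", text: str(cb.text) });
    } else if (ev.type === "content_block_delta") {
      const block = blocks.get(ev.index);
      const d = ev.delta;
      if (block?.type === "thinking" && d.type === "thinking_delta") block.thinking += str(d.thinking);
      // The piece the recorder currently loses: the signature, sent as its own delta.
      else if (block?.type === "thinking" && d.type === "signature_delta") block.signature += str(d.signature);
      else if (block?.type === "text" && d.type === "text_delta") block.text += str(d.text);
    }
  }
  // Emit blocks in content-block index order.
  return Array.from(blocks.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1]);
}
```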

pkg-pr-new Bot commented Jun 8, 2026

npm i https://pkg.pr.new/@copilotkit/aimock@256

commit: ec8424f

jpr5 added 3 commits June 8, 2026 16:40
Add validateThinkingInvariants: when extended thinking is enabled, a tool-loop
continuation assistant turn (a tool_use answered by the next turn's tool_result)
must lead with a thinking/redacted_thinking block carrying a non-empty signature
(non-empty data for redacted_thinking). Strict mode emits the real Anthropic 400
error shape; non-strict warns and replays. Guards untrusted JSON against null /
non-object message and content-block entries. Emits a non-empty placeholder
signature on every emitted thinking block (text, content+tool, tool-only shapes)
so aimock's own record->replay round-trips stay green under strict mode; the
streaming content_block_start signature stays empty per the real wire shape.
Tool-only reasoning emission is capability-gated via resolveReasoningForModel,
consistent with the other dispatch branches.

Unit tests for validateThinkingInvariants (scope, three violation kinds,
multi-turn ordering, malformed-input guards), strict-on/off + header-override
integration, round-trip tests across all response shapes, and capability-gating
tests for the Claude tool-only reasoning path. Aligns the Anthropic drift
contract: non-empty placeholder signature on the assembled block, empty
signature on the streaming content_block_start.
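
The emission side described in these commits can be sketched as follows: the streaming content_block_start carries signature "" as on the real wire, the placeholder reaches streaming clients through a signature_delta, and the non-streaming assembled block carries it directly. The placeholder string, the SseEvent shape, and both function names are hypothetical; the PR only states that the placeholder is non-empty.

```typescript
// Hypothetical value; the PR only says aimock's placeholder signature is non-empty.
const PLACEHOLDER_SIGNATURE = "aimock-placeholder-signature";

interface SseEvent {
  event: string;
  data: Record<string, unknown>;
}

// Streaming: empty signature on content_block_start (real wire shape), placeholder via signature_delta.
function thinkingStreamEvents(index: number, reasoning: string, signature: string = PLACEHOLDER_SIGNATURE): SseEvent[] {
  const frame = (type: string, rest: Record<string, unknown>): SseEvent => ({ event: type, data: { type, index, ...rest } });
  return [
    frame("content_block_start", { content_block: { type: "thinking", thinking: "", signature: "" } }),
    frame("content_block_delta", { delta: { type: "thinking_delta", thinking: reasoning } }),
    frame("content_block_delta", { delta: { type: "signature_delta", signature } }),
    frame("content_block_stop", {}),
  ];
}

// Non-streaming: the assembled block carries the placeholder directly.
function assembledThinkingBlock(reasoning: string, signature: string = PLACEHOLDER_SIGNATURE) {
  return { type: "thinking" as const, thinking: reasoning, signature };
}
```

Because both paths end with a non-empty signature, a client that replays the assistant turn verbatim passes the leading-block check under strict mode.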

jpr5 force-pushed the blitz/aimock-253-thinking/integration branch from 589340e to ec8424f on June 8, 2026 23:41
jpr5 merged commit 3c07705 into main on Jun 8, 2026
23 checks passed
jpr5 deleted the blitz/aimock-253-thinking/integration branch on June 8, 2026 23:43

Development

Successfully merging this pull request may close these issues.

Validate Anthropic extended-thinking request invariants on tool-loop continuation
