docs(benchmark): write-up, charts + QA/translation tooling (split from #429) #430

Draft: SantiagoDePolonia wants to merge 3 commits into main from docs/benchmark-writeup-and-tooling
@SantiagoDePolonia

Contributor

Draft - split out of #429 so the core benchmark PR stays focused. Decide separately whether/where this belongs in-repo.

Contains the parts of docs/2026-06-25_aws_gateway_benchmark/ that aren't the perf benchmark itself:

  • Write-up + visuals - ARTICLE.md (the blog narrative; note it duplicates the enterpilot.io post and will drift), cover.png + scripts/make_cover.py, and the four SVG charts/.
  • qa/ - a declarative quality/correctness suite (53 cases across chat / responses / messages, streaming + non-streaming, plus audio/embeddings), run against real providers through a gateway.
  • translation/ - a recording-mock harness that compares how GoModel, LiteLLM, Portkey, and Bifrost translate the same request.

The reproducible perf benchmark (harness, RESULTS.md) and the refreshed docs/about/benchmarks.mdx are in #429.

🤖 Generated with Claude Code

The narrative and visuals for the June 2026 AWS gateway benchmark (ARTICLE.md,
cover.png + scripts/make_cover.py, charts/), plus two tools that are co-located in
the benchmark folder but are separate from the perf benchmark itself:

- qa/          a declarative quality/correctness suite (53 cases across dialects
               and modalities, run against real providers through a gateway)
- translation/ a recording-mock harness comparing how each gateway translates the
               same request

Split out from the benchmark PR (#429) so the core benchmark stays focused.
Opened as a draft pending a decision on whether/where this belongs in-repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

@mintlify

mintlify Bot commented Jun 26, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
|---|---|---|---|
| gomodel | 🟢 Ready | View Preview | Jun 26, 2026, 12:36 PM |

Add ARTICLE2.md, the measured "Benchmarking AI Gateways" variant of the
benchmark write-up, alongside the existing ARTICLE.md, plus its cover
(cover-b.png) and generator (make_cover_b.py). Reuses the shared charts
and cover.png already in this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

QA suite: isolate per-case errors (evaluate() now inside the try) and
support ${var} interpolation in expect blocks; assert conversation object
identity (get/update/delete/use_in_responses), batch-embedding ordering,
and a streaming usage record; drop non-primary "green" from the colors
oracle; coerce contains/not_contains operands to str; guard report
modality against non-list values.
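
For readers without the diff open, here is a minimal sketch of three of the QA-suite changes listed above: running `evaluate()` inside the per-case `try`, `${var}` interpolation in `expect` blocks, and coercing `contains`/`not_contains` operands to `str`. Only `evaluate()` and the operator names come from the commit message. The case layout, `interpolate`, `run_cases`, and the `send` callback are hypothetical and are not taken from the actual `qa/` code.

```python
import re
from typing import Any, Callable

_VAR = re.compile(r"\$\{(\w+)\}")


def interpolate(value: Any, variables: dict) -> Any:
    """Recursively replace ${name} in strings; unknown names are left untouched."""
    if isinstance(value, str):
        return _VAR.sub(lambda m: str(variables.get(m.group(1), m.group(0))), value)
    if isinstance(value, dict):
        return {k: interpolate(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, variables) for v in value]
    return value


def evaluate(expect: dict, actual: Any) -> bool:
    """Check contains / not_contains, coercing both operands to str first."""
    text = str(actual)
    if "contains" in expect and str(expect["contains"]) not in text:
        return False
    if "not_contains" in expect and str(expect["not_contains"]) in text:
        return False
    return True


def run_cases(cases: list, send: Callable[[Any], Any], variables: dict) -> list:
    """Run every case; an exception in one case never aborts the others."""
    results = []
    for case in cases:
        try:
            # The request AND the evaluation both sit inside the try, so a
            # failure in either is recorded against this case only.
            actual = send(case["request"])
            expect = interpolate(case["expect"], variables)
            status = "pass" if evaluate(expect, actual) else "fail"
            results.append({"name": case["name"], "status": status})
        except Exception as exc:  # hypothetical policy: record and continue
            results.append({"name": case["name"], "status": "error", "error": repr(exc)})
    return results
```

In this sketch a case that raises is reported with status `error` and the remaining cases still run. This illustrates the intent of "isolate per-case errors" and does not claim to match the suite's real reporting format.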

Translation tooling: fail fast on a failed mock reset, reject unknown
--gateways values, pin peer gateway images by digest, escape AI-authored
Markdown cells, fix the GoModel port and a fenced-block language in the
README.
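
Two of the translation-tooling fixes above can be sketched briefly: rejecting unknown `--gateways` values, and escaping text before it goes into a Markdown table cell. This is an illustration under assumptions, not the harness code. Only the `--gateways` flag and the four gateway names come from this PR. The lowercase spellings, the comma-separated format, `parse_gateways`, `escape_md_cell`, and `build_parser` are hypothetical.

```python
import argparse

# Gateway names from the PR description; the lowercase identifiers are an assumption.
KNOWN_GATEWAYS = ("gomodel", "litellm", "portkey", "bifrost")


def parse_gateways(raw: str) -> list:
    """Parse a comma-separated list and reject any name that is not known."""
    names = [n.strip().lower() for n in raw.split(",") if n.strip()]
    unknown = [n for n in names if n not in KNOWN_GATEWAYS]
    if unknown:
        raise argparse.ArgumentTypeError(
            f"unknown gateway(s): {', '.join(unknown)}; "
            f"choose from: {', '.join(KNOWN_GATEWAYS)}"
        )
    return names


def escape_md_cell(text: str) -> str:
    """Make text safe for one Markdown table cell (pipes, backslashes, newlines only)."""
    text = text.replace("\\", "\\\\").replace("|", "\\|")
    return text.replace("\r\n", "\n").replace("\n", "<br>")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical translation-harness CLI")
    # argparse turns ArgumentTypeError into a usage error and exits with status 2.
    parser.add_argument("--gateways", type=parse_gateways, default=list(KNOWN_GATEWAYS))
    return parser
```

Validating in the `type=` callback means a typo such as `--gateways gomodel,foo` stops at argument parsing instead of silently producing an empty comparison column.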

Write-up: clarify GoModel's open-source table cell ("Yes ‡").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>