docs(benchmark): write-up, charts + QA/translation tooling (split from #429) by SantiagoDePolonia · Pull Request #430 · ENTERPILOT/GoModel

SantiagoDePolonia · 2026-06-26T12:30:55Z

Draft - split out of #429 so the core benchmark PR stays focused. Decide separately whether/where this belongs in-repo.

Contains the parts of docs/2026-06-25_aws_gateway_benchmark/ that aren't the perf benchmark itself:

Write-up + visuals - ARTICLE.md (the blog narrative; note it duplicates the enterpilot.io post and will drift), cover.png + scripts/make_cover.py, and the four SVG charts/.
qa/ - a declarative quality/correctness suite (53 cases across chat / responses / messages, streaming + non-streaming, plus audio/embeddings), run against real providers through a gateway.
translation/ - a recording-mock harness that compares how GoModel, LiteLLM, Portkey, and Bifrost translate the same request.

The reproducible perf benchmark (harness, RESULTS.md) and the refreshed docs/about/benchmarks.mdx are in #429.

🤖 Generated with Claude Code

The narrative and visuals for the June 2026 AWS gateway benchmark (ARTICLE.md, cover.png + scripts/make_cover.py, charts/), plus two tools that are co-located in the benchmark folder but are separate from the perf benchmark itself: - qa/ a declarative quality/correctness suite (53 cases across dialects and modalities, run against real providers through a gateway) - translation/ a recording-mock harness comparing how each gateway translates the same request Split out from the benchmark PR (#429) so the core benchmark stays focused. Opened as a draft pending a decision on whether/where this belongs in-repo. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-26T12:31:03Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f47be301-74b7-4a5d-9071-5693e9de8ee5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/benchmark-writeup-and-tooling

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codecov-commenter · 2026-06-26T12:34:06Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

mintlify · 2026-06-26T12:36:21Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
gomodel	🟢 Ready	View Preview	Jun 26, 2026, 12:36 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

Add ARTICLE2.md, the measured "Benchmarking AI Gateways" variant of the benchmark write-up, alongside the existing ARTICLE.md, plus its cover (cover-b.png) and generator (make_cover_b.py). Reuses the shared charts and cover.png already in this PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

QA suite: isolate per-case errors (evaluate() now inside the try) and support ${var} interpolation in expect blocks; assert conversation object identity (get/update/delete/use_in_responses), batch-embedding ordering, and a streaming usage record; drop non-primary "green" from the colors oracle; coerce contains/not_contains operands to str; guard report modality against non-list values. Translation tooling: fail fast on a failed mock reset, reject unknown --gateways values, pin peer gateway images by digest, escape AI-authored Markdown cells, fix the GoModel port and a fenced-block language in the README. Write-up: clarify GoModel's open-source table cell ("Yes ‡"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SantiagoDePolonia mentioned this pull request Jun 26, 2026

docs(benchmark): add AWS gateway benchmark and refresh benchmarks page #429

Merged

mintlify Bot deployed to staging - docs June 26, 2026 12:36 View deployment

mintlify Bot deployed to staging - docs June 26, 2026 15:33 View deployment

mintlify Bot deployed to staging - docs June 26, 2026 15:57 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

docs(benchmark): write-up, charts + QA/translation tooling (split from #429)#430

docs(benchmark): write-up, charts + QA/translation tooling (split from #429)#430
SantiagoDePolonia wants to merge 3 commits into
mainfrom
docs/benchmark-writeup-and-tooling

SantiagoDePolonia commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

Review skipped

Uh oh!

codecov-commenter commented Jun 26, 2026

Uh oh!

mintlify Bot commented Jun 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

SantiagoDePolonia commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

codecov-commenter commented Jun 26, 2026

Codecov Report

Uh oh!

mintlify Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

mintlify Bot commented Jun 26, 2026 •

edited

Loading