GitHub - op12no2/patchwork: An informal cumulative and competitive frontier model eval using a Javascript chess engine

Patchwork

An informal, cumulative and competitive frontier model eval using a JavaScript chess engine.

Procedure

Assume A is the current leading engine (initially 0000_original). A model/CLI pair is selected to improve it by creating a new engine B from prompt.md at max effort. If B passes an SPRT against A, B becomes the new leader and is added to the ratings list via a gauntlet against the previous engines that passed.
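The pass/fail decision in an SPRT can be sketched roughly as below. This is a simplified illustration, not the fastchess implementation: it assumes a trinomial win/draw/loss model with a normal approximation to the log-likelihood ratio, and the hypotheses `elo0`, `elo1` and the error rates `alpha`, `beta` are hypothetical defaults, since this README does not state the bounds actually used.

```javascript
// Simplified SPRT sketch. Assumptions (not from this repository): a trinomial
// win/draw/loss model, a normal approximation to the log-likelihood ratio,
// and hypothetical defaults for elo0, elo1, alpha and beta.

// Expected score for an Elo difference under the standard logistic model.
function eloToScore(elo) {
  return 1 / (1 + Math.pow(10, -elo / 400));
}

// Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0),
// from B's point of view.
function llr(wins, draws, losses, elo0, elo1) {
  const n = wins + draws + losses;
  if (n === 0) return 0;
  const w = wins / n;
  const d = draws / n;
  const score = w + d / 2;
  const variance = w + d / 4 - score * score; // per-game score variance
  if (variance <= 0) return 0; // no information yet (e.g. all draws)
  const s0 = eloToScore(elo0);
  const s1 = eloToScore(elo1);
  return ((s1 - s0) * (2 * score - s0 - s1)) / ((2 * variance) / n);
}

// Returns "pass" (accept H1), "fail" (accept H0) or "continue" (keep playing).
function sprtDecision(wins, draws, losses, opts = {}) {
  const { elo0 = 0, elo1 = 5, alpha = 0.05, beta = 0.05 } = opts;
  const lower = Math.log(beta / (1 - alpha));
  const upper = Math.log((1 - beta) / alpha);
  const value = llr(wins, draws, losses, elo0, elo1);
  if (value >= upper) return "pass";
  if (value <= lower) return "fail";
  return "continue";
}

console.log(sprtDecision(800, 600, 600));
```

The test stops as soon as the ratio crosses either bound, so a clearly stronger or clearly weaker B needs fewer games than a fixed-length match would.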

Ratings

Rank	Engine	Elo	Games	Score	Draws
1	0010_fable_5	2241 ±24.4	1600	74.8%	27.0%
2	0009_opus_4_8	2179 ±23.9	1600	67.1%	27.7%
3	0008_opus_4_8	2157 ±24.0	1600	64.2%	27.6%
4	0007_opus_4_7	2137 ±24.1	1600	61.5%	26.6%
5	0006_gpt_5_5	2042 ±23.7	1600	48.1%	24.7%
6	0005_opus_4_7	2014 ±22.9	1600	44.1%	25.2%
7	0003_opus_4_7	2003 ±22.7	1600	42.6%	25.0%
8	0002_sonnet_4_6	1905 ±23.2	1600	29.7%	18.7%
9	0000_original	1800	1600	18.0%	10.5%
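To relate the Elo column to an expected head-to-head result, the standard logistic Elo formula can be used, as in the minimal sketch below. It assumes the usual 400-point scale; Ordo's exact rating model and settings are not stated here, and the Score column above is each engine's overall gauntlet score rather than a head-to-head figure. The function names are hypothetical.

```javascript
// Standard logistic Elo relations (assumption: 400-point scale).
// Function names are illustrative, not taken from this repository.

// Expected score (0..1) of a player rated eloDiff points above the opponent.
function expectedScore(eloDiff) {
  return 1 / (1 + Math.pow(10, -eloDiff / 400));
}

// Inverse: Elo difference implied by a score strictly between 0 and 1.
function eloDiffFromScore(score) {
  return -400 * Math.log10(1 / score - 1);
}

// Example using two ratings from the table: 0010_fable_5 vs 0009_opus_4_8.
const diff = 2241 - 2179;
console.log(expectedScore(diff).toFixed(3));
```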

SPRT

Engine	Diff	Model	CLI	SPRT
0011_grok_4_3	diff	xAI Grok 4.3	Grok Build Beta	✗
0010_fable_5	diff	Anthropic Claude Fable 5	Claude Code	✓
0009_opus_4_8	diff	Anthropic Claude Opus 4.8	Claude Code	✓
0008_opus_4_8	diff	Anthropic Claude Opus 4.8	Claude Code	✓
0007_opus_4_7	diff	Anthropic Claude Opus 4.7	Claude Code	✓
0006_gpt_5_5	diff	OpenAI GPT 5.5	Codex	✓
0005_opus_4_7	diff	Anthropic Claude Opus 4.7	Claude Code	✓
0004_gpt_5_5	diff	OpenAI GPT 5.5	Codex	✗
0003_opus_4_7	diff	Anthropic Claude Opus 4.7	Claude Code	✓
0002_sonnet_4_6	diff	Anthropic Claude Sonnet 4.6	Claude Code	✓
0001_haiku_4_5	diff	Anthropic Claude Haiku 4.5	Claude Code	✗
0000_original

Acknowledgements

https://github.com/Disservin/fastchess - SPRT and tournament manager
https://github.com/michiguel/Ordo - Elo rating calculation

