Curated architecture pitfall scanners for AI agents and humans.
archscan is a library of architecture pitfall documents. Each pitfall is a self-contained detection check with profile references, grep hints, severity, and a concrete fix. The corpus is plain Markdown; the automated archscan CLI is a Python orchestrator that calls the Claude Agent SDK or the OpenAI Codex Python SDK. You can feed the documents to any AI coding tool that can read Markdown, or run the checks yourself.
- Python ≥ 3.10
- `uv`: install with `curl -LsSf https://astral.sh/uv/install.sh | sh`
bash <(curl -s https://raw.githubusercontent.com/ebarti/archscan/main/install.sh)

Or clone and install manually:

git clone https://github.com/ebarti/archscan.git ~/.archscan
bash ~/.archscan/install.sh

This:
- Clones the scanner library to `~/.archscan/`
- Runs `uv sync --frozen` in that directory to materialize the locked dependency set into `~/.archscan/.venv/` (`claude-agent-sdk`, `codex-app-server-sdk`, transitive deps)
- Adds the `/archscan` slash command to Claude Code globally
- Symlinks the `archscan` CLI (a Python entry point) to `~/.local/bin/`. The launcher detects `~/.archscan/.venv/` and re-execs under it automatically, so the symlink works from any shell
archscan needs API access to whichever backend you choose:
- `--cli claude` (default): set `ANTHROPIC_API_KEY` (or one of `CLAUDE_CODE_USE_BEDROCK=1`, `CLAUDE_CODE_USE_VERTEX=1`, `CLAUDE_CODE_USE_FOUNDRY=1` with the matching cloud credentials).
- `--cli codex`: run `CODEX_HOME=~/.codex-archscan codex login` once. The pipeline isolates Codex state under `~/.codex-archscan` so it does not touch your normal Codex configuration.
The slash command is Claude Code-specific, but the terminal CLI can run the pipeline with either Claude Code or Codex.
Claude Code slash command -- open any repo and run:
/archscan python
Terminal CLI -- run the default Claude backend, or select Codex explicitly:
archscan python
archscan python --cli codex

archscan extracts your architecture profile, runs all pitfall checks in parallel, and writes a prioritized report outside the scanned repo by default. Relative `--report` paths resolve under `ARCHSCAN_OUTPUT_DIR`; the default location is `$XDG_STATE_HOME/archscan/reports/` or `$HOME/.local/state/archscan/reports/`.
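The report-path rule above can be sketched in a few lines of Python. This is an illustration, not archscan's actual code: the function names are hypothetical, and the lookup order (`ARCHSCAN_OUTPUT_DIR`, then `XDG_STATE_HOME`, then `HOME`) is an assumption inferred from the sentence above.

```python
from pathlib import Path


def default_output_dir(env: dict[str, str]) -> Path:
    """Pick the report directory from the environment (assumed lookup order)."""
    if env.get("ARCHSCAN_OUTPUT_DIR"):
        return Path(env["ARCHSCAN_OUTPUT_DIR"])
    if env.get("XDG_STATE_HOME"):
        return Path(env["XDG_STATE_HOME"]) / "archscan" / "reports"
    return Path(env["HOME"]) / ".local" / "state" / "archscan" / "reports"


def resolve_report_path(report: str, env: dict[str, str]) -> Path:
    """Absolute paths are used as given; relative ones land under the output dir."""
    path = Path(report)
    return path if path.is_absolute() else default_output_dir(env) / path


# Hypothetical environment, for illustration only.
print(resolve_report_path("audit.md", {"HOME": "/home/dev"}))
```

Under this reading, `--report audit.md` never writes into the scanned repository unless you pass an absolute path that points there.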
--model <model> Model for scan agents (default: claude-opus-4-7)
--effort <level> Reasoning effort: low|medium|high|max (default: high)
--profile-model <model> Model for profile extraction (default: value of --model)
--profile-effort <level> Effort for profile extraction (default: max)
--merge-model <model> Model for report merging (default: value of --model)
--merge-effort <level> Effort for report merging (default: high)
--output <format> markdown-backlog|github-issues|json|linear
--parallel <n> Max parallel agents (default: 10)
--report <file> Output path (absolute path or relative path under ARCHSCAN_OUTPUT_DIR)
--cli <claude|codex> Backend CLI for the full pipeline (default: claude)
--stability-check <sev> Re-check fail findings at/above critical|high|medium|low (default: none)
--dev Preserve the temp work dir after the run
--max-capability Shorthand: run every stage on claude-opus-4-7 at effort=max
(per-stage flags still win if set alongside it)
Examples:
archscan python --model haiku --effort low --parallel 20
archscan python --output github-issues --report audit.md
archscan python --stability-check high
archscan python --cli codex

When `--cli codex` is selected, archscan runs Codex with `CODEX_HOME=~/.codex-archscan` so the scan pipeline does not write into the user's regular Codex state directory. Bootstrap that isolated home once with `CODEX_HOME=~/.codex-archscan codex login`; if `auth.json` is missing there, archscan fails fast with that exact command.
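A minimal sketch of that fail-fast check, assuming the only signal is the presence of `auth.json` in the isolated home. The function names and the error wording are hypothetical; only the directory, the file name, and the login command come from the text above.

```python
import os
from pathlib import Path

LOGIN_HINT = "CODEX_HOME=~/.codex-archscan codex login"


def require_codex_auth(codex_home: Path) -> Path:
    """Return the auth file path, or stop immediately with the bootstrap command."""
    auth_file = codex_home / "auth.json"
    if not auth_file.is_file():
        raise SystemExit(f"Codex is not logged in for archscan. Run: {LOGIN_HINT}")
    return auth_file


def isolated_env(codex_home: Path) -> dict[str, str]:
    """Copy the current environment with CODEX_HOME pointed at the isolated dir."""
    env = dict(os.environ)
    env["CODEX_HOME"] = str(codex_home)
    return env
```

Checking before any agent is dispatched means a missing login surfaces as one clear message rather than as a failure inside every parallel scan.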
`--cli` selects the backend SDK:
- `claude`: uses `claude-agent-sdk` (Anthropic's agent runtime library).
- `codex`: uses `codex-app-server` (the official OpenAI Codex Python SDK).

Both SDKs are installed automatically by `install.sh`. The legacy `claude` and `codex` CLI binaries are no longer required.
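Returning to the per-stage model and effort options listed earlier, the sketch below shows one way their defaults could combine. It is an assumed reading of the option text, not the CLI's real resolution code: `resolve_stages` is a hypothetical name, and `flags` is taken to contain only options the user passed explicitly.

```python
DEFAULT_MODEL = "claude-opus-4-7"


def resolve_stages(flags: dict[str, str], max_capability: bool = False) -> dict[str, tuple[str, str]]:
    """Map each stage to (model, effort) from explicitly passed flags."""
    model = flags.get("model", DEFAULT_MODEL)
    efforts = {"effort": "high", "profile-effort": "max", "merge-effort": "high"}
    if max_capability:
        # Assumed: --max-capability raises the defaults; explicit flags still win.
        efforts = {name: "max" for name in efforts}

    def effort(name: str) -> str:
        return flags.get(name, efforts[name])

    return {
        "scan": (model, effort("effort")),
        "profile": (flags.get("profile-model", model), effort("profile-effort")),
        "merge": (flags.get("merge-model", model), effort("merge-effort")),
    }


print(resolve_stages({"model": "haiku", "effort": "low"}))
```

Note that in this reading `--model haiku --effort low` still leaves profile extraction at `max` effort, because `--profile-effort` has its own default.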
Add .archscan.yml to your repo root to set defaults for your team:
scanner: python
model: opus
effort: high
output: github-issues
stability-check: high
exclude:
- some-pitfall-name

With this file, contributors just run `/archscan` or `archscan` with no arguments. CLI flags always override `.archscan.yml` values.
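The override rule can be pictured as three layers of settings. The sketch below assumes the file has already been parsed into a dictionary; `effective_settings` and `BUILTIN_DEFAULTS` are illustrative names, and the built-in values are copied from the defaults documented in this README.

```python
BUILTIN_DEFAULTS = {"model": "claude-opus-4-7", "effort": "high", "output": "markdown-backlog"}


def effective_settings(file_values: dict, cli_flags: dict) -> dict:
    """Layer settings: built-in defaults < .archscan.yml < explicit CLI flags."""
    merged = dict(BUILTIN_DEFAULTS)
    merged.update(file_values)
    # Only flags the user actually passed (not None) override the file.
    merged.update({key: value for key, value in cli_flags.items() if value is not None})
    return merged


team_file = {
    "scanner": "python",
    "model": "opus",
    "effort": "high",
    "output": "github-issues",
    "stability-check": "high",
}
print(effective_settings(team_file, {"output": "json", "effort": None}))
```

Here a contributor who passes `--output json` gets JSON for that run while keeping every other team default.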
bash ~/.archscan/install.sh

The installer pulls the latest version, runs `uv sync --frozen` to reconcile the lockfile, and re-copies the slash command and CLI shim.
For development:
cd ~/Github/archscan
uv sync # match pyproject.toml + uv.lock
uv run python -m unittest discover tests
uv run archscan python # invoke the entry point in the project venv

archscan implements a single 7-stage pipeline. Architecture decision record: `docs/adr/2026-04-17-single-pipeline-architecture.md`. Runtime contract: `spec/PIPELINE.md`.
    [1/7] repo knowledge          -> canonical-index.json + profile.md
          (cold: full canonical indexer,
           warm: reconcile prior knowledge + repo diff)
                  |
                  v
    [2/7] overlay resolver        -> overlay-<pitfall>.json
          (Python, deterministic, one per pitfall, parallel)
                  |
         +--------+--------+
         v                 v
    needs_expansion?   no missing concepts
         |                 |
         v                 v
    [3/7] targeted       (skip)
          expansion
          (LLM, one per missing,
           parallel subset)
         |                 |
         +--------+--------+
                  v
    [4/7] evidence scan           -> scan-<pitfall>.md
          (LLM, one per pitfall, parallel)
                  |
                  v
          primary verdict = fail?
                  |
                  v
    [5/7] challenge pass          -> challenge-<pitfall>.md
          (LLM, adversarial falsification,
           parallel subset, label hidden)
                  |
                  v
    [6/7] report merger           -> final report
          (LLM, serial, tools disabled)
                  |
                  v
    [7/7] memory consolidation
          (Python, deterministic, after report)
archscan requires every scanner to ship concepts.json, and every pitfall to declare concept_ids:. There is no fallback pipeline.
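The two conditional branches in the pipeline diagram can be summarised as a small routing function. This is a sketch of the control flow only: `stages_for_pitfall` is a hypothetical helper, and any verdict string other than `fail` is an assumption.

```python
def stages_for_pitfall(needs_expansion: bool, primary_verdict: str) -> list[str]:
    """List the per-pitfall stages that run, following the diagram's two branches."""
    stages = ["overlay resolver"]
    if needs_expansion:
        # Stage 3 only runs for overlays that report missing concepts.
        stages.append("targeted expansion")
    stages.append("evidence scan")
    if primary_verdict == "fail":
        # Stage 5 only challenges findings whose primary verdict is a fail.
        stages.append("challenge pass")
    return stages


print(stages_for_pitfall(needs_expansion=False, primary_verdict="fail"))
```

Stages 1, 6, and 7 run once per invocation rather than per pitfall, so they are left out of the list.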
Prompts and tools used by the pipeline:
- `prompts/canonical-indexer.md`: cold-start knowledge extraction, produces markdown profile plus `canonical-index.json`
- `prompts/knowledge-reconciler.md`: warm-start update of persisted knowledge using the previous knowledge base plus repo diff
- `bin/archscan-overlay`: deterministic Python resolver mapping pitfall `concept_ids` to canonical-index symbols, ranked evidence, token budgets, and relevant typed memory
- `prompts/overlay-expansion.md`: only when an overlay's `needs_expansion` is true
- `prompts/scan-runner.md`: one instance per pitfall, consumes overlay + optional expansion and may emit a reusable knowledge contribution block
- `prompts/challenge-pass.md`: one instance per `fail` primary verdict, adversarial
- `prompts/report-merger.md`: combines primary + challenge into one report, handles `upheld | weakened | overturned` outcomes
- `bin/archscan-knowledge-store consolidate-run`: deduplicates durable memory, marks changed-file facts for revalidation, and writes memory ROI metrics
Knowledge evolution: archscan persists repo knowledge under $ARCHSCAN_CACHE_DIR/knowledge/. The first run does a cold canonical index; later runs diff the repo against the previous manifest and either reuse the stored knowledge snapshot or warm-reconcile it via prompts/knowledge-reconciler.md. Pitfall scans can append new reusable facts to the same knowledge base, and those facts are recorded in a per-run audit ledger. Accepted facts are also promoted into typed memory.json records. Challenge outcomes create durable false_positive_pattern or episodic_finding records, and future overlays inject only records relevant to the same pitfall or overlapping concepts. Because stage 4 dispatches scans in parallel, only scans that have not started yet can benefit from a contribution merged earlier in the same invocation; already-running scans keep the overlay they started with.
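The cold, reuse, or warm-reconcile decision described above might look like the following. The manifest shape (a mapping from file path to content hash) and the function name are assumptions made for illustration; the real `manifest.json` layout is not documented here.

```python
from typing import Optional


def choose_index_mode(
    previous: Optional[dict[str, str]], current: dict[str, str]
) -> tuple[str, list[str]]:
    """Return (mode, changed_paths) from two path -> content-hash manifests."""
    if previous is None:
        # No stored knowledge yet: index everything from scratch.
        return "cold", sorted(current)
    changed = sorted(
        path
        for path in previous.keys() | current.keys()
        if previous.get(path) != current.get(path)
    )
    # Nothing changed: reuse the snapshot. Otherwise reconcile only the diff.
    return ("reuse", []) if not changed else ("warm", changed)


old = {"app/db.py": "a1", "app/api.py": "b2"}
new = {"app/db.py": "a1", "app/api.py": "c3", "app/jobs.py": "d4"}
print(choose_index_mode(old, new))
```

Added, removed, and modified files all count as changes in this sketch, since each can invalidate a stored fact.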
Persisted knowledge storage contains snapshot.json, profile.md, manifest.json, memory.json, metrics.json, and runs/<run-id>.json. memory.json stores typed records such as semantic_fact, negative_finding, false_positive_pattern, procedural_hint, repo_invariant, and episodic_finding.
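As a rough illustration of how typed records could be filtered before they reach an overlay, the sketch below keeps only records from the same pitfall or with overlapping concepts. The `MemoryRecord` fields and the pitfall and concept names are hypothetical; only the record kinds come from the list above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryRecord:
    kind: str      # e.g. "semantic_fact" or "false_positive_pattern"
    pitfall: str   # pitfall that produced the record (hypothetical field)
    concept_ids: frozenset[str] = field(default_factory=frozenset)
    text: str = ""


def relevant_records(
    records: list[MemoryRecord], pitfall: str, concept_ids: set[str]
) -> list[MemoryRecord]:
    """Keep records from the same pitfall or sharing at least one concept."""
    return [r for r in records if r.pitfall == pitfall or concept_ids & r.concept_ids]


memory = [
    MemoryRecord("false_positive_pattern", "blocking-io", frozenset({"event-loop"})),
    MemoryRecord("semantic_fact", "sqlite-locking", frozenset({"db-connection"})),
    MemoryRecord("repo_invariant", "retry-storm", frozenset({"event-loop", "queue"})),
]
print(relevant_records(memory, "blocking-io", {"queue"}))
```

Filtering this way keeps overlay token budgets small: a scan never sees memory about unrelated pitfalls.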
Cost notes: The pipeline adds a challenge pass (~1 call per fail), an occasional expansion pass, and a deterministic memory consolidation pass. Counterbalances: the deterministic overlay cuts LLM work in stage 2, the persisted knowledge store avoids re-indexing unchanged repos, typed memory reduces repeated false positives, and content-addressed cache keys (bin/archscan-cache-key) eliminate duplicate work across reruns. The cache covers the canonical-index, evidence-scan, and challenge-pass stages under $ARCHSCAN_CACHE_DIR (default $XDG_CACHE_HOME/archscan); set ARCHSCAN_NO_CACHE=1 to bypass both cache replay and persisted repo knowledge for that run.
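Content-addressed caching generally means deriving the key from a hash of everything that influences a stage's output. The sketch below shows the idea with SHA-256 over canonical JSON; it is not the algorithm used by `bin/archscan-cache-key`, and the input field names are made up.

```python
import hashlib
import json
import os
from collections.abc import Mapping


def cache_key(stage: str, inputs: dict) -> str:
    """Hash the stage name plus its inputs; identical inputs give identical keys."""
    payload = json.dumps({"stage": stage, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """ARCHSCAN_NO_CACHE=1 bypasses the cache for the run."""
    return env.get("ARCHSCAN_NO_CACHE") != "1"


# Hypothetical inputs for an evidence-scan stage.
key = cache_key("evidence-scan", {"pitfall": "blocking-io", "overlay_sha": "abc123"})
print(key[:16], cache_enabled({"ARCHSCAN_NO_CACHE": "1"}))
```

Because the key changes whenever any input changes, a rerun on an unchanged repo can replay stored results, while an edited file or prompt naturally misses the cache.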
If you are not using the automated CLI pipeline, or prefer to run archscan step-by-step, follow the manual workflow below.
Browse scanners/ and choose the scanner matching your architecture:
| Scanner | Description | Pitfalls | Categories |
|---|---|---|---|
| `python` | Generic Python systems: async services, workers, SQLite persistence, subprocesses, credentials, logging, and schema evolution | 34 | 10 |
| `python-vulnerabilities` | Python web/API and backend vulnerabilities requiring cross-file trust-boundary reasoning | 7 | 7 |
| `python-agentic-runtime` | Python LLM/orchestrator runtimes: routing, MCP/tooling, autonomous queues, evals, and self-improvement loops | 51 | 13 |
| `distributed-architecture` | Distributed / event-driven / service-based Python architectures: pattern failure modes from Ford et al.'s Software Architecture Patterns, Antipatterns, and Pitfalls (orchestration, compensation, event contracts, supervisor, stamp coupling, sinkhole, stovepipe, externalized state) | 35 | 9 |
Give your AI agent these two documents:
- `prompts/profile-extractor.md` -- tells the agent what to extract
- `scanners/<scanner>/profile.md` -- tells it what to look for in your specific architecture
The agent reads your codebase and produces a compact architecture profile (~1,000 tokens). This profile is shared context for all subsequent scans.
- Run `prompts/canonical-indexer.md` with the scanner's `profile.md` and `concepts.json` to produce `profile.md` plus `canonical-index.json`.
- Run `bin/archscan-overlay` for each pitfall to produce `overlay-<pitfall>.json`.
- For overlays with `needs_expansion: true`, run `prompts/overlay-expansion.md`.
- Run `prompts/scan-runner.md` per pitfall using the profile, overlay, and optional expansion.
- Run `prompts/challenge-pass.md` for every primary `fail`.
- Run `prompts/report-merger.md` across the scan and challenge results.
- Run `bin/archscan-knowledge-store consolidate-run` after the report if persisted memory is enabled.
For an example of what the final report looks like, see examples/mestre-audit-report.md.
| Scanner | Description | Pitfalls | Categories |
|---|---|---|---|
| `python` | Generic Python systems with async concurrency, SQLite persistence, subprocesses, credentials, logging, and durable queues | 34 | async-python, sqlite, credential-management, security-boundaries, observability, pydantic-evolution, session-management, queue-autonomous-operation, cross-cutting |
| `python-vulnerabilities` | Python web/API and backend vulnerability scanner for tenant isolation, auth propagation, cache scope, webhook replay, SSRF, transaction side effects, and background-job privilege drift | 7 | tenant-isolation, authorization-propagation, cache-scope, outbound-http, webhook-intake, transaction-integrity, background-jobs |
| `python-agentic-runtime` | Python agentic runtimes with LLM calls, routing, MCP/tooling, autonomous queues, evals, and self-improvement workflows | 51 | async-python, multi-llm-orchestration, pattern-selection-routing, mcp-subprocess, security-boundaries, self-improvement-evals, pydantic-evolution, session-management, queue-autonomous-operation, cross-cutting |
| `distributed-architecture` | Distributed / event-driven / service-based Python architectures with pattern failure modes drawn from Ford et al.'s Software Architecture Patterns, Antipatterns, and Pitfalls | 35 | architecture-by-implication, contract-coupling, cross-cutting, cross-cutting-duplication, event-contract, layering-architecture, mcp-subprocess, queue-autonomous-operation, service-granularity |
Each scanner directory contains:
- `profile.md`: extraction guide tailored to that architecture class
- `_quick-scan.md`: one-liner grep commands for rapid triage (see Manual triage below)
- `concepts.json`: machine-readable concept registry required by the runtime pipeline. 63 concepts for `python`, 23 for `python-vulnerabilities`, 99 for `python-agentic-runtime`.
- Individual pitfall `.md` files: one per pitfall, following `spec/FORMAT.md`
Each scanner also ships _quick-scan.md — a human-facing cheatsheet of one-liner grep -rEn commands grouped by category. It is not part of the archscan pipeline (the CLI explicitly excludes it from parallel scanning). It is intended for humans who want a fast applicability check against a candidate codebase before committing to a full scan, or when onboarding to a new repo.
- `scanners/python/_quick-scan.md`
- `scanners/python-vulnerabilities/_quick-scan.md`
- `scanners/python-agentic-runtime/_quick-scan.md`
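If `grep` is not available, a triage pass in the same spirit as `grep -rEn` can be approximated in Python. This is a rough stand-in and not part of archscan: the function name is hypothetical, the example pattern is not taken from any `_quick-scan.md`, and Python's `re` syntax differs slightly from POSIX extended regular expressions.

```python
import re
from pathlib import Path


def quick_scan(pattern: str, root: Path, suffix: str = ".py") -> list[str]:
    """Return 'path:line:text' hits, similar in spirit to `grep -rEn`."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append(f"{path.relative_to(root)}:{number}:{line.strip()}")
    return hits


# Illustrative pattern only: flag subprocess calls that enable the shell.
for hit in quick_scan(r"shell\s*=\s*True", Path(".")):
    print(hit)
```

A hit only says the pitfall may apply; the full scan is still what decides whether the match is a real finding.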
- Run `prompts/profile-extractor.md` against your target architecture to understand the tech stack
- Draft pitfall candidates for each category relevant to the profile, following `spec/FORMAT.md` exactly. Start from an existing scanner's pitfall as a template and adapt.
- Curate the output: verify detection checks are mechanical, fixes are concrete, severities match `spec/SEVERITY.md`
- Place the profile and pitfall files in `scanners/<your-scanner-name>/`
- Open a PR -- see `CONTRIBUTING.md` for the quality bar
The report merger (prompts/report-merger.md) supports four output formats:
- markdown-backlog (default) -- severity-sorted table with file references, ready for sprint planning
- github-issues -- `gh issue create` commands, one per finding, with labels
- json -- structured array for programmatic consumption
- linear -- import-ready table with Linear priority mapping
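To make the github-issues format concrete, the sketch below turns findings into `gh issue create` commands. The finding fields and the `severity:<level>` label scheme are assumptions for illustration; the merger's real output may differ. The `--title`, `--body`, and `--label` flags are standard GitHub CLI options.

```python
import shlex


def gh_issue_commands(findings: list[dict]) -> list[str]:
    """Emit one shell-safe `gh issue create` command per finding."""
    commands = []
    for finding in findings:
        parts = [
            "gh", "issue", "create",
            "--title", finding["title"],
            "--body", finding["body"],
            "--label", f"severity:{finding['severity']}",  # hypothetical label scheme
        ]
        # Quote each argument so titles with spaces or quotes survive the shell.
        commands.append(" ".join(shlex.quote(part) for part in parts))
    return commands


# Made-up finding, for illustration only.
example = [{"title": "Blocking call in async handler", "body": "See worker.py:42", "severity": "high"}]
print("\n".join(gh_issue_commands(example)))
```

Emitting commands rather than calling the GitHub API directly lets you review the list before any issue is created.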
The format and quality rules are defined in spec/:
- `spec/FORMAT.md` -- pitfall file structure, frontmatter fields, detection check format
- `spec/SEVERITY.md` -- severity level definitions with blast radius / frequency / reversibility matrix
- `spec/DETECTION_METHODS.md` -- LSP, grep, and read detection tiers with examples
See CONTRIBUTING.md.