feat(router): add intelligent model routing with auto selectors#431
feat(router): add intelligent model routing with auto selectors#431vfeitoza wants to merge 7 commits into
Conversation
Introduces an intelligent routing layer that classifies incoming chat
requests with a cheap analyzer model and selects the best concrete model
from the provider catalog before normal resolution runs.
Clients send an intelligent selector in the model field instead of a
concrete model name:
- auto / smart balanced cost-quality selection
- auto-cost prefer the cheapest eligible model
- auto-quality prefer the highest-quality eligible model
The router operates in three modes configured via intelligent_routing.mode:
- off intelligent selectors return model_not_found (default)
- observe classifies and logs the recommendation; executes the
original requested model unchanged (dry-run)
- enforce replaces the selector with the recommended model before
provider dispatch
A pool of analyzer models is tried in order with per-call timeout and
automatic failover. When all analyzers fail, requests fall back to
intelligent_routing.fallback_model or return model_not_found if unset.
Analyzer usage is attributed to a dedicated analysis_user_path
(/intelligent-router by default) so classification cost is separated
from application request cost in usage reports.
Key components:
- internal/intelligentrouter: classifier, selector, scorer, catalog,
prompt builder, and type definitions
- config/intelligent_routing.go: configuration struct, defaults,
validation, and env var overrides
- internal/gateway/inference_intelligent.go: orchestrator hook that
rewrites the model selector before resolution
- internal/server/model_validation.go: middleware skip for intelligent
selectors to prevent premature model_not_found errors
- internal/observability/metrics.go: Prometheus counters and histograms
for routing decisions, latency, fallbacks, and low-confidence cases
- internal/virtualmodels: intelligent virtual model support with
strategy-qualified selectors (intelligent:cost, intelligent:quality)
- docs/features/intelligent-routing.mdx: full feature documentation
All existing tests pass. New tests cover config validation, selector
detection, catalog filtering, scoring, and middleware passthrough for
intelligent selectors.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds intelligent routing configuration, analysis and ranking logic, runtime wiring through server and gateway paths, selector validation bypasses, and virtual-model support. ChangesIntelligent routing
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Confidence Score: 3/5Merge safety is moderate because the new routing hook can invoke analyzer work before the request has passed the normal authorization and budget gates. The change is well-scoped and includes tests and documentation, but the request preparation order creates a concrete billing and access-control edge case for virtual selectors. internal/gateway/inference_prepare.go
What T-Rex did
Reviews (1): Last reviewed commit: "feat(router): add intelligent model rout..." | Re-trigger Greptile |
| requested := core.NewRequestedModelSelector(*model, *provider) | ||
| requested = o.evaluateIntelligentRouter(ctx, req, requested) | ||
| *model = requested.Model | ||
| *provider = requested.ProviderHint | ||
| workflow, err := o.ensureTranslatedRequestWorkflow(ctx, meta.Workflow, meta.RequestID, meta.Endpoint, model, provider) |
There was a problem hiding this comment.
Authorize before analysis
evaluateIntelligentRouter runs the analyzer before ensureTranslatedRequestWorkflow resolves and authorizes the final model, and dispatchChatCompletion enforces budget only later. A request with model: "auto" therefore sends its prompt to the configured analyzer and can incur analyzer usage even when the caller is unauthorized for the selected model or already over budget; move intelligent evaluation after workflow authorization/budget checks or add an equivalent pre-check before classification.
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.env.template:
- Around line 278-279: The INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY entries still use inline value-line
comments, which can be parsed as part of the value by env loaders. Update the
.env.template entries to match the rest of the template by moving the option
descriptions to separate preceding comment lines and keeping the assignment
lines as plain values; use the existing INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY keys to locate the affected lines.
- Line 275: Update the stale intelligent-routing documentation reference so it
points to docs/features/intelligent-routing.mdx instead of
docs/dev/intelligent-model.md. Make the same link change wherever the old path
appears in the related config/docs references, including the .env.template entry
and the intelligent routing config/docs references such as
config/intelligent_routing.go and CLAUDE.md, so all unique symbols and comments
consistently reference the new doc.
In `@CLAUDE.md`:
- Line 131: The intelligent routing docs describe the default analyzer pool with
an inconsistent provider name for the first model entry; update the wording in
CLAUDE.md so it matches the shipped examples and feature docs that use the
OpenAI provider for gpt-5.4-mini. Use the intelligent routing section and its
analyzer pool description as the target, and keep the provider/model naming
aligned with config/config.example.yaml and
docs/features/intelligent-routing.mdx so the documented default pool is
consistent.
In `@config/intelligent_routing_test.go`:
- Around line 99-111: Add coverage in
TestValidateIntelligentRoutingConfig_RatioBounds for the explicit zero boundary
on IntelligentRoutingConfig.Defaults.MinSavingsRatio: create a case with ratio 0
alongside the existing out-of-range values, then assert
ValidateIntelligentRoutingConfig accepts it and does not overwrite it. Use the
existing ValidateIntelligentRoutingConfig helper and
IntelligentRoutingConfig/Defaults.MinSavingsRatio symbols so the test locks in
the “always cheapest” behavior.
In `@config/intelligent_routing.go`:
- Around line 238-254: `applyIntelligentRoutingDefaults` is overwriting
explicitly configured zero values for `MinSavingsRatio` and `MinConfidence`,
making the documented `0` behavior unreachable. Update the defaulting logic in
`applyIntelligentRoutingDefaults` so it only seeds these fields when the config
is truly unset, not when a user intentionally set `0`; use the existing
`DefaultIntelligentRoutingConfig`/merge flow or sentinel/pointer-based presence
tracking to distinguish unset from explicit values, while keeping the validation
in `IntelligentDefaults` unchanged.
In `@docs/features/intelligent-routing.mdx`:
- Around line 234-247: The environment variable table in the intelligent routing
docs is missing several top-level config fields, so update the table in
intelligent-routing.mdx to include INTELLIGENT_ROUTING_ANALYSIS_USER_PATH,
INTELLIGENT_ROUTING_CANDIDATES_ALLOW, and INTELLIGENT_ROUTING_CANDIDATES_DENY
alongside the existing entries, or change the “All top-level fields” wording to
match what is actually documented; use the env-tagged fields from
intelligent_routing.go as the source of truth and keep the table aligned with
the .env.template values.
In `@internal/intelligentrouter/catalog.go`:
- Line 25: BuildCandidates currently accepts requestedContextChars but never
uses it, so the signature is misleading and the selector’s passed value has no
effect. Update BuildCandidates and its call sites (including the selector path)
so requestedContextChars is either incorporated into the long-context decision,
likely via modelSupportsLongContext, or removed entirely if it is not needed.
Use the existing BuildCandidates, modelSupportsLongContext, and
class.RequiresLongContext symbols to keep the candidate filtering logic
consistent.
In `@internal/intelligentrouter/prompt.go`:
- Around line 33-56: The analyzerUserPrompt summary is dropping important
routing signals by reducing non-text content to a generic placeholder and
truncating text without any size metadata. Update analyzerUserPrompt to preserve
lightweight modality hints and original-size indicators for each message so
Classifier.Classify can still infer vision, audio, and long-context requests
without exposing full payloads. Keep the change localized to analyzerUserPrompt
and its message formatting logic, using msg.Content, core.ExtractTextContent,
and the per-message preview builder to add modality flags and original
length/count summaries.
In `@internal/intelligentrouter/scorer.go`:
- Around line 69-83: Balanced scoring in scoreCandidate currently ignores
class.Confidence, so uncertain classifications still over-prioritize cheap
models. Update the default balanced branch in
internal/intelligentrouter/scorer.go to incorporate Confidence into the
weighting before cost dominates, using the existing scoreCandidate switch and
class fields like Confidence, so low-confidence requests are less likely to
select the cheapest model and stay aligned with selector expectations.
In `@internal/intelligentrouter/selector.go`:
- Around line 190-195: The meta SelectionMeta parameter in
Selector.fallbackSelector is unused, so simplify the API by removing it from the
method signature and updating all call sites that pass it, including the one
using SelectionMeta{}. Keep the behavior the same by still preferring
parseFallback(s.fallback) and otherwise returning requestedSelector(requested).
- Around line 272-274: The low-confidence metric in logDecision is still using a
hardcoded 0.7, so it can drift from the selector’s configured threshold. Add the
configured minimum confidence to Decision in Evaluate, then update
choose/logDecision to use that Decision field instead of the literal when
incrementing IntelligentRoutingLowConfidenceTotal. Use the existing choose,
Evaluate, logDecision, and Decision symbols to thread the threshold through
cleanly.
- Around line 51-54: `MinSavingsRatio` is only initialized in `NewSelector` and
not enforced during routing, so the selector can still rewrite models even when
savings are below the configured threshold. Update `Evaluate`, `choose`, or
`RankCandidates` in `selector.go` to check the computed savings ratio against
`minSavings` before allowing a rewrite, and keep the fallback default only if
the setting is meant to remain configurable; otherwise remove the unused field
entirely.
In `@internal/virtualmodels/validation.go`:
- Around line 58-60: The truncation branch in the validation logic is
unreachable because the earlier non-intelligent redirect guard already rejects
multi-target VMs, so remove the dead `!intelligent && len(vm.Targets) > 1` path
from `validateVirtualModel` in `validation.go`. Keep the behavior explicit by
relying on the existing guard/contract around `vm.Targets` instead of silently
trimming targets, and ensure the remaining validation flow in
`validateVirtualModel` clearly reflects the supported strategy rules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 978ea972-e48d-4a79-8ebf-6d4c9b0cdee7
📒 Files selected for processing (32)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/intelligent_routing.goconfig/intelligent_routing_test.godocs/features/intelligent-routing.mdxinternal/app/app.gointernal/app/intelligent_routing.gointernal/gateway/inference_intelligent.gointernal/gateway/inference_orchestrator.gointernal/gateway/inference_prepare.gointernal/intelligentrouter/catalog.gointernal/intelligentrouter/catalog_test.gointernal/intelligentrouter/classifier.gointernal/intelligentrouter/classifier_test.gointernal/intelligentrouter/prompt.gointernal/intelligentrouter/scorer.gointernal/intelligentrouter/selector.gointernal/intelligentrouter/selector_test.gointernal/intelligentrouter/selectors.gointernal/intelligentrouter/selectors_test.gointernal/intelligentrouter/types.gointernal/observability/metrics.gointernal/server/handlers.gointernal/server/http.gointernal/server/model_validation.gointernal/server/model_validation_test.gointernal/server/translated_inference_service.gointernal/virtualmodels/intelligent.gointernal/virtualmodels/snapshot.gointernal/virtualmodels/validation.go
| # model for execution. Only triggers for intelligent selectors (auto/smart/ | ||
| # auto-cost/auto-quality) or intelligent virtual models, unless mode is observe. | ||
| # Configure analyzers/selectors in config.yaml under intelligent_routing. | ||
| # See docs/dev/intelligent-model.md. |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether the referenced dev doc exists and list any intelligent-routing docs.
fd -t f 'intelligent' docs
rg -n 'docs/dev/intelligent-model\.md' --glob '!**/vendor/**'Repository: ENTERPILOT/GoModel
Length of output: 194
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the relevant references and whether the referenced dev doc exists.
git ls-files '.env.template' 'config/intelligent_routing.go' 'CLAUDE.md' 'docs/dev/*' 'docs/features/*'
printf '\n--- refs ---\n'
rg -n 'docs/dev/intelligent-model\.md|intelligent-model\.md|intelligent-routing\.mdx' .env.template config/intelligent_routing.go CLAUDE.md docs -g '!**/vendor/**' || true
printf '\n--- docs/dev listing ---\n'
fd -t f . docs/dev || true
printf '\n--- docs/features listing ---\n'
fd -t f . docs/features || trueRepository: ENTERPILOT/GoModel
Length of output: 2366
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the exact lines around the references.
sed -n '268,280p' .env.template
printf '\n--- config/intelligent_routing.go ---\n'
sed -n '1,40p' config/intelligent_routing.go
printf '\n--- CLAUDE.md ---\n'
sed -n '120,140p' CLAUDE.mdRepository: ENTERPILOT/GoModel
Length of output: 8641
Update the intelligent-routing doc link
.env.template, config/intelligent_routing.go, and CLAUDE.md still reference docs/dev/intelligent-model.md; point them to docs/features/intelligent-routing.mdx instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.env.template at line 275, Update the stale intelligent-routing
documentation reference so it points to docs/features/intelligent-routing.mdx
instead of docs/dev/intelligent-model.md. Make the same link change wherever the
old path appears in the related config/docs references, including the
.env.template entry and the intelligent routing config/docs references such as
config/intelligent_routing.go and CLAUDE.md, so all unique symbols and comments
consistently reference the new doc.
| # INTELLIGENT_ROUTING_MODE=off # off | observe | enforce | ||
| # INTELLIGENT_ROUTING_DEFAULT_STRATEGY=balanced # cost | balanced | quality | latency |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Avoid inline # … comments on value lines.
Lines 278-279 put the enumerations on the same line as the assignment (INTELLIGENT_ROUTING_MODE=off # off | observe | enforce). The rest of this template keeps descriptions on separate comment lines, and several env_file loaders (e.g. Docker Compose) do not strip trailing inline comments, so an uncommented value would become off # off | observe | enforce. Move the option lists to preceding comment lines for consistency and safety.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.env.template around lines 278 - 279, The INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY entries still use inline value-line
comments, which can be parsed as part of the value by env loaders. Update the
.env.template entries to match the rest of the template by moving the option
descriptions to separate preceding comment lines and keeping the assignment
lines as plain values; use the existing INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY keys to locate the affected lines.
| - **HTTP client:** `HTTP_TIMEOUT` (600s), `HTTP_RESPONSE_HEADER_TIMEOUT` (600s) | ||
| - **Resilience:** Configured via `config/config.yaml` - global `resilience.retry.*` and `resilience.circuit_breaker.*` defaults with optional per-provider overrides under `providers.<name>.resilience.retry.*` and `providers.<name>.resilience.circuit_breaker.*`. Retry defaults: `max_retries` (3), `initial_backoff` (1s), `max_backoff` (30s), `backoff_factor` (2.0), `jitter_factor` (0.1). Circuit breaker defaults: `failure_threshold` (5), `success_threshold` (2), `timeout` (30s) | ||
| - **Metrics:** `METRICS_ENABLED` (false), `METRICS_ENDPOINT` (/metrics) | ||
| - **Intelligent routing:** Disabled by default. When enabled, the gateway classifies each request with a cheap analyzer model and selects the best catalog model for execution. Configured via `config/config.yaml` under `intelligent_routing` (key env vars: `INTELLIGENT_ROUTING_ENABLED`, `INTELLIGENT_ROUTING_MODE` (`off`/`observe`/`enforce`), `INTELLIGENT_ROUTING_DEFAULT_STRATEGY` (`cost`/`balanced`/`quality`/`latency`), `INTELLIGENT_ROUTING_MAX_ANALYSIS_TOKENS`, `INTELLIGENT_ROUTING_TIMEOUT`, `INTELLIGENT_ROUTING_MIN_SAVINGS_RATIO`, `INTELLIGENT_ROUTING_FALLBACK_MODEL`). It only triggers for intelligent selectors (`auto`, `smart`, `auto-cost`, `auto-quality`) or intelligent virtual models, unless `mode` is `observe` (dry-run that records the recommendation without changing the executed model). The default example pool ships with `codex/gpt-5.4-mini`, `zai/glm-5-turbo`, and `anthropic/claude-haiku-4-5` as ordered analyzers (tried in order with failover). Analysis cost is attributed to `analysis_user_path` (`/intelligent-router` by default) to keep it separate from the main execution in usage reports. See `docs/dev/intelligent-model.md`. |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Analyzer provider inconsistency with the example config.
This line documents the default analyzer pool as codex/gpt-5.4-mini, but config/config.example.yaml (Line 190-191) registers gpt-5.4-mini under provider: "openai", and docs/features/intelligent-routing.mdx also uses openai. Align these so users aren't pointed at a provider name (codex) that isn't in the shipped examples.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@CLAUDE.md` at line 131, The intelligent routing docs describe the default
analyzer pool with an inconsistent provider name for the first model entry;
update the wording in CLAUDE.md so it matches the shipped examples and feature
docs that use the OpenAI provider for gpt-5.4-mini. Use the intelligent routing
section and its analyzer pool description as the target, and keep the
provider/model naming aligned with config/config.example.yaml and
docs/features/intelligent-routing.mdx so the documented default pool is
consistent.
| func TestValidateIntelligentRoutingConfig_RatioBounds(t *testing.T) { | ||
| for _, ratio := range []float64{-0.1, 1.5} { | ||
| cfg := &IntelligentRoutingConfig{ | ||
| Enabled: true, | ||
| Mode: IntelligentRoutingModeObserve, | ||
| Analyzers: []AnalyzerModelConfig{{Model: "gpt-5.4-mini"}}, | ||
| } | ||
| cfg.Defaults.MinSavingsRatio = ratio | ||
| err := ValidateIntelligentRoutingConfig(cfg) | ||
| require.Error(t, err) | ||
| require.Contains(t, err.Error(), "min_savings_ratio") | ||
| } | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add coverage for the min_savings_ratio: 0 boundary.
RatioBounds only tests out-of-range values. Given the defaults-overwrite behavior flagged in intelligent_routing.go, a test asserting that an explicit 0 survives validation (and is honored) would lock in the documented "always cheapest" contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@config/intelligent_routing_test.go` around lines 99 - 111, Add coverage in
TestValidateIntelligentRoutingConfig_RatioBounds for the explicit zero boundary
on IntelligentRoutingConfig.Defaults.MinSavingsRatio: create a case with ratio 0
alongside the existing out-of-range values, then assert
ValidateIntelligentRoutingConfig accepts it and does not overwrite it. Use the
existing ValidateIntelligentRoutingConfig helper and
IntelligentRoutingConfig/Defaults.MinSavingsRatio symbols so the test locks in
the “always cheapest” behavior.
| func applyIntelligentRoutingDefaults(d *IntelligentDefaults) { | ||
| if strings.TrimSpace(d.Strategy) == "" { | ||
| d.Strategy = IntelligentStrategyBalanced | ||
| } | ||
| if d.MaxAnalysisTokens == 0 { | ||
| d.MaxAnalysisTokens = 256 | ||
| } | ||
| if d.Timeout == 0 { | ||
| d.Timeout = 1500 * time.Millisecond | ||
| } | ||
| if d.MinSavingsRatio == 0 { | ||
| d.MinSavingsRatio = 0.15 | ||
| } | ||
| if d.MinConfidence == 0 { | ||
| d.MinConfidence = 0.7 | ||
| } | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
min_savings_ratio: 0 (and min_confidence: 0) are silently overwritten with defaults, breaking a documented use case.
applyIntelligentRoutingDefaults cannot distinguish an explicit 0 from an unset zero value, so it forces MinSavingsRatio to 0.15 and MinConfidence to 0.7. The validation gate at Lines 186-190 already accepts 0 (< 0 only fails), and docs/features/intelligent-routing.mdx (Line 129) explicitly documents min_savings_ratio: 0 as "always prefer the cheapest model." With this logic that configuration is unreachable.
Consider gating these defaults so they only apply when the parent config wasn't user-populated, or use sentinel/pointer fields. A pragmatic fix is to apply the float defaults only before user values are merged (i.e. seed via DefaultIntelligentRoutingConfig and drop the == 0 reset here), keeping validation as the bound check.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@config/intelligent_routing.go` around lines 238 - 254,
`applyIntelligentRoutingDefaults` is overwriting explicitly configured zero
values for `MinSavingsRatio` and `MinConfidence`, making the documented `0`
behavior unreachable. Update the defaulting logic in
`applyIntelligentRoutingDefaults` so it only seeds these fields when the config
is truly unset, not when a user intentionally set `0`; use the existing
`DefaultIntelligentRoutingConfig`/merge flow or sentinel/pointer-based presence
tracking to distinguish unset from explicit values, while keeping the validation
in `IntelligentDefaults` unchanged.
| switch strategy { | ||
| case StrategyCost: | ||
| return 10*costScore + 0.1*float64(quality) + capabilityBonus(model, class) | ||
| case StrategyQuality: | ||
| base := float64(quality) | ||
| if class.RequiresReasoning || class.QualitySensitivity == "high" { | ||
| base *= 1.5 | ||
| } | ||
| return base + capabilityBonus(model, class) | ||
| case StrategyLatency: | ||
| // Prefer cheaper tiers as a latency proxy and small-context models. | ||
| return 5*costScore + float64(4-quality) + capabilityBonus(model, class) | ||
| default: // balanced | ||
| return 4*costScore + float64(quality) + capabilityBonus(model, class) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Balanced scoring ignores low-confidence classifications.
scoreCandidate never reads class.Confidence, so the balanced path still maximizes on cost even when the analyzer is unsure. That breaks the downstream contract in internal/intelligentrouter/selector_test.go:95-107, which expects low confidence to avoid the cheapest model. Fold confidence into the balanced weighting before cost dominates, or that selector path will keep recommending mini for uncertain requests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/scorer.go` around lines 69 - 83, Balanced scoring
in scoreCandidate currently ignores class.Confidence, so uncertain
classifications still over-prioritize cheap models. Update the default balanced
branch in internal/intelligentrouter/scorer.go to incorporate Confidence into
the weighting before cost dominates, using the existing scoreCandidate switch
and class fields like Confidence, so low-confidence requests are less likely to
select the cheapest model and stay aligned with selector expectations.
| minSavings := cfg.MinSavingsRatio | ||
| if minSavings <= 0 { | ||
| minSavings = 0.15 | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -nP '\b(minSavings|MinSavingsRatio|MinSavings)\b' --type=go -C3Repository: ENTERPILOT/GoModel
Length of output: 156
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the target file and inspect relevant surrounding code.
git ls-files | rg '^internal/intelligentrouter/selector\.go$|^internal/intelligentrouter/'
printf '\n--- selector.go outline ---\n'
ast-grep outline internal/intelligentrouter/selector.go --view expanded || true
printf '\n--- selector.go relevant lines ---\n'
nl -ba internal/intelligentrouter/selector.go | sed -n '1,260p'
printf '\n--- search for savings-related usage in intelligentrouter ---\n'
rg -n "MinSavingsRatio|minSavings|savings ratio|savings" internal/intelligentrouter -C 3 || trueRepository: ENTERPILOT/GoModel
Length of output: 2793
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect selector.go in smaller slices if it is long.
wc -l internal/intelligentrouter/selector.go
nl -ba internal/intelligentrouter/selector.go | sed -n '1,140p'
printf '\n---\n'
nl -ba internal/intelligentrouter/selector.go | sed -n '141,280p'Repository: ENTERPILOT/GoModel
Length of output: 239
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '%s\n' '--- exact references across repo ---'
rg -n "MinSavingsRatio|minSavings" . -g '!**/vendor/**' || true
printf '\n%s\n' '--- selector.go with line numbers via cat -n ---'
sed -n '1,320p' internal/intelligentrouter/selector.go | cat -n
printf '\n%s\n' '--- scorer.go outline ---'
ast-grep outline internal/intelligentrouter/scorer.go --view expanded || true
printf '\n%s\n' '--- scorer.go relevant lines ---'
sed -n '1,260p' internal/intelligentrouter/scorer.go | cat -nRepository: ENTERPILOT/GoModel
Length of output: 19745
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '%s\n' '--- selector.go (full, numbered) ---'
sed -n '1,320p' internal/intelligentrouter/selector.go | cat -n
printf '\n%s\n' '--- scorer.go (full, numbered) ---'
sed -n '1,260p' internal/intelligentrouter/scorer.go | cat -nRepository: ENTERPILOT/GoModel
Length of output: 17631
Apply MinSavingsRatio before rewriting. minSavings is only defaulted and stored in NewSelector; Evaluate, choose, and RankCandidates never use it, so enforce mode can still switch models even when projected savings are below the configured threshold. Gate the rewrite on the savings ratio, or remove the setting if it’s intentionally unused.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/selector.go` around lines 51 - 54,
`MinSavingsRatio` is only initialized in `NewSelector` and not enforced during
routing, so the selector can still rewrite models even when savings are below
the configured threshold. Update `Evaluate`, `choose`, or `RankCandidates` in
`selector.go` to check the computed savings ratio against `minSavings` before
allowing a rewrite, and keep the fallback default only if the setting is meant
to remain configurable; otherwise remove the unused field entirely.
| func (s *Selector) fallbackSelector(requested core.RequestedModelSelector, meta SelectionMeta) core.ModelSelector { | ||
| if fb := parseFallback(s.fallback); fb.Model != "" { | ||
| return fb | ||
| } | ||
| return requestedSelector(requested) | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Unused meta parameter in fallbackSelector.
meta SelectionMeta is never read; both call sites pass it (one as SelectionMeta{}). Drop it to simplify the signature.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/selector.go` around lines 190 - 195, The meta
SelectionMeta parameter in Selector.fallbackSelector is unused, so simplify the
API by removing it from the method signature and updating all call sites that
pass it, including the one using SelectionMeta{}. Keep the behavior the same by
still preferring parseFallback(s.fallback) and otherwise returning
requestedSelector(requested).
| if d.Confidence > 0 && d.Confidence < 0.7 { | ||
| observability.IntelligentRoutingLowConfidenceTotal.WithLabelValues(d.Mode, d.Strategy).Inc() | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Low-confidence metric uses a hardcoded 0.7 instead of the configured minConf.
choose keys its low-confidence behavior off the configurable s.minConf, but logDecision records IntelligentRoutingLowConfidenceTotal using a literal 0.7. When MinConfidence is overridden, the metric will diverge from the actual decision threshold. Since logDecision is a package function without access to the selector, thread the threshold through the Decision (e.g. a MinConfidence field set in Evaluate).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/selector.go` around lines 272 - 274, The
low-confidence metric in logDecision is still using a hardcoded 0.7, so it can
drift from the selector’s configured threshold. Add the configured minimum
confidence to Decision in Evaluate, then update choose/logDecision to use that
Decision field instead of the literal when incrementing
IntelligentRoutingLowConfidenceTotal. Use the existing choose, Evaluate,
logDecision, and Decision symbols to thread the threshold through cleanly.
| if !intelligent && len(vm.Targets) > 1 { | ||
| vm.Targets = vm.Targets[:1] | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Dead truncation branch — unreachable given the line 34 guard.
For non-intelligent strategies, line 34 already rejects any redirect where len(vm.Targets) > 1 (multi-target "not yet supported"). So by the time execution reaches line 58, a non-intelligent VM always has len(vm.Targets) <= 1, making !intelligent && len(vm.Targets) > 1 impossible. This contradicts the apparent intent (silently truncating extras) and may mislead future maintainers about the real contract.
♻️ Remove the unreachable branch
- if !intelligent && len(vm.Targets) > 1 {
- vm.Targets = vm.Targets[:1]
- }Based on coding guidelines: "Keep the implementation explicit and maintainable rather than relying on clever abstractions" and follow KISS.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if !intelligent && len(vm.Targets) > 1 { | |
| vm.Targets = vm.Targets[:1] | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/virtualmodels/validation.go` around lines 58 - 60, The truncation
branch in the validation logic is unreachable because the earlier
non-intelligent redirect guard already rejects multi-target VMs, so remove the
dead `!intelligent && len(vm.Targets) > 1` path from `validateVirtualModel` in
`validation.go`. Keep the behavior explicit by relying on the existing
guard/contract around `vm.Targets` instead of silently trimming targets, and
ensure the remaining validation flow in `validateVirtualModel` clearly reflects
the supported strategy rules.
Source: Coding guidelines
…bled newIntelligentRouterFromConfig returns a nil *intelligentrouter.Selector when the feature is inactive (the default). Assigning that typed nil directly to the gateway.IntelligentRouter interface field produced a non-nil interface wrapping a nil pointer, so the orchestrator's `== nil` guard was skipped and ShouldEvaluate panicked on every chat request, surfacing as HTTP 500. Only assign the router to the interface when the concrete pointer is non-nil, keeping the field at a true nil interface otherwise. Add a reflection-based typed-nil check in evaluateIntelligentRouter as defense in depth against future regressions. Reproduced and verified against the full integration suite.
SantiagoDePolonia
left a comment
There was a problem hiding this comment.
General thoughts
I definitely want to add this kind of feature. Also, the most probable scenario is that this kind of feature will be in paid version of GoModel (because I need to pay my bills).
The biggest problem I have with this kind of huge external PRs is that they are not composed into the vision - which I still mainly have in my head so far. Also, reviewing 2,500 lines of code line-by-line is impossible with my limited capacity.
The solutions I see are:
- working full-time with someone, so we can be aligned on the same vision
- agreeing on the specification before implementing huge changes like this
- or just better, more adjusted to the vision PRs - but it might be hard
Considering that I don't have money to hire anyone full-time right now - or even to pay myself - only the 2nd and 3rd option is valid.
The problem with the second option is that sharing those specifications is very often... more time-consuming than developing and testing the changes.
3rd option is still valid
Detailed feedback
I see those changes as kind of cumbersome. There are a lot of env variables / config.yaml properties, and they are not easy to understand what those change.
Also, those changes are not in line with the recently improved virtual models, which actually allow creating aliases like auto, smart, etc. There shouldn't be another mechanism for this. In this case:
selectors:
- name: "auto"
strategy: "balanced"
- name: "smart"
strategy: "balanced"
- name: "auto-cost"
strategy: "cost"
- name: "auto-quality"
strategy: "quality"shouldn't be there, because we already have virtual models for that.
Also, as far as I understand, this PR also:
- misses the UI configuration for this feature
- doesn't address the prompt caching persistence issue, especially for the
/responsesendpoint
Also, I can't guarantee that I'll merge it after the adjustments described here. If you want to give it a try - go ahead.
| Enabled bool `yaml:"enabled" env:"INTELLIGENT_ROUTING_ENABLED"` | ||
|
|
||
| // Mode selects how decisions are applied: | ||
| // - off: no analysis; intelligent selectors behave as unknown models |
There was a problem hiding this comment.
When we have Enabled flag, do we need to have 'off' mode here? I think observe and enforce might be enough. With default to observe.
| Analyzers []AnalyzerModelConfig `yaml:"analyzers"` | ||
|
|
||
| // Defaults holds resolved defaults shared by every selector. | ||
| Defaults IntelligentDefaults `yaml:"defaults"` |
There was a problem hiding this comment.
I'm not sure why we need this one - another type/structure to hold defaults?
| # See docs/dev/intelligent-model.md. | ||
| # ---------------------------------------------------------------------------- | ||
| # INTELLIGENT_ROUTING_ENABLED=false | ||
| # INTELLIGENT_ROUTING_MODE=off # off | observe | enforce |
There was a problem hiding this comment.
Again, IIUC:
INTELLIGENT_ROUTING_ENABLED=false AND INTELLIGENT_ROUTING_MODE=off ===
INTELLIGENT_ROUTING_ENABLED=true AND INTELLIGENT_ROUTING_MODE=off
What's the reason for mode=off ?
…nd gradual context penalty Refine the candidate ranking so the cost and context dimensions contribute signal only when they actually discriminate between models, instead of flooding the weighted sum with neutral noise. - Cost abstention: when every candidate shares the same estimated cost (the common case for gateways whose model registry has no pricing), the cost dimension contributes nothing, letting quality and capability decide. The cost score is now computed once across all candidates in RankCandidates and passed into scoreCandidate. - Free-model tier: models tagged "free", "local", or "self-hosted" (local Ollama/vLLM) now win the cost dimension at 1.0, while paid models are capped at 0.5 when a free model is present, preserving a proportional, visible gap. - Gradual context penalty: requests that approach a model's context window receive a proportional penalty instead of a binary exclude. Requests above 80% of the window decay from 1.0 toward 0.10; requests that exceed it are still hard-excluded. The request size is now estimated from message text and passed through the Candidate.ContextScore field. All existing tests pass. New table-driven tests cover cost abstention, the free-model advantage, the risk-zone penalty, the hard exclude, and unknown windows.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/intelligentrouter/catalog_test.go`:
- Around line 148-172: Update the RankCandidates cost-dimension coverage in
TestRankCandidates_FreeModelWinsOnCostDimension so it also exercises a
free/local model with no Pricing alongside other candidates that likewise omit
Pricing. Use BuildCandidates and RankCandidates to verify the free-tag signal
still wins instead of cost abstention hiding the behavior, and keep the existing
free/paid case as a separate check if needed.
In `@internal/intelligentrouter/scorer.go`:
- Around line 191-205: The cost-scoring logic in the scorer path is returning
too early when minCost/maxCost are flat, which prevents explicit free-tier
signals from being applied. In the cost evaluation around the minCost/maxCost
spread check, make sure free/local/self-hosted tags are converted into a
non-zero cost signal before the abstention branch can return zeroed scores.
Adjust the logic so the early return only happens after tag-based cost inference
has been considered, keeping the weighted sum able to prefer tagged free models
even when pricing data is missing.
In `@internal/intelligentrouter/selector.go`:
- Around line 231-238: The estimateRequestChars helper is counting UTF-8 bytes
instead of characters, which can overestimate non-ASCII prompts and affect
routing. Update the loop in estimateRequestChars to count runes for each
message’s extracted text content rather than using len on the string, and keep
the nil req handling unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7c01fa73-f17a-457b-87e2-19e9f5fb105a
📒 Files selected for processing (4)
internal/intelligentrouter/catalog.gointernal/intelligentrouter/catalog_test.gointernal/intelligentrouter/scorer.gointernal/intelligentrouter/selector.go
| // TestRankCandidates_FreeModelWinsOnCostDimension verifies Item 2: a model | ||
| // tagged "free" beats paid models on the cost dimension, and paid models are | ||
| // capped proportionally so the gap stays visible. | ||
| func TestRankCandidates_FreeModelWinsOnCostDimension(t *testing.T) { | ||
| paid := 1.0 | ||
| free := 0.0 | ||
| cat := fakeCatalog{ | ||
| models: []core.Model{ | ||
| {ID: "free-local", Metadata: &core.ModelMetadata{ | ||
| Modes: []string{"chat"}, Tags: []string{"free", "local"}, | ||
| Pricing: &core.ModelPricing{InputPerMtok: &free, OutputPerMtok: &free}, | ||
| }}, | ||
| {ID: "cheap-paid", Metadata: &core.ModelMetadata{ | ||
| Modes: []string{"chat"}, Tags: []string{"mini"}, | ||
| Pricing: &core.ModelPricing{InputPerMtok: &paid, OutputPerMtok: &paid}, | ||
| }}, | ||
| }, | ||
| provider: map[string]string{"free-local": "ollama", "cheap-paid": "openai"}, | ||
| } | ||
|
|
||
| cands := BuildCandidates(cat, CandidateFilter{}, nil, Classification{}, 0) | ||
| ranked := RankCandidates(cands, nil, StrategyCost, Classification{}) | ||
| require.NotEmpty(t, ranked) | ||
| require.Equal(t, "free-local", ranked[0].Candidate.Selector.Model) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win
Add coverage for free-tagged models without pricing.
This test only proves the free path when paid pricing is present. Add a case where "free"/"local" has no Pricing and other candidates also lack pricing, so cost abstention cannot silently bypass the free-tier signal. As per coding guidelines, **/*_test.go: “Add or update tests for behavior changes.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/catalog_test.go` around lines 148 - 172, Update
the RankCandidates cost-dimension coverage in
TestRankCandidates_FreeModelWinsOnCostDimension so it also exercises a
free/local model with no Pricing alongside other candidates that likewise omit
Pricing. Use BuildCandidates and RankCandidates to verify the free-tag signal
still wins instead of cost abstention hiding the behavior, and keep the existing
free/paid case as a separate check if needed.
Source: Coding guidelines
| func estimateRequestChars(req *core.ChatRequest) int { | ||
| if req == nil { | ||
| return 0 | ||
| } | ||
| total := 0 | ||
| for _, msg := range req.Messages { | ||
| total += len(core.ExtractTextContent(msg.Content)) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the estimator no longer uses byte length for extracted chat text.
rg -n 'estimateRequestChars|len\(core\.ExtractTextContent|RuneCountInString' internal/intelligentrouter internal/coreRepository: ENTERPILOT/GoModel
Length of output: 827
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '210,250p' internal/intelligentrouter/selector.go
printf '\n---\n'
sed -n '130,160p' internal/intelligentrouter/selector.goRepository: ENTERPILOT/GoModel
Length of output: 2560
Count runes instead of bytes in estimateRequestChars. UTF-8 text is currently overcounted, which can skew routing toward larger models for non-ASCII prompts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/selector.go` around lines 231 - 238, The
estimateRequestChars helper is counting UTF-8 bytes instead of characters, which
can overestimate non-ASCII prompts and affect routing. Update the loop in
estimateRequestChars to count runes for each message’s extracted text content
rather than using len on the string, and keep the nil req handling unchanged.
Allow operators to attach routing_guidance to model metadata so the analyzer prompt can see explicit hints about when a model should be preferred. The classifier now builds its prompt from a first-pass candidate pool so guided models are visible before classification runs, while the final candidate list is still rebuilt afterward to apply capability-derived filters. Improve analyzer robustness by giving the same analyzer one repair attempt when its first response is not valid JSON. The repair call reuses the system prompt but asks only for the compact JSON object, without re-sending the user request. If repair succeeds, the router keeps the same analyzer; otherwise failover continues to the next analyzer as before. Also document routing_guidance in the intelligent routing feature guide.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/intelligentrouter/classifier_test.go (1)
23-49: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert that the repair call replays the original analyzer prompt.
These tests only prove a second call happened. Because the fake executor ignores the request body, they still pass when the repair path drops the original prompt and returns arbitrary JSON. Capture the second
ChatRequestand assert it still contains the request summary. Based on learnings: Add or update tests for behavior changes, with coverage for error handling.Also applies to: 156-193
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/intelligentrouter/classifier_test.go` around lines 23 - 49, The repair-path test is only verifying that a second call occurs, but not that the original analyzer prompt is preserved. Update the fake executor in fakeExecutor.ChatCompletion (or the specific test harness using it) to capture the full second ChatRequest, then add an assertion that the retry/repair request still includes the original request summary or analyzer prompt content. Use the existing ChatCompletion, ChatRequest, and repair-call test setup to ensure the second request body is validated, not just the model call count.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/intelligentrouter/classifier.go`:
- Around line 141-160: The repair path in Classifier.repairClassification is
missing the original request context, so it can return valid JSON for the wrong
intent. Update the repair call in Classifier.repairClassification to include the
original request prompt/summary from the classification attempt alongside the
JSON-only instruction, using the existing callCtx/analyzer flow so the model
repairs formatting without changing the underlying classification target.
---
Outside diff comments:
In `@internal/intelligentrouter/classifier_test.go`:
- Around line 23-49: The repair-path test is only verifying that a second call
occurs, but not that the original analyzer prompt is preserved. Update the fake
executor in fakeExecutor.ChatCompletion (or the specific test harness using it)
to capture the full second ChatRequest, then add an assertion that the
retry/repair request still includes the original request summary or analyzer
prompt content. Use the existing ChatCompletion, ChatRequest, and repair-call
test setup to ensure the second request body is validated, not just the model
call count.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 68850f2e-9b68-43be-b119-ce8dbb544459
📒 Files selected for processing (6)
docs/features/intelligent-routing.mdxinternal/core/types.gointernal/intelligentrouter/classifier.gointernal/intelligentrouter/classifier_test.gointernal/intelligentrouter/prompt.gointernal/intelligentrouter/selector.go
| repaired, repairErr := c.repairClassification(callCtx, analyzer, temperature, maxTokens) | ||
| if repairErr == nil { | ||
| slog.Info("intelligent router analyzer repair succeeded", | ||
| "analyzer", analyzer.QualifiedModel(), | ||
| ) | ||
| return repaired, analyzer, nil | ||
| } | ||
| return Classification{}, analyzer, fmt.Errorf("parse analyzer response: %w (repair failed: %v)", err, repairErr) | ||
| } | ||
|
|
||
| func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, temperature float64, maxTokens int) (Classification, error) { | ||
| resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{ | ||
| Model: analyzer.Model, | ||
| Provider: analyzer.Provider, | ||
| Temperature: &temperature, | ||
| MaxTokens: &maxTokens, | ||
| Messages: []core.Message{ | ||
| {Role: "system", Content: analyzerSystemPrompt}, | ||
| {Role: "user", Content: "Your previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."}, | ||
| }, |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Pass the original request prompt into the repair call.
Line 159 asks the analyzer to resend valid JSON, but repairClassification no longer includes the original request summary. A syntactically valid repair can therefore succeed with an unrelated classification and misroute the request instead of just recovering from formatting noise.
Suggested fix
- repaired, repairErr := c.repairClassification(callCtx, analyzer, temperature, maxTokens)
+ repaired, repairErr := c.repairClassification(callCtx, analyzer, prompt, temperature, maxTokens)
...
-func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, temperature float64, maxTokens int) (Classification, error) {
+func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, prompt string, temperature float64, maxTokens int) (Classification, error) {
resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{
Model: analyzer.Model,
Provider: analyzer.Provider,
Temperature: &temperature,
MaxTokens: &maxTokens,
Messages: []core.Message{
{Role: "system", Content: analyzerSystemPrompt},
- {Role: "user", Content: "Your previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."},
+ {Role: "user", Content: prompt + "\n\nYour previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."},
},
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| repaired, repairErr := c.repairClassification(callCtx, analyzer, temperature, maxTokens) | |
| if repairErr == nil { | |
| slog.Info("intelligent router analyzer repair succeeded", | |
| "analyzer", analyzer.QualifiedModel(), | |
| ) | |
| return repaired, analyzer, nil | |
| } | |
| return Classification{}, analyzer, fmt.Errorf("parse analyzer response: %w (repair failed: %v)", err, repairErr) | |
| } | |
| func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, temperature float64, maxTokens int) (Classification, error) { | |
| resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{ | |
| Model: analyzer.Model, | |
| Provider: analyzer.Provider, | |
| Temperature: &temperature, | |
| MaxTokens: &maxTokens, | |
| Messages: []core.Message{ | |
| {Role: "system", Content: analyzerSystemPrompt}, | |
| {Role: "user", Content: "Your previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."}, | |
| }, | |
| repaired, repairErr := c.repairClassification(callCtx, analyzer, prompt, temperature, maxTokens) | |
| if repairErr == nil { | |
| slog.Info("intelligent router analyzer repair succeeded", | |
| "analyzer", analyzer.QualifiedModel(), | |
| ) | |
| return repaired, analyzer, nil | |
| } | |
| return Classification{}, analyzer, fmt.Errorf("parse analyzer response: %w (repair failed: %v)", err, repairErr) | |
| } | |
| func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, prompt string, temperature float64, maxTokens int) (Classification, error) { | |
| resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{ | |
| Model: analyzer.Model, | |
| Provider: analyzer.Provider, | |
| Temperature: &temperature, | |
| MaxTokens: &maxTokens, | |
| Messages: []core.Message{ | |
| {Role: "system", Content: analyzerSystemPrompt}, | |
| {Role: "user", Content: prompt + "\n\nYour previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."}, | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/classifier.go` around lines 141 - 160, The repair
path in Classifier.repairClassification is missing the original request context,
so it can return valid JSON for the wrong intent. Update the repair call in
Classifier.repairClassification to include the original request prompt/summary
from the classification attempt alongside the JSON-only instruction, using the
existing callCtx/analyzer flow so the model repairs formatting without changing
the underlying classification target.
Add optional conversation-aware memory to the intelligent router so repeated requests in the same conversation can keep model choice more consistent across multi-turn sessions. Clients can now send X-GoModel-Conversation-ID on translated chat/responses routes. The transport metadata is propagated into the intelligent routing selection meta, where the selector reads a short in-memory history keyed by (user_path, conversation_id) and includes it in the analyzer prompt as "Previous routing decisions". When intelligent routing applies a concrete model in enforce mode, the router records the applied qualified model back into the same in-memory store. Entries expire automatically after one hour and each conversation is capped to the most recent 50 routing decisions. Also document the optional header in the intelligent routing feature guide and add unit coverage for header extraction, history prompt rendering, memory expiry/limits, and applied-model recording.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/intelligentrouter/selector.go (1)
136-147: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRecord the applied fallback before this early return.
When classification fails in
enforcemode,decision.applyForMode(...)can still produce an applied fallback, but this branch returns beforeaddRoutingDecision(...)runs. That drops the applied model from conversation memory and breaks the new continuity contract on analyzer outages.Proposed fix
if err != nil { decision.AnalysisFailed = true decision.Analyzers = s.classifier.Analyzers() decision.Duration = time.Since(start) decision.SelectedModel = s.fallbackSelector(requested, meta) decision.Reason = "analysis failed: " + err.Error() // In enforce, still apply the fallback so analysis failure never blocks the // user's request; in observe, execute the requested model unchanged. decision.applyForMode(s.mode, requested) + if decision.Applied && decision.AppliedModel.Model != "" && strings.TrimSpace(meta.ConversationID) != "" { + addRoutingDecision(meta.UserPath, meta.ConversationID, decision.AppliedModel.QualifiedModel()) + } logDecision(decision) return decision }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/intelligentrouter/selector.go` around lines 136 - 147, Record the applied fallback before returning from the classification failure branch in selector.go. In the ClassifyWithCandidates error path inside the routing logic, keep the existing decision.applyForMode(s.mode, requested) call, but also persist the resulting applied model by invoking addRoutingDecision with the same request/decision state before logDecision and return. This ensures the fallback chosen by s.fallbackSelector and applyForMode is captured in conversation memory even when analysis fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/features/intelligent-routing.mdx`:
- Around line 278-282: Clarify in the intelligent-routing docs that GoModel’s
in-memory history is only written when routing actually applies a model, not for
every request with X-GoModel-Conversation-ID. Update the description around the
short-lived per-(user_path, conversation_id) history and “Previous routing
decisions” to note that observe-only deployments remain stateless and do not
accumulate history unless a model is applied.
In `@internal/gateway/inference_prepare.go`:
- Around line 97-99: The routing update in evaluateIntelligentRouter is being
overwritten because ensureTranslatedRequestWorkflow still receives the original
meta.Workflow, so the stale resolved selector can replace the new model/provider
choice. Update inference_prepare.go so the workflow passed into
ensureTranslatedRequestWorkflow is dropped or rebuilt when
requested.Model/requested.ProviderHint change after evaluateIntelligentRouter,
and make sure ApplyResolvedSelector uses the routed selector instead of the
preexisting one. Add a regression test covering the existing workflow plus
enforced reroute path to verify the routed choice is preserved.
In `@internal/intelligentrouter/conversation_memory.go`:
- Around line 27-33: The routing memory key in routingMemoryKey is built by
concatenating userPath and conversationID with a delimiter, which can collide
when either value contains "::". Replace this with a structured key approach,
such as a small composite struct or nested map keyed by the two fields, and
update any callers or map usages in conversation_memory.go accordingly so
distinct tuples cannot alias each other.
In `@internal/intelligentrouter/selector_test.go`:
- Around line 109-140: Add a regression test for the Evaluate analysis-failure
fallback path in selector_test.go. The current
TestSelector_RecordsAppliedModelInConversationMemory and
TestSelector_DoesNotRecordHistoryInObserveMode only cover the happy enforce flow
and observe mode, so add a test that forces classification/analysis to fail in
ModeEnforce and verifies the fallback still records the applied model in routing
memory. Use the existing Evaluate, getRoutingHistory, defaultRoutingMemory, and
fakeExecutor symbols to locate and exercise the behavior.
---
Outside diff comments:
In `@internal/intelligentrouter/selector.go`:
- Around line 136-147: Record the applied fallback before returning from the
classification failure branch in selector.go. In the ClassifyWithCandidates
error path inside the routing logic, keep the existing
decision.applyForMode(s.mode, requested) call, but also persist the resulting
applied model by invoking addRoutingDecision with the same request/decision
state before logDecision and return. This ensures the fallback chosen by
s.fallbackSelector and applyForMode is captured in conversation memory even when
analysis fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fff161ad-82ab-44c9-9c87-e75dc0198284
📒 Files selected for processing (13)
docs/features/intelligent-routing.mdxinternal/gateway/inference_intelligent.gointernal/gateway/inference_orchestrator.gointernal/gateway/inference_prepare.gointernal/intelligentrouter/classifier_test.gointernal/intelligentrouter/conversation_memory.gointernal/intelligentrouter/conversation_memory_test.gointernal/intelligentrouter/selector.gointernal/intelligentrouter/selector_test.gointernal/intelligentrouter/types.gointernal/server/request_support.gointernal/server/request_support_test.gointernal/server/translated_inference_service.go
| requested = o.evaluateIntelligentRouter(ctx, req, requested, meta) | ||
| *model = requested.Model | ||
| *provider = requested.ProviderHint |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Drop the stale workflow when routing rewrites the selector.
Lines 97-99 update model and provider, but Line 100 still passes the original meta.Workflow into ensureTranslatedRequestWorkflow. If that workflow already has a resolution, the old selector is reused and ApplyResolvedSelector overwrites the routed choice, so intelligent routing never takes effect on that path.
Proposed fix
requested := core.NewRequestedModelSelector(*model, *provider)
requested = o.evaluateIntelligentRouter(ctx, req, requested, meta)
*model = requested.Model
*provider = requested.ProviderHint
+ if meta.Workflow != nil && meta.Workflow.Resolution != nil {
+ current := meta.Workflow.Resolution.ResolvedSelector
+ if current.Model != requested.Model || current.Provider != requested.ProviderHint {
+ meta.Workflow = nil
+ }
+ }
workflow, err := o.ensureTranslatedRequestWorkflow(ctx, meta.Workflow, meta.RequestID, meta.Endpoint, model, provider)Please add a regression test for the "existing workflow + enforced reroute" case as well.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| requested = o.evaluateIntelligentRouter(ctx, req, requested, meta) | |
| *model = requested.Model | |
| *provider = requested.ProviderHint | |
| requested = o.evaluateIntelligentRouter(ctx, req, requested, meta) | |
| *model = requested.Model | |
| *provider = requested.ProviderHint | |
| if meta.Workflow != nil && meta.Workflow.Resolution != nil { | |
| current := meta.Workflow.Resolution.ResolvedSelector | |
| if current.Model != requested.Model || current.Provider != requested.ProviderHint { | |
| meta.Workflow = nil | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/gateway/inference_prepare.go` around lines 97 - 99, The routing
update in evaluateIntelligentRouter is being overwritten because
ensureTranslatedRequestWorkflow still receives the original meta.Workflow, so
the stale resolved selector can replace the new model/provider choice. Update
inference_prepare.go so the workflow passed into ensureTranslatedRequestWorkflow
is dropped or rebuilt when requested.Model/requested.ProviderHint change after
evaluateIntelligentRouter, and make sure ApplyResolvedSelector uses the routed
selector instead of the preexisting one. Add a regression test covering the
existing workflow plus enforced reroute path to verify the routed choice is
preserved.
| func routingMemoryKey(userPath, conversationID string) string { | ||
| userPath = strings.TrimSpace(userPath) | ||
| conversationID = strings.TrimSpace(conversationID) | ||
| if conversationID == "" { | ||
| return "" | ||
| } | ||
| return userPath + "::" + conversationID |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Use a structured key instead of delimiter concatenation.
userPath + "::" + conversationID aliases distinct tuples whenever either component contains ::. That can merge unrelated routing histories and feed the wrong “previous decisions” back into the analyzer. A struct key (or nested map) avoids that state-corruption edge case entirely.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/conversation_memory.go` around lines 27 - 33, The
routing memory key in routingMemoryKey is built by concatenating userPath and
conversationID with a delimiter, which can collide when either value contains
"::". Replace this with a structured key approach, such as a small composite
struct or nested map keyed by the two fields, and update any callers or map
usages in conversation_memory.go accordingly so distinct tuples cannot alias
each other.
| func TestSelector_RecordsAppliedModelInConversationMemory(t *testing.T) { | ||
| defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)} | ||
| t.Cleanup(func() { | ||
| defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)} | ||
| }) | ||
|
|
||
| exec := &fakeExecutor{responses: map[string]string{"a-mini": validClassification}} | ||
| s := newTestSelector(t, ModeEnforce, exec) | ||
| req := &core.ChatRequest{Messages: []core.Message{{Role: "user", Content: "hi"}}} | ||
| requested := core.NewRequestedModelSelector("auto", "") | ||
| d := s.Evaluate(context.Background(), req, requested, SelectionMeta{Mode: ModeEnforce, UserPath: "/team", ConversationID: "conv-1"}) | ||
|
|
||
| require.True(t, d.Applied) | ||
| history := getRoutingHistory("/team", "conv-1", 5) | ||
| require.Equal(t, []string{d.AppliedModel.QualifiedModel()}, history) | ||
| } | ||
|
|
||
| func TestSelector_DoesNotRecordHistoryInObserveMode(t *testing.T) { | ||
| defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)} | ||
| t.Cleanup(func() { | ||
| defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)} | ||
| }) | ||
|
|
||
| exec := &fakeExecutor{responses: map[string]string{"a-mini": validClassification}} | ||
| s := newTestSelector(t, ModeObserve, exec) | ||
| req := &core.ChatRequest{Messages: []core.Message{{Role: "user", Content: "hi"}}} | ||
| requested := core.NewRequestedModelSelector("auto", "") | ||
| d := s.Evaluate(context.Background(), req, requested, SelectionMeta{Mode: ModeObserve, UserPath: "/team", ConversationID: "conv-1"}) | ||
|
|
||
| require.False(t, d.Applied) | ||
| require.Nil(t, getRoutingHistory("/team", "conv-1", 5)) | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add a regression test for enforce-mode fallback recording.
These cases only cover successful enforce and observe. The analysis-failure branch in Evaluate can still apply a fallback and return before writing to conversation memory, so this behavior is currently unprotected by tests.
As per coding guidelines, **/*_test.go: Add or update tests for behavior changes, and ensure tests cover request translation, response normalization, error handling, default configuration, and provider-specific parameter mapping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/intelligentrouter/selector_test.go` around lines 109 - 140, Add a
regression test for the Evaluate analysis-failure fallback path in
selector_test.go. The current
TestSelector_RecordsAppliedModelInConversationMemory and
TestSelector_DoesNotRecordHistoryInObserveMode only cover the happy enforce flow
and observe mode, so add a test that forces classification/analysis to fail in
ModeEnforce and verifies the fallback still records the applied model in routing
memory. Use the existing Evaluate, getRoutingHistory, defaultRoutingMemory, and
fakeExecutor symbols to locate and exercise the behavior.
Source: Coding guidelines
Track the outcome of every provider call in a lightweight in-memory health tracker (exponential decay, configurable window/half-life, Bayesian smoothing). Candidates whose recent weighted error rate crosses the circuit-breaker threshold are hard-excluded from the intelligent-router candidate pool; others receive a proportional penalty so a degraded-but-alive model still competes at a lower score. Health outcomes are recorded at two points in the gateway: - executeWithUsage: credits the model that actually completed the request, and debits the originally-requested model when a fallback took over. - tryFallbackResponse: debits both the primary and each fallback attempt that fails before a winner is found. The IntelligentRouter interface gains RecordExecution(qualifiedModel, success) so the gateway can report outcomes without importing the intelligentrouter package directly. Config: new optional intelligent_routing.defaults.health block with defaults window=20m, half_life=5m, pseudo_counts=2.0, circuit_breaker=0.9. All values are configurable via env vars (INTELLIGENT_ROUTING_HEALTH_*).
Expose active intelligent selectors through the models endpoint and use the configured selector set during workflow resolution so custom selectors do not fail early as unsupported models.
| # virtual models, unless mode is observe. See docs/dev/intelligent-model.md. | ||
| intelligent_routing: | ||
| enabled: false | ||
| mode: "off" # off | observe | enforce; observe classifies but keeps the requested model |
There was a problem hiding this comment.
There is still 'off' mode here. What is it for?
|
I am still carefully reviewing the entire codebase and adding new ideas to the PR. |
Summary
auto,smart,auto-cost,auto-quality) in themodelfield instead of concrete model names.off(default),observe(dry-run, logs recommendation without changing execution), andenforce(rewrites the selector before dispatch).How it works
"model": "auto"(or another intelligent selector).cost,balanced,quality,latency).enforcemode, the winning model replaces the selector; the request then goes through normal authorization and provider routing.Key components
internal/intelligentrouter/— classifier, selector, scorer, catalog filtering, prompt builder, typesconfig/intelligent_routing.go— configuration struct, defaults, validation, env var overridesinternal/gateway/inference_intelligent.go— orchestrator hook that rewrites the model selector before resolutioninternal/server/model_validation.go— middleware skip for intelligent selectors to prevent prematuremodel_not_founderrorsinternal/observability/metrics.go— Prometheus counters and histograms for decisions, latency, fallbacks, and low-confidence casesinternal/virtualmodels/intelligent.go— intelligent virtual model support with strategy-qualified selectorsdocs/features/intelligent-routing.mdx— full feature documentationConfiguration
Test plan
make test-race"model": "auto"withmode: observelogsintelligent routing decisionwithout changing the executed model"model": "auto"withmode: enforcerewrites the selector and returns 200 with the selected model"model": "auto-cost"selects a cheaper model when pricing/tags are configuredfallback_modelwithanalysis_failed=truein logsmode: offreturnmodel_not_foundanalysis_user_pathin usage reportsSummary by CodeRabbit
X-GoModel-Conversation-ID, with in-memory routing history in enforce mode.