Skip to content

feat(router): add intelligent model routing with auto selectors#431

Open
vfeitoza wants to merge 7 commits into
ENTERPILOT:mainfrom
vfeitoza:feat/intelligent-model-routing
Open

feat(router): add intelligent model routing with auto selectors#431
vfeitoza wants to merge 7 commits into
ENTERPILOT:mainfrom
vfeitoza:feat/intelligent-model-routing

Conversation

@vfeitoza

@vfeitoza vfeitoza commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds an intelligent routing layer that classifies incoming chat requests with a cheap analyzer model and selects the best concrete model from the provider catalog before normal resolution runs.
  • Clients use virtual selectors (auto, smart, auto-cost, auto-quality) in the model field instead of concrete model names.
  • Three modes: off (default), observe (dry-run, logs recommendation without changing execution), and enforce (rewrites the selector before dispatch).

How it works

  1. Client sends "model": "auto" (or another intelligent selector).
  2. Gateway intercepts before provider resolution.
  3. A cheap analyzer model classifies the request (complexity, task type, required capabilities, confidence score).
  4. The scorer ranks all eligible catalog models against the active strategy (cost, balanced, quality, latency).
  5. In enforce mode, the winning model replaces the selector; the request then goes through normal authorization and provider routing.

Key components

  • internal/intelligentrouter/ — classifier, selector, scorer, catalog filtering, prompt builder, types
  • config/intelligent_routing.go — configuration struct, defaults, validation, env var overrides
  • internal/gateway/inference_intelligent.go — orchestrator hook that rewrites the model selector before resolution
  • internal/server/model_validation.go — middleware skip for intelligent selectors to prevent premature model_not_found errors
  • internal/observability/metrics.go — Prometheus counters and histograms for decisions, latency, fallbacks, and low-confidence cases
  • internal/virtualmodels/intelligent.go — intelligent virtual model support with strategy-qualified selectors
  • docs/features/intelligent-routing.mdx — full feature documentation

Configuration

intelligent_routing:
  enabled: true
  mode: "enforce"        # off | observe | enforce
  analyzers:
    - model: "gpt-4o-mini"
      provider: "openai"
      max_tokens: 256
  defaults:
    strategy: "balanced" # cost | balanced | quality | latency
    min_savings_ratio: 0.15
    min_confidence: 0.7
  fallback_model: "openai/gpt-4o-mini"
  analysis_user_path: "/intelligent-router"

Test plan

  • Unit tests pass: make test-race
  • "model": "auto" with mode: observe logs intelligent routing decision without changing the executed model
  • "model": "auto" with mode: enforce rewrites the selector and returns 200 with the selected model
  • "model": "auto-cost" selects a cheaper model when pricing/tags are configured
  • All analyzers failing falls back to fallback_model with analysis_failed=true in logs
  • Intelligent selectors with mode: off return model_not_found
  • Analyzer usage appears under analysis_user_path in usage reports

Summary by CodeRabbit

  • New Features
    • Added Intelligent Routing for automatic model selection with off/observe/enforce modes and cost/quality/balanced/latency strategies, including analyzer-based classification, candidate allow/deny filtering, and fallback model handling.
    • Added intelligent selector aliases and intelligent virtual-model targets, plus routing-guidance-aware classification.
    • Conversation-aware routing via X-GoModel-Conversation-ID, with in-memory routing history in enforce mode.
    • Added Prometheus metrics for routing requests, decision latency, fallbacks, and low-confidence events.
  • Bug Fixes
    • Ensured intelligent selectors pass through model validation in translated requests.
  • Documentation
    • Added a complete Intelligent Routing documentation page and updated config/environment templates and examples.

Introduces an intelligent routing layer that classifies incoming chat
requests with a cheap analyzer model and selects the best concrete model
from the provider catalog before normal resolution runs.

Clients send an intelligent selector in the model field instead of a
concrete model name:

  - auto / smart   balanced cost-quality selection
  - auto-cost      prefer the cheapest eligible model
  - auto-quality   prefer the highest-quality eligible model

The router operates in three modes configured via intelligent_routing.mode:

  - off      intelligent selectors return model_not_found (default)
  - observe  classifies and logs the recommendation; executes the
             original requested model unchanged (dry-run)
  - enforce  replaces the selector with the recommended model before
             provider dispatch

A pool of analyzer models is tried in order with per-call timeout and
automatic failover. When all analyzers fail, requests fall back to
intelligent_routing.fallback_model or return model_not_found if unset.

Analyzer usage is attributed to a dedicated analysis_user_path
(/intelligent-router by default) so classification cost is separated
from application request cost in usage reports.

Key components:

  - internal/intelligentrouter: classifier, selector, scorer, catalog,
    prompt builder, and type definitions
  - config/intelligent_routing.go: configuration struct, defaults,
    validation, and env var overrides
  - internal/gateway/inference_intelligent.go: orchestrator hook that
    rewrites the model selector before resolution
  - internal/server/model_validation.go: middleware skip for intelligent
    selectors to prevent premature model_not_found errors
  - internal/observability/metrics.go: Prometheus counters and histograms
    for routing decisions, latency, fallbacks, and low-confidence cases
  - internal/virtualmodels: intelligent virtual model support with
    strategy-qualified selectors (intelligent:cost, intelligent:quality)
  - docs/features/intelligent-routing.mdx: full feature documentation

All existing tests pass. New tests cover config validation, selector
detection, catalog filtering, scoring, and middleware passthrough for
intelligent selectors.
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds intelligent routing configuration, analysis and ranking logic, runtime wiring through server and gateway paths, selector validation bypasses, and virtual-model support.

Changes

Intelligent routing

Layer / File(s) Summary
Config contract and docs
config/intelligent_routing.go, config/config.go, config/intelligent_routing_test.go, .env.template, CLAUDE.md, config/config.example.yaml, docs/features/intelligent-routing.mdx, internal/core/types.go
intelligent_routing config is added with defaults, validation, tests, metadata, and documentation for modes, analyzers, thresholds, filters, and fallback behavior.
Classification primitives
internal/intelligentrouter/types.go, selectors.go, prompt.go, classifier.go, selectors_test.go, classifier_test.go
Selector aliases, analyzer prompts, classification parsing, and classifier failover/repair behavior are added with tests.
Candidate ranking
internal/intelligentrouter/catalog.go, scorer.go, catalog_test.go
Eligible catalog models are filtered and scored for routing strategies, with tests for filtering and ranking.
Selector decision engine
internal/observability/metrics.go, internal/intelligentrouter/selector.go, selector_test.go, internal/intelligentrouter/conversation_memory.go, conversation_memory_test.go
Routing decisions classify requests, rank candidates, apply observe/enforce mode, record conversation history, and emit intelligent-routing metrics and logs.
Startup and request wiring
internal/app/app.go, internal/app/intelligent_routing.go, internal/gateway/*.go, internal/server/handlers.go, internal/server/http.go, internal/server/translated_inference_service.go, internal/server/request_support.go, internal/server/request_support_test.go, internal/server/model_validation.go, internal/server/model_validation_test.go, internal/virtualmodels/intelligent.go, internal/virtualmodels/validation.go, internal/virtualmodels/snapshot.go
The app builds the router, injects it into inference, rewrites translated request selectors, skips validation for intelligent aliases, extracts conversation IDs, and adjusts virtual-model handling.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • ENTERPILOT/GoModel#264: Also changes internal/core/types.go, which this PR extends for routing guidance metadata.

Poem

🐇 I sniffed the config, hop by hop,
then watched the router choose a top.
In observe I blinked, in enforce I sped,
with metrics twinkling overhead.
A cozy burrow learned to swap and shop.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: adding intelligent model routing and new auto-style selectors.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown

Confidence Score: 3/5

Merge safety is moderate because the new routing hook can invoke analyzer work before the request has passed the normal authorization and budget gates.

The change is well-scoped and includes tests and documentation, but the request preparation order creates a concrete billing and access-control edge case for virtual selectors.

internal/gateway/inference_prepare.go

T-Rex T-Rex Logs

What T-Rex did

  • Created a focused Go repro test harness to exercise the analyzer invocation before final authorization denial.
  • Tried to run the focused Go repro test with go test, but the environment lacked the Go toolchain, blocking runtime verification.
  • Ran the PrepareChatRequest test path, which exercises evaluateIntelligentRouter and normal workflow resolution, observing not_found_error for the auto model before analysis and a successful enforce path for mini.
  • Compared base and head selector harness attempts, noting base lacked intelligentrouter and exited with code 127, and head run also failed due to go not found.
  • Captured analyzer fallback traces showing missing intelligentrouter and missing Go toolchain in the base run, and go: not found plus an environment check in the head run.
  • Observed the base compile failure due to missing intelligent routing config symbols, and in the after state the system resolved to default off with YAML/env resolution, plus emitted gomodel_intelligent_routing metrics.

View all artifacts

T-Rex Ran code and verified through T-Rex

Reviews (1): Last reviewed commit: "feat(router): add intelligent model rout..." | Re-trigger Greptile

Comment on lines +96 to 100
requested := core.NewRequestedModelSelector(*model, *provider)
requested = o.evaluateIntelligentRouter(ctx, req, requested)
*model = requested.Model
*provider = requested.ProviderHint
workflow, err := o.ensureTranslatedRequestWorkflow(ctx, meta.Workflow, meta.RequestID, meta.Endpoint, model, provider)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Authorize before analysis
evaluateIntelligentRouter runs the analyzer before ensureTranslatedRequestWorkflow resolves and authorizes the final model, and dispatchChatCompletion enforces budget only later. A request with model: "auto" therefore sends its prompt to the configured analyzer and can incur analyzer usage even when the caller is unauthorized for the selected model or already over budget; move intelligent evaluation after workflow authorization/budget checks or add an equivalent pre-check before classification.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.template:
- Around line 278-279: The INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY entries still use inline value-line
comments, which can be parsed as part of the value by env loaders. Update the
.env.template entries to match the rest of the template by moving the option
descriptions to separate preceding comment lines and keeping the assignment
lines as plain values; use the existing INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY keys to locate the affected lines.
- Line 275: Update the stale intelligent-routing documentation reference so it
points to docs/features/intelligent-routing.mdx instead of
docs/dev/intelligent-model.md. Make the same link change wherever the old path
appears in the related config/docs references, including the .env.template entry
and the intelligent routing config/docs references such as
config/intelligent_routing.go and CLAUDE.md, so all unique symbols and comments
consistently reference the new doc.

In `@CLAUDE.md`:
- Line 131: The intelligent routing docs describe the default analyzer pool with
an inconsistent provider name for the first model entry; update the wording in
CLAUDE.md so it matches the shipped examples and feature docs that use the
OpenAI provider for gpt-5.4-mini. Use the intelligent routing section and its
analyzer pool description as the target, and keep the provider/model naming
aligned with config/config.example.yaml and
docs/features/intelligent-routing.mdx so the documented default pool is
consistent.

In `@config/intelligent_routing_test.go`:
- Around line 99-111: Add coverage in
TestValidateIntelligentRoutingConfig_RatioBounds for the explicit zero boundary
on IntelligentRoutingConfig.Defaults.MinSavingsRatio: create a case with ratio 0
alongside the existing out-of-range values, then assert
ValidateIntelligentRoutingConfig accepts it and does not overwrite it. Use the
existing ValidateIntelligentRoutingConfig helper and
IntelligentRoutingConfig/Defaults.MinSavingsRatio symbols so the test locks in
the “always cheapest” behavior.

In `@config/intelligent_routing.go`:
- Around line 238-254: `applyIntelligentRoutingDefaults` is overwriting
explicitly configured zero values for `MinSavingsRatio` and `MinConfidence`,
making the documented `0` behavior unreachable. Update the defaulting logic in
`applyIntelligentRoutingDefaults` so it only seeds these fields when the config
is truly unset, not when a user intentionally set `0`; use the existing
`DefaultIntelligentRoutingConfig`/merge flow or sentinel/pointer-based presence
tracking to distinguish unset from explicit values, while keeping the validation
in `IntelligentDefaults` unchanged.

In `@docs/features/intelligent-routing.mdx`:
- Around line 234-247: The environment variable table in the intelligent routing
docs is missing several top-level config fields, so update the table in
intelligent-routing.mdx to include INTELLIGENT_ROUTING_ANALYSIS_USER_PATH,
INTELLIGENT_ROUTING_CANDIDATES_ALLOW, and INTELLIGENT_ROUTING_CANDIDATES_DENY
alongside the existing entries, or change the “All top-level fields” wording to
match what is actually documented; use the env-tagged fields from
intelligent_routing.go as the source of truth and keep the table aligned with
the .env.template values.

In `@internal/intelligentrouter/catalog.go`:
- Line 25: BuildCandidates currently accepts requestedContextChars but never
uses it, so the signature is misleading and the selector’s passed value has no
effect. Update BuildCandidates and its call sites (including the selector path)
so requestedContextChars is either incorporated into the long-context decision,
likely via modelSupportsLongContext, or removed entirely if it is not needed.
Use the existing BuildCandidates, modelSupportsLongContext, and
class.RequiresLongContext symbols to keep the candidate filtering logic
consistent.

In `@internal/intelligentrouter/prompt.go`:
- Around line 33-56: The analyzerUserPrompt summary is dropping important
routing signals by reducing non-text content to a generic placeholder and
truncating text without any size metadata. Update analyzerUserPrompt to preserve
lightweight modality hints and original-size indicators for each message so
Classifier.Classify can still infer vision, audio, and long-context requests
without exposing full payloads. Keep the change localized to analyzerUserPrompt
and its message formatting logic, using msg.Content, core.ExtractTextContent,
and the per-message preview builder to add modality flags and original
length/count summaries.

In `@internal/intelligentrouter/scorer.go`:
- Around line 69-83: Balanced scoring in scoreCandidate currently ignores
class.Confidence, so uncertain classifications still over-prioritize cheap
models. Update the default balanced branch in
internal/intelligentrouter/scorer.go to incorporate Confidence into the
weighting before cost dominates, using the existing scoreCandidate switch and
class fields like Confidence, so low-confidence requests are less likely to
select the cheapest model and stay aligned with selector expectations.

In `@internal/intelligentrouter/selector.go`:
- Around line 190-195: The meta SelectionMeta parameter in
Selector.fallbackSelector is unused, so simplify the API by removing it from the
method signature and updating all call sites that pass it, including the one
using SelectionMeta{}. Keep the behavior the same by still preferring
parseFallback(s.fallback) and otherwise returning requestedSelector(requested).
- Around line 272-274: The low-confidence metric in logDecision is still using a
hardcoded 0.7, so it can drift from the selector’s configured threshold. Add the
configured minimum confidence to Decision in Evaluate, then update
choose/logDecision to use that Decision field instead of the literal when
incrementing IntelligentRoutingLowConfidenceTotal. Use the existing choose,
Evaluate, logDecision, and Decision symbols to thread the threshold through
cleanly.
- Around line 51-54: `MinSavingsRatio` is only initialized in `NewSelector` and
not enforced during routing, so the selector can still rewrite models even when
savings are below the configured threshold. Update `Evaluate`, `choose`, or
`RankCandidates` in `selector.go` to check the computed savings ratio against
`minSavings` before allowing a rewrite, and keep the fallback default only if
the setting is meant to remain configurable; otherwise remove the unused field
entirely.

In `@internal/virtualmodels/validation.go`:
- Around line 58-60: The truncation branch in the validation logic is
unreachable because the earlier non-intelligent redirect guard already rejects
multi-target VMs, so remove the dead `!intelligent && len(vm.Targets) > 1` path
from `validateVirtualModel` in `validation.go`. Keep the behavior explicit by
relying on the existing guard/contract around `vm.Targets` instead of silently
trimming targets, and ensure the remaining validation flow in
`validateVirtualModel` clearly reflects the supported strategy rules.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 978ea972-e48d-4a79-8ebf-6d4c9b0cdee7

📥 Commits

Reviewing files that changed from the base of the PR and between f291083 and 9ddbb3d.

📒 Files selected for processing (32)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/intelligent_routing.go
  • config/intelligent_routing_test.go
  • docs/features/intelligent-routing.mdx
  • internal/app/app.go
  • internal/app/intelligent_routing.go
  • internal/gateway/inference_intelligent.go
  • internal/gateway/inference_orchestrator.go
  • internal/gateway/inference_prepare.go
  • internal/intelligentrouter/catalog.go
  • internal/intelligentrouter/catalog_test.go
  • internal/intelligentrouter/classifier.go
  • internal/intelligentrouter/classifier_test.go
  • internal/intelligentrouter/prompt.go
  • internal/intelligentrouter/scorer.go
  • internal/intelligentrouter/selector.go
  • internal/intelligentrouter/selector_test.go
  • internal/intelligentrouter/selectors.go
  • internal/intelligentrouter/selectors_test.go
  • internal/intelligentrouter/types.go
  • internal/observability/metrics.go
  • internal/server/handlers.go
  • internal/server/http.go
  • internal/server/model_validation.go
  • internal/server/model_validation_test.go
  • internal/server/translated_inference_service.go
  • internal/virtualmodels/intelligent.go
  • internal/virtualmodels/snapshot.go
  • internal/virtualmodels/validation.go

Comment thread .env.template
# model for execution. Only triggers for intelligent selectors (auto/smart/
# auto-cost/auto-quality) or intelligent virtual models, unless mode is observe.
# Configure analyzers/selectors in config.yaml under intelligent_routing.
# See docs/dev/intelligent-model.md.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether the referenced dev doc exists and list any intelligent-routing docs.
fd -t f 'intelligent' docs
rg -n 'docs/dev/intelligent-model\.md' --glob '!**/vendor/**'

Repository: ENTERPILOT/GoModel

Length of output: 194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the relevant references and whether the referenced dev doc exists.
git ls-files '.env.template' 'config/intelligent_routing.go' 'CLAUDE.md' 'docs/dev/*' 'docs/features/*'

printf '\n--- refs ---\n'
rg -n 'docs/dev/intelligent-model\.md|intelligent-model\.md|intelligent-routing\.mdx' .env.template config/intelligent_routing.go CLAUDE.md docs -g '!**/vendor/**' || true

printf '\n--- docs/dev listing ---\n'
fd -t f . docs/dev || true

printf '\n--- docs/features listing ---\n'
fd -t f . docs/features || true

Repository: ENTERPILOT/GoModel

Length of output: 2366


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact lines around the references.
sed -n '268,280p' .env.template
printf '\n--- config/intelligent_routing.go ---\n'
sed -n '1,40p' config/intelligent_routing.go
printf '\n--- CLAUDE.md ---\n'
sed -n '120,140p' CLAUDE.md

Repository: ENTERPILOT/GoModel

Length of output: 8641


Update the intelligent-routing doc link
.env.template, config/intelligent_routing.go, and CLAUDE.md still reference docs/dev/intelligent-model.md; point them to docs/features/intelligent-routing.mdx instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.template at line 275, Update the stale intelligent-routing
documentation reference so it points to docs/features/intelligent-routing.mdx
instead of docs/dev/intelligent-model.md. Make the same link change wherever the
old path appears in the related config/docs references, including the
.env.template entry and the intelligent routing config/docs references such as
config/intelligent_routing.go and CLAUDE.md, so all unique symbols and comments
consistently reference the new doc.

Comment thread .env.template
Comment on lines +278 to +279
# INTELLIGENT_ROUTING_MODE=off # off | observe | enforce
# INTELLIGENT_ROUTING_DEFAULT_STRATEGY=balanced # cost | balanced | quality | latency

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Avoid inline # … comments on value lines.

Lines 278-279 put the enumerations on the same line as the assignment (INTELLIGENT_ROUTING_MODE=off # off | observe | enforce). The rest of this template keeps descriptions on separate comment lines, and several env_file loaders (e.g. Docker Compose) do not strip trailing inline comments, so an uncommented value would become off # off | observe | enforce. Move the option lists to preceding comment lines for consistency and safety.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.template around lines 278 - 279, The INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY entries still use inline value-line
comments, which can be parsed as part of the value by env loaders. Update the
.env.template entries to match the rest of the template by moving the option
descriptions to separate preceding comment lines and keeping the assignment
lines as plain values; use the existing INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY keys to locate the affected lines.

Comment thread CLAUDE.md
- **HTTP client:** `HTTP_TIMEOUT` (600s), `HTTP_RESPONSE_HEADER_TIMEOUT` (600s)
- **Resilience:** Configured via `config/config.yaml` - global `resilience.retry.*` and `resilience.circuit_breaker.*` defaults with optional per-provider overrides under `providers.<name>.resilience.retry.*` and `providers.<name>.resilience.circuit_breaker.*`. Retry defaults: `max_retries` (3), `initial_backoff` (1s), `max_backoff` (30s), `backoff_factor` (2.0), `jitter_factor` (0.1). Circuit breaker defaults: `failure_threshold` (5), `success_threshold` (2), `timeout` (30s)
- **Metrics:** `METRICS_ENABLED` (false), `METRICS_ENDPOINT` (/metrics)
- **Intelligent routing:** Disabled by default. When enabled, the gateway classifies each request with a cheap analyzer model and selects the best catalog model for execution. Configured via `config/config.yaml` under `intelligent_routing` (key env vars: `INTELLIGENT_ROUTING_ENABLED`, `INTELLIGENT_ROUTING_MODE` (`off`/`observe`/`enforce`), `INTELLIGENT_ROUTING_DEFAULT_STRATEGY` (`cost`/`balanced`/`quality`/`latency`), `INTELLIGENT_ROUTING_MAX_ANALYSIS_TOKENS`, `INTELLIGENT_ROUTING_TIMEOUT`, `INTELLIGENT_ROUTING_MIN_SAVINGS_RATIO`, `INTELLIGENT_ROUTING_FALLBACK_MODEL`). It only triggers for intelligent selectors (`auto`, `smart`, `auto-cost`, `auto-quality`) or intelligent virtual models, unless `mode` is `observe` (dry-run that records the recommendation without changing the executed model). The default example pool ships with `codex/gpt-5.4-mini`, `zai/glm-5-turbo`, and `anthropic/claude-haiku-4-5` as ordered analyzers (tried in order with failover). Analysis cost is attributed to `analysis_user_path` (`/intelligent-router` by default) to keep it separate from the main execution in usage reports. See `docs/dev/intelligent-model.md`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Analyzer provider inconsistency with the example config.

This line documents the default analyzer pool as codex/gpt-5.4-mini, but config/config.example.yaml (Line 190-191) registers gpt-5.4-mini under provider: "openai", and docs/features/intelligent-routing.mdx also uses openai. Align these so users aren't pointed at a provider name (codex) that isn't in the shipped examples.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 131, The intelligent routing docs describe the default
analyzer pool with an inconsistent provider name for the first model entry;
update the wording in CLAUDE.md so it matches the shipped examples and feature
docs that use the OpenAI provider for gpt-5.4-mini. Use the intelligent routing
section and its analyzer pool description as the target, and keep the
provider/model naming aligned with config/config.example.yaml and
docs/features/intelligent-routing.mdx so the documented default pool is
consistent.

Comment on lines +99 to +111
func TestValidateIntelligentRoutingConfig_RatioBounds(t *testing.T) {
for _, ratio := range []float64{-0.1, 1.5} {
cfg := &IntelligentRoutingConfig{
Enabled: true,
Mode: IntelligentRoutingModeObserve,
Analyzers: []AnalyzerModelConfig{{Model: "gpt-5.4-mini"}},
}
cfg.Defaults.MinSavingsRatio = ratio
err := ValidateIntelligentRoutingConfig(cfg)
require.Error(t, err)
require.Contains(t, err.Error(), "min_savings_ratio")
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add coverage for the min_savings_ratio: 0 boundary.

RatioBounds only tests out-of-range values. Given the defaults-overwrite behavior flagged in intelligent_routing.go, a test asserting that an explicit 0 survives validation (and is honored) would lock in the documented "always cheapest" contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/intelligent_routing_test.go` around lines 99 - 111, Add coverage in
TestValidateIntelligentRoutingConfig_RatioBounds for the explicit zero boundary
on IntelligentRoutingConfig.Defaults.MinSavingsRatio: create a case with ratio 0
alongside the existing out-of-range values, then assert
ValidateIntelligentRoutingConfig accepts it and does not overwrite it. Use the
existing ValidateIntelligentRoutingConfig helper and
IntelligentRoutingConfig/Defaults.MinSavingsRatio symbols so the test locks in
the “always cheapest” behavior.

Comment on lines +238 to +254
func applyIntelligentRoutingDefaults(d *IntelligentDefaults) {
if strings.TrimSpace(d.Strategy) == "" {
d.Strategy = IntelligentStrategyBalanced
}
if d.MaxAnalysisTokens == 0 {
d.MaxAnalysisTokens = 256
}
if d.Timeout == 0 {
d.Timeout = 1500 * time.Millisecond
}
if d.MinSavingsRatio == 0 {
d.MinSavingsRatio = 0.15
}
if d.MinConfidence == 0 {
d.MinConfidence = 0.7
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

min_savings_ratio: 0 (and min_confidence: 0) are silently overwritten with defaults, breaking a documented use case.

applyIntelligentRoutingDefaults cannot distinguish an explicit 0 from an unset zero value, so it forces MinSavingsRatio to 0.15 and MinConfidence to 0.7. The validation gate at Lines 186-190 already accepts 0 (< 0 only fails), and docs/features/intelligent-routing.mdx (Line 129) explicitly documents min_savings_ratio: 0 as "always prefer the cheapest model." With this logic that configuration is unreachable.

Consider gating these defaults so they only apply when the parent config wasn't user-populated, or use sentinel/pointer fields. A pragmatic fix is to apply the float defaults only before user values are merged (i.e. seed via DefaultIntelligentRoutingConfig and drop the == 0 reset here), keeping validation as the bound check.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/intelligent_routing.go` around lines 238 - 254,
`applyIntelligentRoutingDefaults` is overwriting explicitly configured zero
values for `MinSavingsRatio` and `MinConfidence`, making the documented `0`
behavior unreachable. Update the defaulting logic in
`applyIntelligentRoutingDefaults` so it only seeds these fields when the config
is truly unset, not when a user intentionally set `0`; use the existing
`DefaultIntelligentRoutingConfig`/merge flow or sentinel/pointer-based presence
tracking to distinguish unset from explicit values, while keeping the validation
in `IntelligentDefaults` unchanged.

Comment on lines +69 to +83
switch strategy {
case StrategyCost:
return 10*costScore + 0.1*float64(quality) + capabilityBonus(model, class)
case StrategyQuality:
base := float64(quality)
if class.RequiresReasoning || class.QualitySensitivity == "high" {
base *= 1.5
}
return base + capabilityBonus(model, class)
case StrategyLatency:
// Prefer cheaper tiers as a latency proxy and small-context models.
return 5*costScore + float64(4-quality) + capabilityBonus(model, class)
default: // balanced
return 4*costScore + float64(quality) + capabilityBonus(model, class)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Balanced scoring ignores low-confidence classifications.

scoreCandidate never reads class.Confidence, so the balanced path still maximizes on cost even when the analyzer is unsure. That breaks the downstream contract in internal/intelligentrouter/selector_test.go:95-107, which expects low confidence to avoid the cheapest model. Fold confidence into the balanced weighting before cost dominates, or that selector path will keep recommending mini for uncertain requests.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/scorer.go` around lines 69 - 83, Balanced scoring
in scoreCandidate currently ignores class.Confidence, so uncertain
classifications still over-prioritize cheap models. Update the default balanced
branch in internal/intelligentrouter/scorer.go to incorporate Confidence into
the weighting before cost dominates, using the existing scoreCandidate switch
and class fields like Confidence, so low-confidence requests are less likely to
select the cheapest model and stay aligned with selector expectations.

Comment on lines +51 to +54
minSavings := cfg.MinSavingsRatio
if minSavings <= 0 {
minSavings = 0.15
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP '\b(minSavings|MinSavingsRatio|MinSavings)\b' --type=go -C3

Repository: ENTERPILOT/GoModel

Length of output: 156


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the target file and inspect relevant surrounding code.
git ls-files | rg '^internal/intelligentrouter/selector\.go$|^internal/intelligentrouter/'

printf '\n--- selector.go outline ---\n'
ast-grep outline internal/intelligentrouter/selector.go --view expanded || true

printf '\n--- selector.go relevant lines ---\n'
nl -ba internal/intelligentrouter/selector.go | sed -n '1,260p'

printf '\n--- search for savings-related usage in intelligentrouter ---\n'
rg -n "MinSavingsRatio|minSavings|savings ratio|savings" internal/intelligentrouter -C 3 || true

Repository: ENTERPILOT/GoModel

Length of output: 2793


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect selector.go in smaller slices if it is long.
wc -l internal/intelligentrouter/selector.go
nl -ba internal/intelligentrouter/selector.go | sed -n '1,140p'
printf '\n---\n'
nl -ba internal/intelligentrouter/selector.go | sed -n '141,280p'

Repository: ENTERPILOT/GoModel

Length of output: 239


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '%s\n' '--- exact references across repo ---'
rg -n "MinSavingsRatio|minSavings" . -g '!**/vendor/**' || true

printf '\n%s\n' '--- selector.go with line numbers via cat -n ---'
sed -n '1,320p' internal/intelligentrouter/selector.go | cat -n

printf '\n%s\n' '--- scorer.go outline ---'
ast-grep outline internal/intelligentrouter/scorer.go --view expanded || true

printf '\n%s\n' '--- scorer.go relevant lines ---'
sed -n '1,260p' internal/intelligentrouter/scorer.go | cat -n

Repository: ENTERPILOT/GoModel

Length of output: 19745


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '%s\n' '--- selector.go (full, numbered) ---'
sed -n '1,320p' internal/intelligentrouter/selector.go | cat -n

printf '\n%s\n' '--- scorer.go (full, numbered) ---'
sed -n '1,260p' internal/intelligentrouter/scorer.go | cat -n

Repository: ENTERPILOT/GoModel

Length of output: 17631


Apply MinSavingsRatio before rewriting. minSavings is only defaulted and stored in NewSelector; Evaluate, choose, and RankCandidates never use it, so enforce mode can still switch models even when projected savings are below the configured threshold. Gate the rewrite on the savings ratio, or remove the setting if it’s intentionally unused.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/selector.go` around lines 51 - 54,
`MinSavingsRatio` is only initialized in `NewSelector` and not enforced during
routing, so the selector can still rewrite models even when savings are below
the configured threshold. Update `Evaluate`, `choose`, or `RankCandidates` in
`selector.go` to check the computed savings ratio against `minSavings` before
allowing a rewrite, and keep the fallback default only if the setting is meant
to remain configurable; otherwise remove the unused field entirely.

Comment on lines +190 to +195
func (s *Selector) fallbackSelector(requested core.RequestedModelSelector, meta SelectionMeta) core.ModelSelector {
if fb := parseFallback(s.fallback); fb.Model != "" {
return fb
}
return requestedSelector(requested)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Unused meta parameter in fallbackSelector.

meta SelectionMeta is never read; both call sites pass it (one as SelectionMeta{}). Drop it to simplify the signature.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/selector.go` around lines 190 - 195, The meta
SelectionMeta parameter in Selector.fallbackSelector is unused, so simplify the
API by removing it from the method signature and updating all call sites that
pass it, including the one using SelectionMeta{}. Keep the behavior the same by
still preferring parseFallback(s.fallback) and otherwise returning
requestedSelector(requested).

Comment on lines +272 to +274
if d.Confidence > 0 && d.Confidence < 0.7 {
observability.IntelligentRoutingLowConfidenceTotal.WithLabelValues(d.Mode, d.Strategy).Inc()
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Low-confidence metric uses a hardcoded 0.7 instead of the configured minConf.

choose keys its low-confidence behavior off the configurable s.minConf, but logDecision records IntelligentRoutingLowConfidenceTotal using a literal 0.7. When MinConfidence is overridden, the metric will diverge from the actual decision threshold. Since logDecision is a package function without access to the selector, thread the threshold through the Decision (e.g. a MinConfidence field set in Evaluate).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/selector.go` around lines 272 - 274, The
low-confidence metric in logDecision is still using a hardcoded 0.7, so it can
drift from the selector’s configured threshold. Add the configured minimum
confidence to Decision in Evaluate, then update choose/logDecision to use that
Decision field instead of the literal when incrementing
IntelligentRoutingLowConfidenceTotal. Use the existing choose, Evaluate,
logDecision, and Decision symbols to thread the threshold through cleanly.

Comment on lines +58 to 60
if !intelligent && len(vm.Targets) > 1 {
vm.Targets = vm.Targets[:1]
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Dead truncation branch — unreachable given the line 34 guard.

For non-intelligent strategies, line 34 already rejects any redirect where len(vm.Targets) > 1 (multi-target "not yet supported"). So by the time execution reaches line 58, a non-intelligent VM always has len(vm.Targets) <= 1, making !intelligent && len(vm.Targets) > 1 impossible. This contradicts the apparent intent (silently truncating extras) and may mislead future maintainers about the real contract.

♻️ Remove the unreachable branch
-	if !intelligent && len(vm.Targets) > 1 {
-		vm.Targets = vm.Targets[:1]
-	}

Based on coding guidelines: "Keep the implementation explicit and maintainable rather than relying on clever abstractions" and follow KISS.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if !intelligent && len(vm.Targets) > 1 {
vm.Targets = vm.Targets[:1]
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/virtualmodels/validation.go` around lines 58 - 60, The truncation
branch in the validation logic is unreachable because the earlier
non-intelligent redirect guard already rejects multi-target VMs, so remove the
dead `!intelligent && len(vm.Targets) > 1` path from `validateVirtualModel` in
`validation.go`. Keep the behavior explicit by relying on the existing
guard/contract around `vm.Targets` instead of silently trimming targets, and
ensure the remaining validation flow in `validateVirtualModel` clearly reflects
the supported strategy rules.

Source: Coding guidelines

…bled

newIntelligentRouterFromConfig returns a nil *intelligentrouter.Selector when
the feature is inactive (the default). Assigning that typed nil directly to
the gateway.IntelligentRouter interface field produced a non-nil interface
wrapping a nil pointer, so the orchestrator's `== nil` guard was skipped and
ShouldEvaluate panicked on every chat request, surfacing as HTTP 500.

Only assign the router to the interface when the concrete pointer is non-nil,
keeping the field at a true nil interface otherwise. Add a reflection-based
typed-nil check in evaluateIntelligentRouter as defense in depth against
future regressions.

Reproduced and verified against the full integration suite.

@SantiagoDePolonia SantiagoDePolonia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General thoughts

I definitely want to add this kind of feature. Also, the most probable scenario is that this kind of feature will be in paid version of GoModel (because I need to pay my bills).

The biggest problem I have with this kind of huge external PRs is that they are not composed into the vision - which I still mainly have in my head so far. Also, reviewing 2,500 lines of code line-by-line is impossible with my limited capacity.

The solutions I see are:

  • working full-time with someone, so we can be aligned on the same vision
  • agreeing on the specification before implementing huge changes like this
  • or just better, more adjusted to the vision PRs - but it might be hard

Considering that I don't have money to hire anyone full-time right now - or even to pay myself - only the 2nd and 3rd option is valid.

The problem with the second option is that sharing those specifications is very often... more time-consuming than developing and testing the changes.

3rd option is still valid


Detailed feedback

I see those changes as kind of cumbersome. There are a lot of env variables / config.yaml properties, and they are not easy to understand what those change.

Also, those changes are not in line with the recently improved virtual models, which actually allow creating aliases like auto, smart, etc. There shouldn't be another mechanism for this. In this case:

selectors:
  - name: "auto"
    strategy: "balanced"
  - name: "smart"
    strategy: "balanced"
  - name: "auto-cost"
    strategy: "cost"
  - name: "auto-quality"
    strategy: "quality"

shouldn't be there, because we already have virtual models for that.

Also, as far as I understand, this PR also:

  • misses the UI configuration for this feature
  • doesn't address the prompt caching persistence issue, especially for the /responses endpoint

Also, I can't guarantee that I'll merge it after the adjustments described here. If you want to give it a try - go ahead.

Enabled bool `yaml:"enabled" env:"INTELLIGENT_ROUTING_ENABLED"`

// Mode selects how decisions are applied:
// - off: no analysis; intelligent selectors behave as unknown models

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have Enabled flag, do we need to have 'off' mode here? I think observe and enforce might be enough. With default to observe.

Analyzers []AnalyzerModelConfig `yaml:"analyzers"`

// Defaults holds resolved defaults shared by every selector.
Defaults IntelligentDefaults `yaml:"defaults"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need this one - another type/structure to hold defaults?

Comment thread .env.template
# See docs/dev/intelligent-model.md.
# ----------------------------------------------------------------------------
# INTELLIGENT_ROUTING_ENABLED=false
# INTELLIGENT_ROUTING_MODE=off # off | observe | enforce

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, IIUC:
INTELLIGENT_ROUTING_ENABLED=false AND INTELLIGENT_ROUTING_MODE=off ===

INTELLIGENT_ROUTING_ENABLED=true AND INTELLIGENT_ROUTING_MODE=off

What's the reason for mode=off ?

…nd gradual context penalty

Refine the candidate ranking so the cost and context dimensions contribute
signal only when they actually discriminate between models, instead of
flooding the weighted sum with neutral noise.

- Cost abstention: when every candidate shares the same estimated cost (the
  common case for gateways whose model registry has no pricing), the cost
  dimension contributes nothing, letting quality and capability decide. The
  cost score is now computed once across all candidates in RankCandidates and
  passed into scoreCandidate.
- Free-model tier: models tagged "free", "local", or "self-hosted" (local
  Ollama/vLLM) now win the cost dimension at 1.0, while paid models are capped
  at 0.5 when a free model is present, preserving a proportional, visible gap.
- Gradual context penalty: requests that approach a model's context window
  receive a proportional penalty instead of a binary exclude. Requests above
  80% of the window decay from 1.0 toward 0.10; requests that exceed it are
  still hard-excluded. The request size is now estimated from message text and
  passed through the Candidate.ContextScore field.

All existing tests pass. New table-driven tests cover cost abstention, the
free-model advantage, the risk-zone penalty, the hard exclude, and unknown
windows.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/intelligentrouter/catalog_test.go`:
- Around line 148-172: Update the RankCandidates cost-dimension coverage in
TestRankCandidates_FreeModelWinsOnCostDimension so it also exercises a
free/local model with no Pricing alongside other candidates that likewise omit
Pricing. Use BuildCandidates and RankCandidates to verify the free-tag signal
still wins instead of cost abstention hiding the behavior, and keep the existing
free/paid case as a separate check if needed.

In `@internal/intelligentrouter/scorer.go`:
- Around line 191-205: The cost-scoring logic in the scorer path is returning
too early when minCost/maxCost are flat, which prevents explicit free-tier
signals from being applied. In the cost evaluation around the minCost/maxCost
spread check, make sure free/local/self-hosted tags are converted into a
non-zero cost signal before the abstention branch can return zeroed scores.
Adjust the logic so the early return only happens after tag-based cost inference
has been considered, keeping the weighted sum able to prefer tagged free models
even when pricing data is missing.

In `@internal/intelligentrouter/selector.go`:
- Around line 231-238: The estimateRequestChars helper is counting UTF-8 bytes
instead of characters, which can overestimate non-ASCII prompts and affect
routing. Update the loop in estimateRequestChars to count runes for each
message’s extracted text content rather than using len on the string, and keep
the nil req handling unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c01fa73-f17a-457b-87e2-19e9f5fb105a

📥 Commits

Reviewing files that changed from the base of the PR and between 140e94c and c601282.

📒 Files selected for processing (4)
  • internal/intelligentrouter/catalog.go
  • internal/intelligentrouter/catalog_test.go
  • internal/intelligentrouter/scorer.go
  • internal/intelligentrouter/selector.go

Comment on lines +148 to +172
// TestRankCandidates_FreeModelWinsOnCostDimension verifies Item 2: a model
// tagged "free" beats paid models on the cost dimension, and paid models are
// capped proportionally so the gap stays visible.
func TestRankCandidates_FreeModelWinsOnCostDimension(t *testing.T) {
paid := 1.0
free := 0.0
cat := fakeCatalog{
models: []core.Model{
{ID: "free-local", Metadata: &core.ModelMetadata{
Modes: []string{"chat"}, Tags: []string{"free", "local"},
Pricing: &core.ModelPricing{InputPerMtok: &free, OutputPerMtok: &free},
}},
{ID: "cheap-paid", Metadata: &core.ModelMetadata{
Modes: []string{"chat"}, Tags: []string{"mini"},
Pricing: &core.ModelPricing{InputPerMtok: &paid, OutputPerMtok: &paid},
}},
},
provider: map[string]string{"free-local": "ollama", "cheap-paid": "openai"},
}

cands := BuildCandidates(cat, CandidateFilter{}, nil, Classification{}, 0)
ranked := RankCandidates(cands, nil, StrategyCost, Classification{})
require.NotEmpty(t, ranked)
require.Equal(t, "free-local", ranked[0].Candidate.Selector.Model)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add coverage for free-tagged models without pricing.

This test only proves the free path when paid pricing is present. Add a case where "free"/"local" has no Pricing and other candidates also lack pricing, so cost abstention cannot silently bypass the free-tier signal. As per coding guidelines, **/*_test.go: “Add or update tests for behavior changes.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/catalog_test.go` around lines 148 - 172, Update
the RankCandidates cost-dimension coverage in
TestRankCandidates_FreeModelWinsOnCostDimension so it also exercises a
free/local model with no Pricing alongside other candidates that likewise omit
Pricing. Use BuildCandidates and RankCandidates to verify the free-tag signal
still wins instead of cost abstention hiding the behavior, and keep the existing
free/paid case as a separate check if needed.

Source: Coding guidelines

Comment thread internal/intelligentrouter/scorer.go
Comment on lines +231 to +238
func estimateRequestChars(req *core.ChatRequest) int {
if req == nil {
return 0
}
total := 0
for _, msg := range req.Messages {
total += len(core.ExtractTextContent(msg.Content))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the estimator no longer uses byte length for extracted chat text.
rg -n 'estimateRequestChars|len\(core\.ExtractTextContent|RuneCountInString' internal/intelligentrouter internal/core

Repository: ENTERPILOT/GoModel

Length of output: 827


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '210,250p' internal/intelligentrouter/selector.go
printf '\n---\n'
sed -n '130,160p' internal/intelligentrouter/selector.go

Repository: ENTERPILOT/GoModel

Length of output: 2560


Count runes instead of bytes in estimateRequestChars. UTF-8 text is currently overcounted, which can skew routing toward larger models for non-ASCII prompts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/selector.go` around lines 231 - 238, The
estimateRequestChars helper is counting UTF-8 bytes instead of characters, which
can overestimate non-ASCII prompts and affect routing. Update the loop in
estimateRequestChars to count runes for each message’s extracted text content
rather than using len on the string, and keep the nil req handling unchanged.

Allow operators to attach routing_guidance to model metadata so the analyzer
prompt can see explicit hints about when a model should be preferred. The
classifier now builds its prompt from a first-pass candidate pool so guided
models are visible before classification runs, while the final candidate list
is still rebuilt afterward to apply capability-derived filters.

Improve analyzer robustness by giving the same analyzer one repair attempt when
its first response is not valid JSON. The repair call reuses the system prompt
but asks only for the compact JSON object, without re-sending the user request.
If repair succeeds, the router keeps the same analyzer; otherwise failover
continues to the next analyzer as before.

Also document routing_guidance in the intelligent routing feature guide.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/intelligentrouter/classifier_test.go (1)

23-49: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert that the repair call replays the original analyzer prompt.

These tests only prove a second call happened. Because the fake executor ignores the request body, they still pass when the repair path drops the original prompt and returns arbitrary JSON. Capture the second ChatRequest and assert it still contains the request summary. Based on learnings: Add or update tests for behavior changes, with coverage for error handling.

Also applies to: 156-193

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/classifier_test.go` around lines 23 - 49, The
repair-path test is only verifying that a second call occurs, but not that the
original analyzer prompt is preserved. Update the fake executor in
fakeExecutor.ChatCompletion (or the specific test harness using it) to capture
the full second ChatRequest, then add an assertion that the retry/repair request
still includes the original request summary or analyzer prompt content. Use the
existing ChatCompletion, ChatRequest, and repair-call test setup to ensure the
second request body is validated, not just the model call count.

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/intelligentrouter/classifier.go`:
- Around line 141-160: The repair path in Classifier.repairClassification is
missing the original request context, so it can return valid JSON for the wrong
intent. Update the repair call in Classifier.repairClassification to include the
original request prompt/summary from the classification attempt alongside the
JSON-only instruction, using the existing callCtx/analyzer flow so the model
repairs formatting without changing the underlying classification target.

---

Outside diff comments:
In `@internal/intelligentrouter/classifier_test.go`:
- Around line 23-49: The repair-path test is only verifying that a second call
occurs, but not that the original analyzer prompt is preserved. Update the fake
executor in fakeExecutor.ChatCompletion (or the specific test harness using it)
to capture the full second ChatRequest, then add an assertion that the
retry/repair request still includes the original request summary or analyzer
prompt content. Use the existing ChatCompletion, ChatRequest, and repair-call
test setup to ensure the second request body is validated, not just the model
call count.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 68850f2e-9b68-43be-b119-ce8dbb544459

📥 Commits

Reviewing files that changed from the base of the PR and between c601282 and 995a4c8.

📒 Files selected for processing (6)
  • docs/features/intelligent-routing.mdx
  • internal/core/types.go
  • internal/intelligentrouter/classifier.go
  • internal/intelligentrouter/classifier_test.go
  • internal/intelligentrouter/prompt.go
  • internal/intelligentrouter/selector.go

Comment on lines +141 to +160
repaired, repairErr := c.repairClassification(callCtx, analyzer, temperature, maxTokens)
if repairErr == nil {
slog.Info("intelligent router analyzer repair succeeded",
"analyzer", analyzer.QualifiedModel(),
)
return repaired, analyzer, nil
}
return Classification{}, analyzer, fmt.Errorf("parse analyzer response: %w (repair failed: %v)", err, repairErr)
}

func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, temperature float64, maxTokens int) (Classification, error) {
resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{
Model: analyzer.Model,
Provider: analyzer.Provider,
Temperature: &temperature,
MaxTokens: &maxTokens,
Messages: []core.Message{
{Role: "system", Content: analyzerSystemPrompt},
{Role: "user", Content: "Your previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."},
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Pass the original request prompt into the repair call.

Line 159 asks the analyzer to resend valid JSON, but repairClassification no longer includes the original request summary. A syntactically valid repair can therefore succeed with an unrelated classification and misroute the request instead of just recovering from formatting noise.

Suggested fix
-	repaired, repairErr := c.repairClassification(callCtx, analyzer, temperature, maxTokens)
+	repaired, repairErr := c.repairClassification(callCtx, analyzer, prompt, temperature, maxTokens)
 ...
-func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, temperature float64, maxTokens int) (Classification, error) {
+func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, prompt string, temperature float64, maxTokens int) (Classification, error) {
 	resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{
 		Model:       analyzer.Model,
 		Provider:    analyzer.Provider,
 		Temperature: &temperature,
 		MaxTokens:   &maxTokens,
 		Messages: []core.Message{
 			{Role: "system", Content: analyzerSystemPrompt},
-			{Role: "user", Content: "Your previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."},
+			{Role: "user", Content: prompt + "\n\nYour previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."},
 		},
 	})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
repaired, repairErr := c.repairClassification(callCtx, analyzer, temperature, maxTokens)
if repairErr == nil {
slog.Info("intelligent router analyzer repair succeeded",
"analyzer", analyzer.QualifiedModel(),
)
return repaired, analyzer, nil
}
return Classification{}, analyzer, fmt.Errorf("parse analyzer response: %w (repair failed: %v)", err, repairErr)
}
func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, temperature float64, maxTokens int) (Classification, error) {
resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{
Model: analyzer.Model,
Provider: analyzer.Provider,
Temperature: &temperature,
MaxTokens: &maxTokens,
Messages: []core.Message{
{Role: "system", Content: analyzerSystemPrompt},
{Role: "user", Content: "Your previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."},
},
repaired, repairErr := c.repairClassification(callCtx, analyzer, prompt, temperature, maxTokens)
if repairErr == nil {
slog.Info("intelligent router analyzer repair succeeded",
"analyzer", analyzer.QualifiedModel(),
)
return repaired, analyzer, nil
}
return Classification{}, analyzer, fmt.Errorf("parse analyzer response: %w (repair failed: %v)", err, repairErr)
}
func (c *Classifier) repairClassification(ctx context.Context, analyzer core.ModelSelector, prompt string, temperature float64, maxTokens int) (Classification, error) {
resp, err := c.executor.ChatCompletion(ctx, &core.ChatRequest{
Model: analyzer.Model,
Provider: analyzer.Provider,
Temperature: &temperature,
MaxTokens: &maxTokens,
Messages: []core.Message{
{Role: "system", Content: analyzerSystemPrompt},
{Role: "user", Content: prompt + "\n\nYour previous response was invalid JSON. Return ONLY the compact JSON object described in the system prompt."},
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/classifier.go` around lines 141 - 160, The repair
path in Classifier.repairClassification is missing the original request context,
so it can return valid JSON for the wrong intent. Update the repair call in
Classifier.repairClassification to include the original request prompt/summary
from the classification attempt alongside the JSON-only instruction, using the
existing callCtx/analyzer flow so the model repairs formatting without changing
the underlying classification target.

Add optional conversation-aware memory to the intelligent router so repeated
requests in the same conversation can keep model choice more consistent across
multi-turn sessions.

Clients can now send X-GoModel-Conversation-ID on translated chat/responses
routes. The transport metadata is propagated into the intelligent routing
selection meta, where the selector reads a short in-memory history keyed by
(user_path, conversation_id) and includes it in the analyzer prompt as
"Previous routing decisions".

When intelligent routing applies a concrete model in enforce mode, the router
records the applied qualified model back into the same in-memory store. Entries
expire automatically after one hour and each conversation is capped to the most
recent 50 routing decisions.

Also document the optional header in the intelligent routing feature guide and
add unit coverage for header extraction, history prompt rendering, memory
expiry/limits, and applied-model recording.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/intelligentrouter/selector.go (1)

136-147: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Record the applied fallback before this early return.

When classification fails in enforce mode, decision.applyForMode(...) can still produce an applied fallback, but this branch returns before addRoutingDecision(...) runs. That drops the applied model from conversation memory and breaks the new continuity contract on analyzer outages.

Proposed fix
 	if err != nil {
 		decision.AnalysisFailed = true
 		decision.Analyzers = s.classifier.Analyzers()
 		decision.Duration = time.Since(start)
 		decision.SelectedModel = s.fallbackSelector(requested, meta)
 		decision.Reason = "analysis failed: " + err.Error()
 		// In enforce, still apply the fallback so analysis failure never blocks the
 		// user's request; in observe, execute the requested model unchanged.
 		decision.applyForMode(s.mode, requested)
+		if decision.Applied && decision.AppliedModel.Model != "" && strings.TrimSpace(meta.ConversationID) != "" {
+			addRoutingDecision(meta.UserPath, meta.ConversationID, decision.AppliedModel.QualifiedModel())
+		}
 		logDecision(decision)
 		return decision
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/selector.go` around lines 136 - 147, Record the
applied fallback before returning from the classification failure branch in
selector.go. In the ClassifyWithCandidates error path inside the routing logic,
keep the existing decision.applyForMode(s.mode, requested) call, but also
persist the resulting applied model by invoking addRoutingDecision with the same
request/decision state before logDecision and return. This ensures the fallback
chosen by s.fallbackSelector and applyForMode is captured in conversation memory
even when analysis fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/features/intelligent-routing.mdx`:
- Around line 278-282: Clarify in the intelligent-routing docs that GoModel’s
in-memory history is only written when routing actually applies a model, not for
every request with X-GoModel-Conversation-ID. Update the description around the
short-lived per-(user_path, conversation_id) history and “Previous routing
decisions” to note that observe-only deployments remain stateless and do not
accumulate history unless a model is applied.

In `@internal/gateway/inference_prepare.go`:
- Around line 97-99: The routing update in evaluateIntelligentRouter is being
overwritten because ensureTranslatedRequestWorkflow still receives the original
meta.Workflow, so the stale resolved selector can replace the new model/provider
choice. Update inference_prepare.go so the workflow passed into
ensureTranslatedRequestWorkflow is dropped or rebuilt when
requested.Model/requested.ProviderHint change after evaluateIntelligentRouter,
and make sure ApplyResolvedSelector uses the routed selector instead of the
preexisting one. Add a regression test covering the existing workflow plus
enforced reroute path to verify the routed choice is preserved.

In `@internal/intelligentrouter/conversation_memory.go`:
- Around line 27-33: The routing memory key in routingMemoryKey is built by
concatenating userPath and conversationID with a delimiter, which can collide
when either value contains "::". Replace this with a structured key approach,
such as a small composite struct or nested map keyed by the two fields, and
update any callers or map usages in conversation_memory.go accordingly so
distinct tuples cannot alias each other.

In `@internal/intelligentrouter/selector_test.go`:
- Around line 109-140: Add a regression test for the Evaluate analysis-failure
fallback path in selector_test.go. The current
TestSelector_RecordsAppliedModelInConversationMemory and
TestSelector_DoesNotRecordHistoryInObserveMode only cover the happy enforce flow
and observe mode, so add a test that forces classification/analysis to fail in
ModeEnforce and verifies the fallback still records the applied model in routing
memory. Use the existing Evaluate, getRoutingHistory, defaultRoutingMemory, and
fakeExecutor symbols to locate and exercise the behavior.

---

Outside diff comments:
In `@internal/intelligentrouter/selector.go`:
- Around line 136-147: Record the applied fallback before returning from the
classification failure branch in selector.go. In the ClassifyWithCandidates
error path inside the routing logic, keep the existing
decision.applyForMode(s.mode, requested) call, but also persist the resulting
applied model by invoking addRoutingDecision with the same request/decision
state before logDecision and return. This ensures the fallback chosen by
s.fallbackSelector and applyForMode is captured in conversation memory even when
analysis fails.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fff161ad-82ab-44c9-9c87-e75dc0198284

📥 Commits

Reviewing files that changed from the base of the PR and between 995a4c8 and 6901217.

📒 Files selected for processing (13)
  • docs/features/intelligent-routing.mdx
  • internal/gateway/inference_intelligent.go
  • internal/gateway/inference_orchestrator.go
  • internal/gateway/inference_prepare.go
  • internal/intelligentrouter/classifier_test.go
  • internal/intelligentrouter/conversation_memory.go
  • internal/intelligentrouter/conversation_memory_test.go
  • internal/intelligentrouter/selector.go
  • internal/intelligentrouter/selector_test.go
  • internal/intelligentrouter/types.go
  • internal/server/request_support.go
  • internal/server/request_support_test.go
  • internal/server/translated_inference_service.go

Comment thread docs/features/intelligent-routing.mdx
Comment on lines +97 to +99
requested = o.evaluateIntelligentRouter(ctx, req, requested, meta)
*model = requested.Model
*provider = requested.ProviderHint

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Drop the stale workflow when routing rewrites the selector.

Lines 97-99 update model and provider, but Line 100 still passes the original meta.Workflow into ensureTranslatedRequestWorkflow. If that workflow already has a resolution, the old selector is reused and ApplyResolvedSelector overwrites the routed choice, so intelligent routing never takes effect on that path.

Proposed fix
 	requested := core.NewRequestedModelSelector(*model, *provider)
 	requested = o.evaluateIntelligentRouter(ctx, req, requested, meta)
 	*model = requested.Model
 	*provider = requested.ProviderHint
+	if meta.Workflow != nil && meta.Workflow.Resolution != nil {
+		current := meta.Workflow.Resolution.ResolvedSelector
+		if current.Model != requested.Model || current.Provider != requested.ProviderHint {
+			meta.Workflow = nil
+		}
+	}
 	workflow, err := o.ensureTranslatedRequestWorkflow(ctx, meta.Workflow, meta.RequestID, meta.Endpoint, model, provider)

Please add a regression test for the "existing workflow + enforced reroute" case as well.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
requested = o.evaluateIntelligentRouter(ctx, req, requested, meta)
*model = requested.Model
*provider = requested.ProviderHint
requested = o.evaluateIntelligentRouter(ctx, req, requested, meta)
*model = requested.Model
*provider = requested.ProviderHint
if meta.Workflow != nil && meta.Workflow.Resolution != nil {
current := meta.Workflow.Resolution.ResolvedSelector
if current.Model != requested.Model || current.Provider != requested.ProviderHint {
meta.Workflow = nil
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/gateway/inference_prepare.go` around lines 97 - 99, The routing
update in evaluateIntelligentRouter is being overwritten because
ensureTranslatedRequestWorkflow still receives the original meta.Workflow, so
the stale resolved selector can replace the new model/provider choice. Update
inference_prepare.go so the workflow passed into ensureTranslatedRequestWorkflow
is dropped or rebuilt when requested.Model/requested.ProviderHint change after
evaluateIntelligentRouter, and make sure ApplyResolvedSelector uses the routed
selector instead of the preexisting one. Add a regression test covering the
existing workflow plus enforced reroute path to verify the routed choice is
preserved.

Comment on lines +27 to +33
func routingMemoryKey(userPath, conversationID string) string {
userPath = strings.TrimSpace(userPath)
conversationID = strings.TrimSpace(conversationID)
if conversationID == "" {
return ""
}
return userPath + "::" + conversationID

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Use a structured key instead of delimiter concatenation.

userPath + "::" + conversationID aliases distinct tuples whenever either component contains ::. That can merge unrelated routing histories and feed the wrong “previous decisions” back into the analyzer. A struct key (or nested map) avoids that state-corruption edge case entirely.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/conversation_memory.go` around lines 27 - 33, The
routing memory key in routingMemoryKey is built by concatenating userPath and
conversationID with a delimiter, which can collide when either value contains
"::". Replace this with a structured key approach, such as a small composite
struct or nested map keyed by the two fields, and update any callers or map
usages in conversation_memory.go accordingly so distinct tuples cannot alias
each other.

Comment on lines +109 to +140
func TestSelector_RecordsAppliedModelInConversationMemory(t *testing.T) {
defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)}
t.Cleanup(func() {
defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)}
})

exec := &fakeExecutor{responses: map[string]string{"a-mini": validClassification}}
s := newTestSelector(t, ModeEnforce, exec)
req := &core.ChatRequest{Messages: []core.Message{{Role: "user", Content: "hi"}}}
requested := core.NewRequestedModelSelector("auto", "")
d := s.Evaluate(context.Background(), req, requested, SelectionMeta{Mode: ModeEnforce, UserPath: "/team", ConversationID: "conv-1"})

require.True(t, d.Applied)
history := getRoutingHistory("/team", "conv-1", 5)
require.Equal(t, []string{d.AppliedModel.QualifiedModel()}, history)
}

func TestSelector_DoesNotRecordHistoryInObserveMode(t *testing.T) {
defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)}
t.Cleanup(func() {
defaultRoutingMemory = &routingMemoryStore{data: make(map[string][]routingMemoryEntry)}
})

exec := &fakeExecutor{responses: map[string]string{"a-mini": validClassification}}
s := newTestSelector(t, ModeObserve, exec)
req := &core.ChatRequest{Messages: []core.Message{{Role: "user", Content: "hi"}}}
requested := core.NewRequestedModelSelector("auto", "")
d := s.Evaluate(context.Background(), req, requested, SelectionMeta{Mode: ModeObserve, UserPath: "/team", ConversationID: "conv-1"})

require.False(t, d.Applied)
require.Nil(t, getRoutingHistory("/team", "conv-1", 5))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a regression test for enforce-mode fallback recording.

These cases only cover successful enforce and observe. The analysis-failure branch in Evaluate can still apply a fallback and return before writing to conversation memory, so this behavior is currently unprotected by tests.

As per coding guidelines, **/*_test.go: Add or update tests for behavior changes, and ensure tests cover request translation, response normalization, error handling, default configuration, and provider-specific parameter mapping.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/intelligentrouter/selector_test.go` around lines 109 - 140, Add a
regression test for the Evaluate analysis-failure fallback path in
selector_test.go. The current
TestSelector_RecordsAppliedModelInConversationMemory and
TestSelector_DoesNotRecordHistoryInObserveMode only cover the happy enforce flow
and observe mode, so add a test that forces classification/analysis to fail in
ModeEnforce and verifies the fallback still records the applied model in routing
memory. Use the existing Evaluate, getRoutingHistory, defaultRoutingMemory, and
fakeExecutor symbols to locate and exercise the behavior.

Source: Coding guidelines

vfeitoza added 2 commits June 27, 2026 22:29
Track the outcome of every provider call in a lightweight in-memory health
tracker (exponential decay, configurable window/half-life, Bayesian smoothing).
Candidates whose recent weighted error rate crosses the circuit-breaker threshold
are hard-excluded from the intelligent-router candidate pool; others receive a
proportional penalty so a degraded-but-alive model still competes at a lower
score.

Health outcomes are recorded at two points in the gateway:
- executeWithUsage: credits the model that actually completed the request, and
  debits the originally-requested model when a fallback took over.
- tryFallbackResponse: debits both the primary and each fallback attempt that
  fails before a winner is found.

The IntelligentRouter interface gains RecordExecution(qualifiedModel, success)
so the gateway can report outcomes without importing the intelligentrouter
package directly.

Config: new optional intelligent_routing.defaults.health block with defaults
  window=20m, half_life=5m, pseudo_counts=2.0, circuit_breaker=0.9.
  All values are configurable via env vars (INTELLIGENT_ROUTING_HEALTH_*).
Expose active intelligent selectors through the models endpoint and use the configured selector set during workflow resolution so custom selectors do not fail early as unsupported models.
# virtual models, unless mode is observe. See docs/dev/intelligent-model.md.
intelligent_routing:
enabled: false
mode: "off" # off | observe | enforce; observe classifies but keeps the requested model

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still 'off' mode here. What is it for?

@vfeitoza

Copy link
Copy Markdown
Collaborator Author

I am still carefully reviewing the entire codebase and adding new ideas to the PR.
I will examine your comments more closely and align the implementation with the existing standards in Gomodel.
I’ve already modified the code to create a virtual model, though I haven't yet considered the dashboard experience.
I’ll take a closer look, and once everything is fully polished, I’ll let you know so you can test it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants