feat(router): add intelligent model routing with auto selectors #1
Open
vfeitoza wants to merge 7 commits into
Introduces an intelligent routing layer that classifies incoming chat
requests with a cheap analyzer model and selects the best concrete model
from the provider catalog before normal resolution runs.
Clients send an intelligent selector in the model field instead of a
concrete model name:
- auto / smart: balanced cost-quality selection
- auto-cost: prefer the cheapest eligible model
- auto-quality: prefer the highest-quality eligible model
The router operates in three modes configured via intelligent_routing.mode:
- off: intelligent selectors return model_not_found (default)
- observe: classifies and logs the recommendation; executes the
original requested model unchanged (dry-run)
- enforce: replaces the selector with the recommended model before
provider dispatch
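The three modes can be sketched as a small decision function. This is a minimal illustration, not the PR's code: `resolveModel`, `isIntelligentSelector`, and the model name `provider/model-a` are hypothetical, and the real router logs the recommendation in observe mode rather than simply returning.

```go
package main

import "fmt"

// Mode mirrors the three values of intelligent_routing.mode described above.
type Mode string

const (
	ModeOff     Mode = "off"
	ModeObserve Mode = "observe"
	ModeEnforce Mode = "enforce"
)

// isIntelligentSelector reports whether model is one of the built-in
// selectors listed in this PR.
func isIntelligentSelector(model string) bool {
	switch model {
	case "auto", "smart", "auto-cost", "auto-quality":
		return true
	}
	return false
}

// resolveModel (hypothetical helper) returns the model that would be executed
// for a requested model, a router recommendation, and the configured mode.
func resolveModel(mode Mode, requested, recommended string) (string, error) {
	if !isIntelligentSelector(requested) {
		return requested, nil // concrete model names are never rewritten
	}
	switch mode {
	case ModeObserve:
		// Dry-run: the recommendation would be logged, the request is unchanged.
		return requested, nil
	case ModeEnforce:
		return recommended, nil
	default:
		// "off" is the default: selectors are not recognised at all.
		return "", fmt.Errorf("model_not_found: %s", requested)
	}
}

func main() {
	for _, mode := range []Mode{ModeOff, ModeObserve, ModeEnforce} {
		model, err := resolveModel(mode, "auto", "provider/model-a")
		fmt.Printf("mode=%s model=%q err=%v\n", mode, model, err)
	}
}
```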
A pool of analyzer models is tried in order with per-call timeout and
automatic failover. When all analyzers fail, requests fall back to
intelligent_routing.fallback_model or return model_not_found if unset.
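The failover behaviour described above might look roughly like the following. It is a simplified sketch under stated assumptions: the `Analyzer` type, `routeWithFailover`, the demo analyzers, and the 20ms timeout are all hypothetical, and each analyzer is collapsed into a function that returns the chosen model directly instead of a classification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Analyzer is a hypothetical stand-in for one analyzer model in the pool.
type Analyzer func(ctx context.Context, prompt string) (string, error)

var errModelNotFound = errors.New("model_not_found")

// routeWithFailover tries each analyzer in order with its own timeout.
// If all fail it returns fallbackModel, or errModelNotFound when unset.
func routeWithFailover(ctx context.Context, pool []Analyzer, prompt string,
	perCall time.Duration, fallbackModel string) (string, error) {
	for _, analyze := range pool {
		callCtx, cancel := context.WithTimeout(ctx, perCall)
		model, err := analyze(callCtx, prompt)
		cancel() // release the timer before trying the next analyzer
		if err == nil {
			return model, nil
		}
	}
	if fallbackModel != "" {
		return fallbackModel, nil
	}
	return "", errModelNotFound
}

// slowAnalyzer simulates an analyzer that exceeds the per-call timeout.
func slowAnalyzer(ctx context.Context, _ string) (string, error) {
	select {
	case <-time.After(time.Second):
		return "provider/model-slow", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// fastAnalyzer simulates an analyzer that answers immediately.
func fastAnalyzer(_ context.Context, _ string) (string, error) {
	return "provider/model-a", nil
}

// runDemo routes one prompt with a 20ms per-call timeout.
func runDemo(fallbackModel string, pool ...Analyzer) (string, error) {
	return routeWithFailover(context.Background(), pool, "summarize this",
		20*time.Millisecond, fallbackModel)
}

func main() {
	fmt.Println(runDemo("", slowAnalyzer, fastAnalyzer))
	fmt.Println(runDemo("provider/model-fallback", slowAnalyzer))
	fmt.Println(runDemo("", slowAnalyzer))
}
```

Creating a fresh context per call keeps one slow analyzer from consuming the time budget of the analyzers that follow it.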
Analyzer usage is attributed to a dedicated analysis_user_path
(/intelligent-router by default) so classification cost is separated
from application request cost in usage reports.
Key components:
- internal/intelligentrouter: classifier, selector, scorer, catalog,
prompt builder, and type definitions
- config/intelligent_routing.go: configuration struct, defaults,
validation, and env var overrides
- internal/gateway/inference_intelligent.go: orchestrator hook that
rewrites the model selector before resolution
- internal/server/model_validation.go: middleware skip for intelligent
selectors to prevent premature model_not_found errors
- internal/observability/metrics.go: Prometheus counters and histograms
for routing decisions, latency, fallbacks, and low-confidence cases
- internal/virtualmodels: intelligent virtual model support with
strategy-qualified selectors (intelligent:cost, intelligent:quality)
- docs/features/intelligent-routing.mdx: full feature documentation
All existing tests pass. New tests cover config validation, selector
detection, catalog filtering, scoring, and middleware passthrough for
intelligent selectors.
…bled
newIntelligentRouterFromConfig returns a nil *intelligentrouter.Selector when the feature is inactive (the default). Assigning that typed nil directly to the gateway.IntelligentRouter interface field produced a non-nil interface wrapping a nil pointer, so the orchestrator's `== nil` guard was skipped and ShouldEvaluate panicked on every chat request, surfacing as HTTP 500.
Only assign the router to the interface when the concrete pointer is non-nil, keeping the field at a true nil interface otherwise. Add a reflection-based typed-nil check in evaluateIntelligentRouter as defense in depth against future regressions.
Reproduced and verified against the full integration suite.
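The typed-nil behaviour behind this fix is a general Go property and can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`Router`, `Selector`, `newSelector`, `wire`, `isNilRouter`) rather than the gateway's real types; it shows both the guarded assignment and a reflection-based check of the kind the commit describes.

```go
package main

import (
	"fmt"
	"reflect"
)

// Router is a hypothetical stand-in for gateway.IntelligentRouter.
type Router interface {
	ShouldEvaluate(model string) bool
}

// Selector is a hypothetical stand-in for intelligentrouter.Selector.
type Selector struct{ selectors map[string]bool }

// ShouldEvaluate dereferences s, so it panics when s is a nil pointer.
func (s *Selector) ShouldEvaluate(model string) bool { return s.selectors[model] }

// newSelector returns a nil *Selector when the feature is disabled.
func newSelector(enabled bool) *Selector {
	if !enabled {
		return nil
	}
	return &Selector{selectors: map[string]bool{"auto": true}}
}

// wire assigns the router only when the concrete pointer is non-nil, so a
// disabled feature yields a true nil interface.
func wire(enabled bool) Router {
	if s := newSelector(enabled); s != nil {
		return s
	}
	return nil
}

// isNilRouter treats both a nil interface and an interface wrapping a nil
// pointer as "no router" (the defense-in-depth check).
func isNilRouter(r Router) bool {
	if r == nil {
		return true
	}
	v := reflect.ValueOf(r)
	return v.Kind() == reflect.Pointer && v.IsNil()
}

func main() {
	var buggy Router = newSelector(false) // typed nil inside a non-nil interface
	fmt.Println(buggy == nil)             // false: the plain guard is skipped
	fmt.Println(isNilRouter(buggy))       // true: reflection catches it
	fmt.Println(wire(false) == nil)       // true: guarded assignment stays nil
}
```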
…nd gradual context penalty
Refine the candidate ranking so the cost and context dimensions contribute signal only when they actually discriminate between models, instead of flooding the weighted sum with neutral noise.
- Cost abstention: when every candidate shares the same estimated cost (the common case for gateways whose model registry has no pricing), the cost dimension contributes nothing, letting quality and capability decide. The cost score is now computed once across all candidates in RankCandidates and passed into scoreCandidate.
- Free-model tier: models tagged "free", "local", or "self-hosted" (local Ollama/vLLM) now win the cost dimension at 1.0, while paid models are capped at 0.5 when a free model is present, preserving a proportional, visible gap.
- Gradual context penalty: requests that approach a model's context window receive a proportional penalty instead of a binary exclude. Requests above 80% of the window decay from 1.0 toward 0.10; requests that exceed it are still hard-excluded. The request size is now estimated from message text and passed through the Candidate.ContextScore field.
All existing tests pass. New table-driven tests cover cost abstention, the free-model advantage, the risk-zone penalty, the hard exclude, and unknown windows.
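As a rough illustration of two of these rules, the sketch below implements a context score and a cost score. Several details are assumptions rather than facts from the commit: the decay between 80% and 100% of the window is taken to be linear, an unknown window is treated as neutral, cost scores are min-max normalised, and the function names `contextScore` and `costScores` are hypothetical. The free-model tier is left out for brevity.

```go
package main

import "fmt"

const (
	riskZoneStart   = 0.8  // penalty begins above 80% of the window
	minContextScore = 0.10 // score reached when the request fills the window
)

// contextScore returns a score in [0.10, 1.0] and whether the candidate is
// hard-excluded because the request does not fit in the window.
func contextScore(requestTokens, windowTokens int) (score float64, excluded bool) {
	if windowTokens <= 0 {
		return 1.0, false // unknown window: treated as neutral (assumption)
	}
	if requestTokens > windowTokens {
		return 0, true
	}
	ratio := float64(requestTokens) / float64(windowTokens)
	if ratio <= riskZoneStart {
		return 1.0, false
	}
	// Linear decay from 1.0 at 80% to 0.10 at 100% (assumed shape).
	progress := (ratio - riskZoneStart) / (1.0 - riskZoneStart)
	return 1.0 - progress*(1.0-minContextScore), false
}

// costScores scores all candidates at once. When every cost is identical it
// abstains, so the cost dimension adds nothing to the weighted sum.
func costScores(costs []float64) (scores []float64, abstain bool) {
	if len(costs) == 0 {
		return nil, true
	}
	lo, hi := costs[0], costs[0]
	for _, c := range costs[1:] {
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	scores = make([]float64, len(costs))
	if hi == lo {
		return scores, true
	}
	for i, c := range costs {
		scores[i] = (hi - c) / (hi - lo) // cheapest gets 1.0, priciest 0.0
	}
	return scores, false
}

func main() {
	for _, tokens := range []int{500, 900, 1000, 1200} {
		score, excluded := contextScore(tokens, 1000)
		fmt.Printf("tokens=%d score=%.2f excluded=%v\n", tokens, score, excluded)
	}
	fmt.Println(costScores([]float64{0, 0, 0}))
	fmt.Println(costScores([]float64{1, 2, 3}))
}
```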
Allow operators to attach routing_guidance to model metadata so the analyzer prompt can see explicit hints about when a model should be preferred. The classifier now builds its prompt from a first-pass candidate pool so guided models are visible before classification runs, while the final candidate list is still rebuilt afterward to apply capability-derived filters.
Improve analyzer robustness by giving the same analyzer one repair attempt when its first response is not valid JSON. The repair call reuses the system prompt but asks only for the compact JSON object, without re-sending the user request. If repair succeeds, the router keeps the same analyzer; otherwise failover continues to the next analyzer as before.
Also document routing_guidance in the intelligent routing feature guide.
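The single repair attempt could be structured along these lines. Everything named here is hypothetical (`Classification` and its `strategy` field, `AnalyzerCall`, `repairInstruction`, `flakyAnalyzer`); the sketch only illustrates the control flow of "parse, retry once with a shorter prompt, then give up so the caller can fail over".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Classification is a hypothetical shape for the analyzer's JSON reply.
type Classification struct {
	Strategy string `json:"strategy"`
}

// AnalyzerCall is a hypothetical stand-in for one analyzer invocation.
type AnalyzerCall func(system, user string) (string, error)

// repairInstruction replaces the user request on the repair call.
const repairInstruction = "Return only the compact JSON object."

func parse(raw string) (Classification, error) {
	var c Classification
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

// classifyWithRepair gives the same analyzer one repair attempt when its
// first reply is not valid JSON. An error tells the caller to fail over.
func classifyWithRepair(call AnalyzerCall, system, userRequest string) (Classification, error) {
	raw, err := call(system, userRequest)
	if err != nil {
		return Classification{}, err
	}
	if c, perr := parse(raw); perr == nil {
		return c, nil
	}
	// Repair: same system prompt, but the user request is not re-sent.
	raw, err = call(system, repairInstruction)
	if err != nil {
		return Classification{}, err
	}
	c, perr := parse(raw)
	if perr != nil {
		return Classification{}, errors.New("analyzer returned invalid JSON after repair")
	}
	return c, nil
}

// flakyAnalyzer returns non-JSON text for the first badReplies calls.
func flakyAnalyzer(badReplies int) AnalyzerCall {
	calls := 0
	return func(_, _ string) (string, error) {
		calls++
		if calls <= badReplies {
			return "Sure! Here is the JSON you asked for:", nil
		}
		return `{"strategy":"cost"}`, nil
	}
}

func main() {
	fmt.Println(classifyWithRepair(flakyAnalyzer(1), "system prompt", "user request"))
	fmt.Println(classifyWithRepair(flakyAnalyzer(2), "system prompt", "user request"))
}
```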
Add optional conversation-aware memory to the intelligent router so repeated requests in the same conversation can keep model choice more consistent across multi-turn sessions.
Clients can now send X-GoModel-Conversation-ID on translated chat/responses routes. The transport metadata is propagated into the intelligent routing selection meta, where the selector reads a short in-memory history keyed by (user_path, conversation_id) and includes it in the analyzer prompt as "Previous routing decisions".
When intelligent routing applies a concrete model in enforce mode, the router records the applied qualified model back into the same in-memory store. Entries expire automatically after one hour and each conversation is capped to the most recent 50 routing decisions.
Also document the optional header in the intelligent routing feature guide and add unit coverage for header extraction, history prompt rendering, memory expiry/limits, and applied-model recording.
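A store with these limits might be sketched as below. The type and method names (`Memory`, `Record`, `History`, `at`) are hypothetical, and the sketch assumes that expiry applies per entry rather than per conversation, which the commit message does not specify. The current time is passed in explicitly so the behaviour is easy to test.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	entryTTL   = time.Hour // entries expire after one hour
	maxEntries = 50        // most recent decisions kept per conversation
)

type convKey struct{ userPath, conversationID string }

type decision struct {
	model string
	at    time.Time
}

// Memory is a hypothetical in-memory store of applied routing decisions.
type Memory struct {
	mu    sync.Mutex
	items map[convKey][]decision
}

func NewMemory() *Memory { return &Memory{items: make(map[convKey][]decision)} }

// prune returns the unexpired decisions for k; the caller must hold mu.
func (m *Memory) prune(k convKey, now time.Time) []decision {
	kept := m.items[k][:0]
	for _, d := range m.items[k] {
		if now.Sub(d.at) < entryTTL {
			kept = append(kept, d)
		}
	}
	return kept
}

// Record appends the applied model and enforces the per-conversation cap.
func (m *Memory) Record(userPath, conversationID, model string, now time.Time) {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := convKey{userPath, conversationID}
	list := append(m.prune(k, now), decision{model: model, at: now})
	if len(list) > maxEntries {
		list = list[len(list)-maxEntries:]
	}
	m.items[k] = list
}

// History returns unexpired models for the conversation, oldest first.
func (m *Memory) History(userPath, conversationID string, now time.Time) []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := convKey{userPath, conversationID}
	list := m.prune(k, now)
	if len(list) == 0 {
		delete(m.items, k)
		return nil
	}
	m.items[k] = list
	out := make([]string, 0, len(list))
	for _, d := range list {
		out = append(out, d.model)
	}
	return out
}

// at returns a fixed reference time plus the given number of minutes.
func at(minutes int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(minutes) * time.Minute)
}

func main() {
	m := NewMemory()
	m.Record("/app", "c1", "provider/model-a", at(0))
	m.Record("/app", "c1", "provider/model-b", at(30))
	fmt.Println(m.History("/app", "c1", at(45))) // both entries still live
	fmt.Println(m.History("/app", "c1", at(70))) // first entry has expired
}
```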
Track the outcome of every provider call in a lightweight in-memory health tracker (exponential decay, configurable window/half-life, Bayesian smoothing). Candidates whose recent weighted error rate crosses the circuit-breaker threshold are hard-excluded from the intelligent-router candidate pool; others receive a proportional penalty so a degraded-but-alive model still competes at a lower score.
Health outcomes are recorded at two points in the gateway:
- executeWithUsage: credits the model that actually completed the request, and debits the originally-requested model when a fallback took over.
- tryFallbackResponse: debits both the primary and each fallback attempt that fails before a winner is found.
The IntelligentRouter interface gains RecordExecution(qualifiedModel, success) so the gateway can report outcomes without importing the intelligentrouter package directly.
Config: new optional intelligent_routing.defaults.health block with defaults window=20m, half_life=5m, pseudo_counts=2.0, circuit_breaker=0.9. All values are configurable via env vars (INTELLIGENT_ROUTING_HEALTH_*).
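One way to combine exponential decay, a window, and pseudo-count smoothing is sketched below, using the defaults quoted above. The formula is an assumption, not taken from the commit: each outcome is weighted by 0.5^(age/half_life), outcomes older than the window are ignored, and the pseudo-counts are added to the denominator as if they were prior successes. The names `Tracker`, `HealthConfig`, `ErrorRate`, `Excluded`, and `at` are hypothetical, and locking is omitted for brevity.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// HealthConfig mirrors the intelligent_routing.defaults.health keys.
type HealthConfig struct {
	Window         time.Duration
	HalfLife       time.Duration
	PseudoCounts   float64
	CircuitBreaker float64
}

// DefaultHealthConfig returns the defaults quoted in the commit message.
func DefaultHealthConfig() HealthConfig {
	return HealthConfig{
		Window:         20 * time.Minute,
		HalfLife:       5 * time.Minute,
		PseudoCounts:   2.0,
		CircuitBreaker: 0.9,
	}
}

type outcome struct {
	at      time.Time
	success bool
}

// Tracker is a hypothetical health tracker; it is not safe for concurrent use.
type Tracker struct {
	cfg      HealthConfig
	outcomes map[string][]outcome
}

func NewTracker(cfg HealthConfig) *Tracker {
	return &Tracker{cfg: cfg, outcomes: make(map[string][]outcome)}
}

// Record stores one provider-call outcome for the qualified model.
func (t *Tracker) Record(model string, success bool, now time.Time) {
	t.outcomes[model] = append(t.outcomes[model], outcome{at: now, success: success})
}

// ErrorRate returns the decayed, smoothed error rate (assumed formula):
// weightedFailures / (weightedTotal + pseudoCounts).
func (t *Tracker) ErrorRate(model string, now time.Time) float64 {
	var failures, total float64
	for _, o := range t.outcomes[model] {
		age := now.Sub(o.at)
		if age < 0 || age > t.cfg.Window {
			continue // outside the observation window
		}
		w := math.Exp2(-age.Seconds() / t.cfg.HalfLife.Seconds())
		total += w
		if !o.success {
			failures += w
		}
	}
	return failures / (total + t.cfg.PseudoCounts)
}

// Excluded reports whether the circuit breaker removes the candidate.
func (t *Tracker) Excluded(model string, now time.Time) bool {
	return t.ErrorRate(model, now) >= t.cfg.CircuitBreaker
}

// at returns a fixed reference time plus the given number of minutes.
func at(minutes int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(minutes) * time.Minute)
}

func main() {
	tr := NewTracker(DefaultHealthConfig())
	for i := 0; i < 20; i++ {
		tr.Record("provider/model-a", false, at(0))
	}
	for _, minutes := range []int{0, 5, 25} {
		fmt.Printf("t+%dm rate=%.3f excluded=%v\n", minutes,
			tr.ErrorRate("provider/model-a", at(minutes)),
			tr.Excluded("provider/model-a", at(minutes)))
	}
}
```

Under this formula the smoothing keeps a model with only a handful of failures below the 0.9 threshold, and a failing model recovers on its own as old outcomes decay and then leave the window.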
Expose active intelligent selectors through the models endpoint and use the configured selector set during workflow resolution so custom selectors do not fail early as unsupported models.
Summary
- … (`auto`, `smart`, `auto-cost`, `auto-quality`) in the `model` field instead of concrete model names.
- … `off` (default), `observe` (dry-run, logs recommendation without changing execution), and `enforce` (rewrites the selector before dispatch).
How it works
- … `"model": "auto"` (or another intelligent selector).
- … (`cost`, `balanced`, `quality`, `latency`).
- … `enforce` mode, the winning model replaces the selector; the request then goes through normal authorization and provider routing.
Key components
- `internal/intelligentrouter/`: classifier, selector, scorer, catalog filtering, prompt builder, types
- `config/intelligent_routing.go`: configuration struct, defaults, validation, env var overrides
- `internal/gateway/inference_intelligent.go`: orchestrator hook that rewrites the model selector before resolution
- `internal/server/model_validation.go`: middleware skip for intelligent selectors to prevent premature `model_not_found` errors
- `internal/observability/metrics.go`: Prometheus counters and histograms for decisions, latency, fallbacks, and low-confidence cases
- `internal/virtualmodels/intelligent.go`: intelligent virtual model support with strategy-qualified selectors
- `docs/features/intelligent-routing.mdx`: full feature documentation
Configuration
Test plan
- `make test-race`
- … `"model": "auto"` with `mode: observe` logs `intelligent routing decision` without changing the executed model
- … `"model": "auto"` with `mode: enforce` rewrites the selector and returns 200 with the selected model
- … `"model": "auto-cost"` selects a cheaper model when pricing/tags are configured
- … `fallback_model` with `analysis_failed=true` in logs
- … `mode: off` return `model_not_found`
- … `analysis_user_path` in usage reports