feat(router): add intelligent model routing with auto selectors by vfeitoza · Pull Request #1 · vfeitoza/GoModel

vfeitoza · 2026-06-26T19:41:01Z

Summary

Adds an intelligent routing layer that classifies incoming chat requests with a cheap analyzer model and selects the best concrete model from the provider catalog before normal resolution runs.
Clients use virtual selectors (auto, smart, auto-cost, auto-quality) in the model field instead of concrete model names.
Three modes: off (default), observe (dry-run, logs recommendation without changing execution), and enforce (rewrites the selector before dispatch).

How it works

Client sends "model": "auto" (or another intelligent selector).
Gateway intercepts before provider resolution.
A cheap analyzer model classifies the request (complexity, task type, required capabilities, confidence score).
The scorer ranks all eligible catalog models against the active strategy (cost, balanced, quality, latency).
In enforce mode, the winning model replaces the selector; the request then goes through normal authorization and provider routing.
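
The effect of the three modes on a single request can be sketched as follows. This is a minimal illustration, not the code from this PR: `route`, `Decision`, and the hard-coded selector set are hypothetical names, and the `recommend` callback stands in for the analyzer and scorer steps.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode mirrors the three values of intelligent_routing.mode.
type Mode string

const (
	ModeOff     Mode = "off"
	ModeObserve Mode = "observe"
	ModeEnforce Mode = "enforce"
)

var errModelNotFound = errors.New("model_not_found")

// selectors is an assumed, fixed set; the PR makes it configurable.
var selectors = map[string]bool{"auto": true, "smart": true, "auto-cost": true, "auto-quality": true}

// Decision is a hypothetical record of one routing evaluation.
type Decision struct {
	Requested   string // value of the client's model field
	Recommended string // concrete model picked by the scorer, if any
	Executed    string // model handed to normal provider resolution
}

// route applies the mode semantics described above. Concrete model
// names bypass the router entirely.
func route(mode Mode, requested string, recommend func() string) (Decision, error) {
	if !selectors[requested] {
		return Decision{Requested: requested, Executed: requested}, nil
	}
	switch mode {
	case ModeObserve:
		// Dry run: compute the recommendation but keep the original model.
		return Decision{Requested: requested, Recommended: recommend(), Executed: requested}, nil
	case ModeEnforce:
		rec := recommend()
		return Decision{Requested: requested, Recommended: rec, Executed: rec}, nil
	default:
		// Off: selectors are not recognized.
		return Decision{}, errModelNotFound
	}
}

func main() {
	pick := func() string { return "openai/gpt-4o-mini" }
	for _, m := range []Mode{ModeOff, ModeObserve, ModeEnforce} {
		d, err := route(m, "auto", pick)
		fmt.Printf("mode=%s executed=%q err=%v\n", m, d.Executed, err)
	}
}
```

In observe mode the sketch keeps the selector as the executed model, matching the description; how the gateway then resolves that selector is outside the scope of this example.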

Key components

internal/intelligentrouter/ — classifier, selector, scorer, catalog filtering, prompt builder, types
config/intelligent_routing.go — configuration struct, defaults, validation, env var overrides
internal/gateway/inference_intelligent.go — orchestrator hook that rewrites the model selector before resolution
internal/server/model_validation.go — middleware skip for intelligent selectors to prevent premature model_not_found errors
internal/observability/metrics.go — Prometheus counters and histograms for decisions, latency, fallbacks, and low-confidence cases
internal/virtualmodels/intelligent.go — intelligent virtual model support with strategy-qualified selectors
docs/features/intelligent-routing.mdx — full feature documentation

Configuration

intelligent_routing:
  enabled: true
  mode: "enforce"        # off | observe | enforce
  analyzers:
    - model: "gpt-4o-mini"
      provider: "openai"
      max_tokens: 256
  defaults:
    strategy: "balanced" # cost | balanced | quality | latency
    min_savings_ratio: 0.15
    min_confidence: 0.7
  fallback_model: "openai/gpt-4o-mini"
  analysis_user_path: "/intelligent-router"
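
For orientation, the YAML above could map onto a Go struct along these lines. The struct layout, field names, and every validation rule here are assumptions made for illustration; the actual definitions live in config/intelligent_routing.go and may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Analyzer, Defaults and IntelligentRouting are hypothetical mirrors
// of the YAML keys shown above.
type Analyzer struct {
	Model     string
	Provider  string
	MaxTokens int
}

type Defaults struct {
	Strategy        string
	MinSavingsRatio float64
	MinConfidence   float64
}

type IntelligentRouting struct {
	Enabled          bool
	Mode             string
	Analyzers        []Analyzer
	Defaults         Defaults
	FallbackModel    string
	AnalysisUserPath string
}

func oneOf(v string, allowed ...string) bool {
	for _, a := range allowed {
		if v == a {
			return true
		}
	}
	return false
}

// Validate applies assumed sanity checks: known enum values, ratios
// within [0, 1], and at least one analyzer when routing is active.
func (c IntelligentRouting) Validate() error {
	if !oneOf(c.Mode, "off", "observe", "enforce") {
		return fmt.Errorf("intelligent_routing.mode: unknown value %q", c.Mode)
	}
	if !oneOf(c.Defaults.Strategy, "cost", "balanced", "quality", "latency") {
		return fmt.Errorf("intelligent_routing.defaults.strategy: unknown value %q", c.Defaults.Strategy)
	}
	if c.Defaults.MinConfidence < 0 || c.Defaults.MinConfidence > 1 {
		return errors.New("intelligent_routing.defaults.min_confidence must be in [0, 1]")
	}
	if c.Defaults.MinSavingsRatio < 0 || c.Defaults.MinSavingsRatio > 1 {
		return errors.New("intelligent_routing.defaults.min_savings_ratio must be in [0, 1]")
	}
	if c.Enabled && c.Mode != "off" && len(c.Analyzers) == 0 {
		return errors.New("intelligent_routing.analyzers: at least one analyzer is required")
	}
	return nil
}

// exampleConfig reproduces the values from the YAML above.
func exampleConfig() IntelligentRouting {
	return IntelligentRouting{
		Enabled:          true,
		Mode:             "enforce",
		Analyzers:        []Analyzer{{Model: "gpt-4o-mini", Provider: "openai", MaxTokens: 256}},
		Defaults:         Defaults{Strategy: "balanced", MinSavingsRatio: 0.15, MinConfidence: 0.7},
		FallbackModel:    "openai/gpt-4o-mini",
		AnalysisUserPath: "/intelligent-router",
	}
}

func main() {
	fmt.Println("valid config:", exampleConfig().Validate())
	bad := exampleConfig()
	bad.Mode = "shadow"
	fmt.Println("bad mode:", bad.Validate())
}
```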

Test plan

Unit tests pass: make test-race
"model": "auto" with mode: observe logs intelligent routing decision without changing the executed model
"model": "auto" with mode: enforce rewrites the selector and returns 200 with the selected model
"model": "auto-cost" selects a cheaper model when pricing/tags are configured
All analyzers failing falls back to fallback_model with analysis_failed=true in logs
Intelligent selectors with mode: off return model_not_found
Analyzer usage appears under analysis_user_path in usage reports

Introduces an intelligent routing layer that classifies incoming chat requests with a cheap analyzer model and selects the best concrete model from the provider catalog before normal resolution runs.

Clients send an intelligent selector in the model field instead of a concrete model name:
- auto / smart: balanced cost-quality selection
- auto-cost: prefer the cheapest eligible model
- auto-quality: prefer the highest-quality eligible model

The router operates in three modes configured via intelligent_routing.mode:
- off: intelligent selectors return model_not_found (default)
- observe: classifies and logs the recommendation; executes the original requested model unchanged (dry-run)
- enforce: replaces the selector with the recommended model before provider dispatch

A pool of analyzer models is tried in order with per-call timeout and automatic failover. When all analyzers fail, requests fall back to intelligent_routing.fallback_model or return model_not_found if unset.

Analyzer usage is attributed to a dedicated analysis_user_path (/intelligent-router by default) so classification cost is separated from application request cost in usage reports.

Key components:
- internal/intelligentrouter: classifier, selector, scorer, catalog, prompt builder, and type definitions
- config/intelligent_routing.go: configuration struct, defaults, validation, and env var overrides
- internal/gateway/inference_intelligent.go: orchestrator hook that rewrites the model selector before resolution
- internal/server/model_validation.go: middleware skip for intelligent selectors to prevent premature model_not_found errors
- internal/observability/metrics.go: Prometheus counters and histograms for routing decisions, latency, fallbacks, and low-confidence cases
- internal/virtualmodels: intelligent virtual model support with strategy-qualified selectors (intelligent:cost, intelligent:quality)
- docs/features/intelligent-routing.mdx: full feature documentation

All existing tests pass. New tests cover config validation, selector detection, catalog filtering, scoring, and middleware passthrough for intelligent selectors.
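
The analyzer pool behaviour described in this commit (ordered attempts, per-call timeout, fallback when every analyzer fails) might look roughly like the sketch below. The `Analyzer` function type, `classify`, `resolve`, the 20 ms timeout, and the model names are all hypothetical; for brevity the analyzer returns a model name directly, whereas the PR separates classification from scoring.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Analyzer is a hypothetical stand-in for one analyzer model call.
type Analyzer func(ctx context.Context, prompt string) (string, error)

var (
	errAllAnalyzersFailed = errors.New("all analyzers failed")
	errModelNotFound      = errors.New("model_not_found")
)

// classify tries each analyzer in order; every call gets its own deadline.
func classify(ctx context.Context, pool []Analyzer, perCall time.Duration, prompt string) (string, error) {
	for _, analyze := range pool {
		callCtx, cancel := context.WithTimeout(ctx, perCall)
		model, err := analyze(callCtx, prompt)
		cancel()
		if err == nil {
			return model, nil
		}
		// Timeout or error: fail over to the next analyzer.
	}
	return "", errAllAnalyzersFailed
}

// resolve falls back to fallbackModel when analysis fails, or returns
// model_not_found when no fallback is configured.
func resolve(ctx context.Context, pool []Analyzer, perCall time.Duration, prompt, fallbackModel string) (model string, analysisFailed bool, err error) {
	model, err = classify(ctx, pool, perCall, prompt)
	if err == nil {
		return model, false, nil
	}
	if fallbackModel != "" {
		return fallbackModel, true, nil
	}
	return "", true, errModelNotFound
}

// slow simulates an analyzer that does not answer within the deadline.
func slow(ctx context.Context, _ string) (string, error) {
	select {
	case <-time.After(time.Second):
		return "too-late", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func failing(context.Context, string) (string, error) { return "", errors.New("upstream error") }

func healthy(context.Context, string) (string, error) { return "openai/gpt-4o", nil }

// demo wires a background context and a short timeout for the example.
func demo(pool []Analyzer, fallbackModel string) (string, bool, error) {
	return resolve(context.Background(), pool, 20*time.Millisecond, "hi", fallbackModel)
}

func main() {
	fmt.Println(demo([]Analyzer{slow, failing, healthy}, "openai/gpt-4o-mini"))
	fmt.Println(demo([]Analyzer{slow, failing}, "openai/gpt-4o-mini"))
	fmt.Println(demo([]Analyzer{failing}, ""))
}
```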

…bled

newIntelligentRouterFromConfig returns a nil *intelligentrouter.Selector when the feature is inactive (the default). Assigning that typed nil directly to the gateway.IntelligentRouter interface field produced a non-nil interface wrapping a nil pointer, so the orchestrator's `== nil` guard was skipped and ShouldEvaluate panicked on every chat request, surfacing as HTTP 500.

Only assign the router to the interface when the concrete pointer is non-nil, keeping the field at a true nil interface otherwise. Add a reflection-based typed-nil check in evaluateIntelligentRouter as defense in depth against future regressions.

Reproduced and verified against the full integration suite.
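
The typed-nil pitfall behind this fix is a general Go behaviour and can be reproduced in isolation. The types below are simplified, hypothetical stand-ins for gateway.IntelligentRouter and *intelligentrouter.Selector; only the mechanism is meant to match.

```go
package main

import (
	"fmt"
	"reflect"
)

// IntelligentRouter stands in for the gateway-side interface.
type IntelligentRouter interface {
	ShouldEvaluate(model string) bool
}

// Selector stands in for the concrete router type.
type Selector struct {
	selectors map[string]bool
}

// ShouldEvaluate dereferences s, so calling it on a nil *Selector panics.
func (s *Selector) ShouldEvaluate(model string) bool { return s.selectors[model] }

// newSelector returns a nil pointer when the feature is disabled.
func newSelector(enabled bool) *Selector {
	if !enabled {
		return nil
	}
	return &Selector{selectors: map[string]bool{"auto": true}}
}

// Gateway holds the router behind an interface field.
type Gateway struct {
	Router IntelligentRouter
}

// newGateway assigns the router only when the concrete pointer is
// non-nil, so the field stays a true nil interface otherwise.
func newGateway(enabled bool) *Gateway {
	g := &Gateway{}
	if s := newSelector(enabled); s != nil {
		g.Router = s
	}
	return g
}

// isNilRouter is the defense-in-depth check: it also catches an
// interface value that wraps a nil pointer.
func isNilRouter(r IntelligentRouter) bool {
	if r == nil {
		return true
	}
	v := reflect.ValueOf(r)
	return v.Kind() == reflect.Ptr && v.IsNil()
}

func main() {
	// The bug: a nil *Selector stored in an interface is not == nil.
	var buggy IntelligentRouter = newSelector(false)
	fmt.Println(buggy == nil)                    // false
	fmt.Println(isNilRouter(buggy))              // true
	fmt.Println(newGateway(false).Router == nil) // true
}
```

An interface value is nil only when both its dynamic type and its dynamic value are unset, which is why the plain `== nil` guard misses a wrapped nil pointer.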

…nd gradual context penalty

Refine the candidate ranking so the cost and context dimensions contribute signal only when they actually discriminate between models, instead of flooding the weighted sum with neutral noise.

- Cost abstention: when every candidate shares the same estimated cost (the common case for gateways whose model registry has no pricing), the cost dimension contributes nothing, letting quality and capability decide. The cost score is now computed once across all candidates in RankCandidates and passed into scoreCandidate.
- Free-model tier: models tagged "free", "local", or "self-hosted" (local Ollama/vLLM) now win the cost dimension at 1.0, while paid models are capped at 0.5 when a free model is present, preserving a proportional, visible gap.
- Gradual context penalty: requests that approach a model's context window receive a proportional penalty instead of a binary exclude. Requests above 80% of the window decay from 1.0 toward 0.10; requests that exceed it are still hard-excluded. The request size is now estimated from message text and passed through the Candidate.ContextScore field.

All existing tests pass. New table-driven tests cover cost abstention, the free-model advantage, the risk-zone penalty, the hard exclude, and unknown windows.
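
One way to read the three scoring rules above is sketched below. Several details are assumptions not stated in the commit: the decay between 80% and 100% of the window is taken to be linear, an unknown window (0) is treated as neutral, and paid models are ranked by min-max normalisation under the cap. `Candidate`, `costScores`, and `contextScore` here are illustrative and not the PR's RankCandidates or scoreCandidate.

```go
package main

import "fmt"

// Candidate is a reduced, hypothetical view of a catalog model.
type Candidate struct {
	Model string
	Cost  float64 // estimated cost; identical for all when no pricing exists
	Free  bool    // tagged "free", "local" or "self-hosted"
}

// costScores returns one score per candidate, or nil when the cost
// dimension abstains because it cannot tell the candidates apart.
func costScores(cs []Candidate) []float64 {
	var lo, hi float64
	paid, anyFree := 0, false
	for _, c := range cs {
		if c.Free {
			anyFree = true
			continue
		}
		if paid == 0 || c.Cost < lo {
			lo = c.Cost
		}
		if paid == 0 || c.Cost > hi {
			hi = c.Cost
		}
		paid++
	}
	if paid == 0 || (!anyFree && hi == lo) {
		return nil // abstain: all free, or all paid at the same cost
	}
	ceiling := 1.0
	if anyFree {
		ceiling = 0.5 // paid models are capped when a free model competes
	}
	scores := make([]float64, len(cs))
	for i, c := range cs {
		switch {
		case c.Free:
			scores[i] = 1.0
		case hi == lo:
			scores[i] = ceiling
		default:
			scores[i] = ceiling * (hi - c.Cost) / (hi - lo)
		}
	}
	return scores
}

const (
	riskZoneStart = 0.80 // fraction of the window where the penalty begins
	floorScore    = 0.10 // score when the request exactly fills the window
)

// contextScore returns the context multiplier and whether the model
// stays eligible for a request of the given estimated size.
func contextScore(requestTokens, window int) (float64, bool) {
	if window <= 0 {
		return 1.0, true // unknown window: assumed neutral
	}
	ratio := float64(requestTokens) / float64(window)
	switch {
	case ratio > 1.0:
		return 0, false // hard exclude
	case ratio <= riskZoneStart:
		return 1.0, true
	}
	t := (ratio - riskZoneStart) / (1.0 - riskZoneStart)
	return 1.0 - t*(1.0-floorScore), true
}

func main() {
	fmt.Println(costScores([]Candidate{{"a", 1, false}, {"b", 1, false}}) == nil)
	fmt.Println(costScores([]Candidate{{"local", 0, true}, {"x", 2, false}, {"y", 6, false}}))
	for _, n := range []int{500, 900, 1000, 1001} {
		s, ok := contextScore(n, 1000)
		fmt.Printf("tokens=%d score=%.2f eligible=%v\n", n, s, ok)
	}
}
```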

Allow operators to attach routing_guidance to model metadata so the analyzer prompt can see explicit hints about when a model should be preferred. The classifier now builds its prompt from a first-pass candidate pool so guided models are visible before classification runs, while the final candidate list is still rebuilt afterward to apply capability-derived filters.

Improve analyzer robustness by giving the same analyzer one repair attempt when its first response is not valid JSON. The repair call reuses the system prompt but asks only for the compact JSON object, without re-sending the user request. If repair succeeds, the router keeps the same analyzer; otherwise failover continues to the next analyzer as before.

Also document routing_guidance in the intelligent routing feature guide.
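
The single repair attempt could be structured as in the sketch below. The `Analyzer` signature, the `Classification` fields, and the wording of `repairInstruction` are hypothetical; in particular, the sketch assumes the repair message carries only the instruction, since the commit does not say what else it includes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Classification is a reduced, hypothetical analyzer result.
type Classification struct {
	Complexity string  `json:"complexity"`
	Confidence float64 `json:"confidence"`
}

// Analyzer is a hypothetical call taking a system and a user message.
type Analyzer func(system, user string) (string, error)

// repairInstruction is an assumed wording for the repair call.
const repairInstruction = "Return only the compact JSON object."

func parse(raw string) (Classification, bool) {
	var c Classification
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return Classification{}, false
	}
	return c, true
}

// classify gives each analyzer one repair attempt after an invalid
// JSON response, then fails over to the next analyzer. It returns
// the index of the analyzer that produced the result.
func classify(pool []Analyzer, system, user string) (Classification, int, error) {
	for i, analyze := range pool {
		raw, err := analyze(system, user)
		if err != nil {
			continue // transport error: next analyzer
		}
		if c, ok := parse(raw); ok {
			return c, i, nil
		}
		// Repair: same system prompt, user request not re-sent.
		raw, err = analyze(system, repairInstruction)
		if err != nil {
			continue
		}
		if c, ok := parse(raw); ok {
			return c, i, nil
		}
	}
	return Classification{}, -1, errors.New("all analyzers failed")
}

func main() {
	calls := 0
	flaky := func(system, user string) (string, error) {
		calls++
		if calls == 1 {
			return "Sure! Here is the classification you asked for.", nil
		}
		return `{"complexity":"low","confidence":0.92}`, nil
	}
	c, idx, err := classify([]Analyzer{flaky}, "You are a request classifier.", "Summarize this email.")
	fmt.Println(c, idx, err, calls)
}
```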

Add optional conversation-aware memory to the intelligent router so repeated requests in the same conversation can keep model choice more consistent across multi-turn sessions.

Clients can now send X-GoModel-Conversation-ID on translated chat/responses routes. The transport metadata is propagated into the intelligent routing selection meta, where the selector reads a short in-memory history keyed by (user_path, conversation_id) and includes it in the analyzer prompt as "Previous routing decisions". When intelligent routing applies a concrete model in enforce mode, the router records the applied qualified model back into the same in-memory store. Entries expire automatically after one hour and each conversation is capped to the most recent 50 routing decisions.

Also document the optional header in the intelligent routing feature guide and add unit coverage for header extraction, history prompt rendering, memory expiry/limits, and applied-model recording.
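
A store with the stated properties (keyed by user path and conversation ID, one-hour expiry, at most 50 entries per conversation) might be sketched as follows. `Memory`, its methods, and the injected clock are hypothetical names chosen for this example, and skipping requests without a conversation ID is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	defaultTTL   = time.Hour // entries expire after one hour
	defaultLimit = 50        // most recent decisions kept per conversation
)

type convKey struct{ userPath, conversationID string }

type decision struct {
	model string
	at    time.Time
}

// Memory is a hypothetical in-memory routing history.
type Memory struct {
	mu    sync.Mutex
	ttl   time.Duration
	limit int
	now   func() time.Time // injected clock, so expiry is testable
	byKey map[convKey][]decision
}

func NewMemory(ttl time.Duration, limit int, now func() time.Time) *Memory {
	return &Memory{ttl: ttl, limit: limit, now: now, byKey: make(map[convKey][]decision)}
}

// fresh drops expired decisions in place; callers must hold m.mu.
func (m *Memory) fresh(k convKey) []decision {
	cutoff := m.now().Add(-m.ttl)
	kept := m.byKey[k][:0]
	for _, d := range m.byKey[k] {
		if d.at.After(cutoff) {
			kept = append(kept, d)
		}
	}
	return kept
}

// Record stores the model applied in enforce mode.
func (m *Memory) Record(userPath, conversationID, model string) {
	if conversationID == "" {
		return // assumed: no header, no memory
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	k := convKey{userPath, conversationID}
	ds := append(m.fresh(k), decision{model: model, at: m.now()})
	if len(ds) > m.limit {
		ds = ds[len(ds)-m.limit:]
	}
	m.byKey[k] = ds
}

// History returns the unexpired models for a conversation, oldest first.
func (m *Memory) History(userPath, conversationID string) []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := convKey{userPath, conversationID}
	ds := m.fresh(k)
	if len(ds) == 0 {
		delete(m.byKey, k)
		return nil
	}
	m.byKey[k] = ds
	models := make([]string, len(ds))
	for i, d := range ds {
		models[i] = d.model
	}
	return models
}

// fakeClock is a manual clock for the usage example.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock        { return &fakeClock{t: time.Date(2026, 6, 26, 0, 0, 0, 0, time.UTC)} }
func (c *fakeClock) Now() time.Time   { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := newFakeClock()
	m := NewMemory(defaultTTL, defaultLimit, clk.Now)
	m.Record("/team-a", "conv-1", "openai/gpt-4o-mini")
	fmt.Println(m.History("/team-a", "conv-1"))
	clk.Advance(2 * defaultTTL)
	fmt.Println(len(m.History("/team-a", "conv-1")))
}
```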

Track the outcome of every provider call in a lightweight in-memory health tracker (exponential decay, configurable window/half-life, Bayesian smoothing). Candidates whose recent weighted error rate crosses the circuit-breaker threshold are hard-excluded from the intelligent-router candidate pool; others receive a proportional penalty so a degraded-but-alive model still competes at a lower score.

Health outcomes are recorded at two points in the gateway:
- executeWithUsage: credits the model that actually completed the request, and debits the originally-requested model when a fallback took over.
- tryFallbackResponse: debits both the primary and each fallback attempt that fails before a winner is found.

The IntelligentRouter interface gains RecordExecution(qualifiedModel, success) so the gateway can report outcomes without importing the intelligentrouter package directly.

Config: new optional intelligent_routing.defaults.health block with defaults window=20m, half_life=5m, pseudo_counts=2.0, circuit_breaker=0.9. All values are configurable via env vars (INTELLIGENT_ROUTING_HEALTH_*).
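
A decayed error-rate tracker using the documented defaults could look like the sketch below. The formulas are assumptions, since the commit message does not spell them out: counters are halved every half-life, pseudo-counts are added as prior successes, a model with no events inside the window is treated as fresh, and the penalty multiplier is one minus the smoothed error rate. `Tracker`, `Score`, and the injected clock are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// HealthConfig mirrors the intelligent_routing.defaults.health keys.
type HealthConfig struct {
	Window         time.Duration
	HalfLife       time.Duration
	PseudoCounts   float64
	CircuitBreaker float64
}

func DefaultHealthConfig() HealthConfig {
	return HealthConfig{Window: 20 * time.Minute, HalfLife: 5 * time.Minute, PseudoCounts: 2.0, CircuitBreaker: 0.9}
}

type counts struct {
	ok, fail float64
	last     time.Time
}

// Tracker is a hypothetical per-model health tracker.
type Tracker struct {
	mu     sync.Mutex
	cfg    HealthConfig
	now    func() time.Time
	models map[string]*counts
}

func NewTracker(cfg HealthConfig, now func() time.Time) *Tracker {
	return &Tracker{cfg: cfg, now: now, models: make(map[string]*counts)}
}

// decayed ages the counters to the current time; callers hold t.mu.
func (t *Tracker) decayed(c *counts) (ok, fail float64) {
	age := t.now().Sub(c.last)
	if age >= t.cfg.Window {
		return 0, 0 // nothing recent enough: start fresh
	}
	if age < 0 {
		age = 0
	}
	f := math.Exp2(-float64(age) / float64(t.cfg.HalfLife))
	return c.ok * f, c.fail * f
}

// RecordExecution matches the method name added to the interface.
func (t *Tracker) RecordExecution(qualifiedModel string, success bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	c := t.models[qualifiedModel]
	if c == nil {
		c = &counts{}
		t.models[qualifiedModel] = c
	}
	c.ok, c.fail = t.decayed(c)
	c.last = t.now()
	if success {
		c.ok++
	} else {
		c.fail++
	}
}

// Score returns a penalty multiplier in (0, 1], or excluded=true when
// the smoothed error rate reaches the circuit-breaker threshold.
func (t *Tracker) Score(qualifiedModel string) (multiplier float64, excluded bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	c := t.models[qualifiedModel]
	if c == nil {
		return 1.0, false
	}
	ok, fail := t.decayed(c)
	total := ok + fail + t.cfg.PseudoCounts
	if total == 0 {
		return 1.0, false
	}
	errRate := fail / total
	if errRate >= t.cfg.CircuitBreaker {
		return 0, true
	}
	return 1.0 - errRate, false
}

// fakeClock is a manual clock for the usage example.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock        { return &fakeClock{t: time.Date(2026, 6, 26, 0, 0, 0, 0, time.UTC)} }
func (c *fakeClock) Now() time.Time   { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := newFakeClock()
	tr := NewTracker(DefaultHealthConfig(), clk.Now)
	for i := 0; i < 3; i++ {
		tr.RecordExecution("openai/gpt-4o", false)
	}
	fmt.Println(tr.Score("openai/gpt-4o"))
}
```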

Expose active intelligent selectors through the models endpoint and use the configured selector set during workflow resolution so custom selectors do not fail early as unsupported models.

vfeitoza added 7 commits June 26, 2026 16:38

fix(router): support configured intelligent selectors

11b1f42

vfeitoza wants to merge 7 commits into main from feat/intelligent-model-routing

vfeitoza commented Jun 26, 2026
