feat(router): add intelligent model routing with auto selectors #1

Open
vfeitoza wants to merge 7 commits into main from feat/intelligent-model-routing

@vfeitoza
Owner

Summary

  • Adds an intelligent routing layer that classifies incoming chat requests with a cheap analyzer model and selects the best concrete model from the provider catalog before normal resolution runs.
  • Clients use virtual selectors (auto, smart, auto-cost, auto-quality) in the model field instead of concrete model names.
  • Three modes: off (default), observe (dry-run, logs recommendation without changing execution), and enforce (rewrites the selector before dispatch).

How it works

  1. Client sends "model": "auto" (or another intelligent selector).
  2. Gateway intercepts before provider resolution.
  3. A cheap analyzer model classifies the request (complexity, task type, required capabilities, confidence score).
  4. The scorer ranks all eligible catalog models against the active strategy (cost, balanced, quality, latency).
  5. In enforce mode, the winning model replaces the selector; the request then goes through normal authorization and provider routing.
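
The effect of the three modes on the model field can be sketched as below. This is a minimal illustration, not the PR's code: `applyMode` is a hypothetical helper, and it assumes the analyzer's recommendation has already been computed and is passed in.

```go
package main

import (
	"fmt"
	"log"
)

// Mode mirrors the three intelligent_routing.mode values described in this PR.
type Mode string

const (
	ModeOff     Mode = "off"
	ModeObserve Mode = "observe"
	ModeEnforce Mode = "enforce"
)

// intelligentSelectors lists the virtual selectors named in the summary.
var intelligentSelectors = map[string]bool{
	"auto": true, "smart": true, "auto-cost": true, "auto-quality": true,
}

// applyMode is a hypothetical helper. It returns the model name handed to
// normal resolution and whether the selector was rewritten.
func applyMode(mode Mode, requested, recommended string) (string, bool) {
	if !intelligentSelectors[requested] {
		return requested, false // concrete model names pass through untouched
	}
	switch mode {
	case ModeObserve:
		// Dry run: log the recommendation, execute the request unchanged.
		log.Printf("intelligent routing (observe): %q would become %q", requested, recommended)
		return requested, false
	case ModeEnforce:
		return recommended, true
	default:
		// off: the selector is left as-is, so normal resolution is expected
		// to report model_not_found for it.
		return requested, false
	}
}

func main() {
	for _, m := range []Mode{ModeOff, ModeObserve, ModeEnforce} {
		model, rewritten := applyMode(m, "auto", "openai/gpt-4o-mini")
		fmt.Printf("mode=%s model=%s rewritten=%t\n", m, model, rewritten)
	}
}
```

Only enforce changes what is dispatched; observe and off differ only in whether a recommendation is logged.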

Key components

  • internal/intelligentrouter/ — classifier, selector, scorer, catalog filtering, prompt builder, types
  • config/intelligent_routing.go — configuration struct, defaults, validation, env var overrides
  • internal/gateway/inference_intelligent.go — orchestrator hook that rewrites the model selector before resolution
  • internal/server/model_validation.go — middleware skip for intelligent selectors to prevent premature model_not_found errors
  • internal/observability/metrics.go — Prometheus counters and histograms for decisions, latency, fallbacks, and low-confidence cases
  • internal/virtualmodels/intelligent.go — intelligent virtual model support with strategy-qualified selectors
  • docs/features/intelligent-routing.mdx — full feature documentation

Configuration

intelligent_routing:
  enabled: true
  mode: "enforce"        # off | observe | enforce
  analyzers:
    - model: "gpt-4o-mini"
      provider: "openai"
      max_tokens: 256
  defaults:
    strategy: "balanced" # cost | balanced | quality | latency
    min_savings_ratio: 0.15
    min_confidence: 0.7
  fallback_model: "openai/gpt-4o-mini"
  analysis_user_path: "/intelligent-router"
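
For reviewers who want a feel for what validation of this block might check, here is a rough sketch. The struct and field names are illustrative and are not taken from config/intelligent_routing.go. The allowed mode and strategy values come from the comments above; the [0, 1] ranges and the "active mode needs an analyzer" rule are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative types only; the real struct in the PR may differ.
type AnalyzerConfig struct {
	Model     string
	Provider  string
	MaxTokens int
}

type Defaults struct {
	Strategy        string
	MinSavingsRatio float64
	MinConfidence   float64
}

type Config struct {
	Enabled          bool
	Mode             string
	Analyzers        []AnalyzerConfig
	Defaults         Defaults
	FallbackModel    string
	AnalysisUserPath string
}

func oneOf(v string, allowed ...string) bool {
	for _, a := range allowed {
		if v == a {
			return true
		}
	}
	return false
}

// Validate checks the enumerated fields and numeric thresholds.
func (c Config) Validate() error {
	if !oneOf(c.Mode, "off", "observe", "enforce") {
		return fmt.Errorf("mode %q: want off, observe, or enforce", c.Mode)
	}
	if !oneOf(c.Defaults.Strategy, "cost", "balanced", "quality", "latency") {
		return fmt.Errorf("strategy %q: want cost, balanced, quality, or latency", c.Defaults.Strategy)
	}
	// Assumption: both thresholds are fractions in [0, 1].
	if c.Defaults.MinConfidence < 0 || c.Defaults.MinConfidence > 1 {
		return errors.New("min_confidence must be within [0, 1]")
	}
	if c.Defaults.MinSavingsRatio < 0 || c.Defaults.MinSavingsRatio > 1 {
		return errors.New("min_savings_ratio must be within [0, 1]")
	}
	// Assumption: observe and enforce need at least one analyzer to classify with.
	if c.Mode != "off" && len(c.Analyzers) == 0 {
		return errors.New("at least one analyzer is required when mode is not off")
	}
	return nil
}

// sampleConfig mirrors the YAML example from this PR description.
func sampleConfig() Config {
	return Config{
		Enabled:          true,
		Mode:             "enforce",
		Analyzers:        []AnalyzerConfig{{Model: "gpt-4o-mini", Provider: "openai", MaxTokens: 256}},
		Defaults:         Defaults{Strategy: "balanced", MinSavingsRatio: 0.15, MinConfidence: 0.7},
		FallbackModel:    "openai/gpt-4o-mini",
		AnalysisUserPath: "/intelligent-router",
	}
}

func main() {
	fmt.Println(sampleConfig().Validate())
}
```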

Test plan

  • Unit tests pass: make test-race
  • "model": "auto" with mode: observe logs the intelligent routing decision without changing the executed model
  • "model": "auto" with mode: enforce rewrites the selector and returns 200 with the selected model
  • "model": "auto-cost" selects a cheaper model when pricing/tags are configured
  • When all analyzers fail, the request falls back to fallback_model and logs analysis_failed=true
  • Intelligent selectors with mode: off return model_not_found
  • Analyzer usage appears under analysis_user_path in usage reports

vfeitoza added 7 commits June 26, 2026 16:38
Introduces an intelligent routing layer that classifies incoming chat
requests with a cheap analyzer model and selects the best concrete model
from the provider catalog before normal resolution runs.

Clients send an intelligent selector in the model field instead of a
concrete model name:

  - auto / smart   balanced cost-quality selection
  - auto-cost      prefer the cheapest eligible model
  - auto-quality   prefer the highest-quality eligible model

The router operates in three modes configured via intelligent_routing.mode:

  - off      intelligent selectors return model_not_found (default)
  - observe  classifies and logs the recommendation; executes the
             original requested model unchanged (dry-run)
  - enforce  replaces the selector with the recommended model before
             provider dispatch

A pool of analyzer models is tried in order with per-call timeout and
automatic failover. When all analyzers fail, requests fall back to
intelligent_routing.fallback_model or return model_not_found if unset.
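
A minimal sketch of the ordered failover described above, using only the standard library. The `Analyzer` function type, `recommend`, `runPool`, the demo analyzers, and the 20ms timeout are all hypothetical; the real classifier returns a richer classification rather than a model name.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Analyzer is a hypothetical stand-in for one analyzer model call. For
// brevity it returns the recommended concrete model directly.
type Analyzer func(ctx context.Context) (string, error)

var ErrModelNotFound = errors.New("model_not_found")

// recommend tries each analyzer in order, giving every call its own timeout.
// When all fail it uses fallbackModel, or ErrModelNotFound if that is unset.
func recommend(ctx context.Context, pool []Analyzer, perCall time.Duration, fallbackModel string) (string, bool, error) {
	for _, analyze := range pool {
		callCtx, cancel := context.WithTimeout(ctx, perCall)
		model, err := analyze(callCtx)
		cancel() // release the timer before trying the next analyzer
		if err == nil {
			return model, false, nil
		}
	}
	if fallbackModel != "" {
		return fallbackModel, true, nil // second value plays the role of analysis_failed
	}
	return "", true, ErrModelNotFound
}

// Demo analyzers: one errors immediately, one is too slow, one succeeds.
func failing(_ context.Context) (string, error) { return "", errors.New("upstream error") }

func slow(ctx context.Context) (string, error) {
	select {
	case <-time.After(time.Second):
		return "too-late", nil
	case <-ctx.Done():
		return "", ctx.Err() // the per-call timeout fired first
	}
}

func healthy(_ context.Context) (string, error) { return "provider/model-a", nil }

// runPool is a convenience wrapper for the demo with a 20ms per-call timeout.
func runPool(pool []Analyzer, fallbackModel string) (string, bool, error) {
	return recommend(context.Background(), pool, 20*time.Millisecond, fallbackModel)
}

func main() {
	fmt.Println(runPool([]Analyzer{failing, slow, healthy}, "openai/gpt-4o-mini"))
	fmt.Println(runPool([]Analyzer{failing, slow}, "openai/gpt-4o-mini"))
	fmt.Println(runPool([]Analyzer{failing}, ""))
}
```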

Analyzer usage is attributed to a dedicated analysis_user_path
(/intelligent-router by default) so classification cost is separated
from application request cost in usage reports.

Key components:

  - internal/intelligentrouter: classifier, selector, scorer, catalog,
    prompt builder, and type definitions
  - config/intelligent_routing.go: configuration struct, defaults,
    validation, and env var overrides
  - internal/gateway/inference_intelligent.go: orchestrator hook that
    rewrites the model selector before resolution
  - internal/server/model_validation.go: middleware skip for intelligent
    selectors to prevent premature model_not_found errors
  - internal/observability/metrics.go: Prometheus counters and histograms
    for routing decisions, latency, fallbacks, and low-confidence cases
  - internal/virtualmodels: intelligent virtual model support with
    strategy-qualified selectors (intelligent:cost, intelligent:quality)
  - docs/features/intelligent-routing.mdx: full feature documentation

All existing tests pass. New tests cover config validation, selector
detection, catalog filtering, scoring, and middleware passthrough for
intelligent selectors.
…bled

newIntelligentRouterFromConfig returns a nil *intelligentrouter.Selector when
the feature is inactive (the default). Assigning that typed nil directly to
the gateway.IntelligentRouter interface field produced a non-nil interface
wrapping a nil pointer, so the orchestrator's `== nil` guard was skipped and
ShouldEvaluate panicked on every chat request, surfacing as HTTP 500.

Only assign the router to the interface when the concrete pointer is non-nil,
keeping the field at a true nil interface otherwise. Add a reflection-based
typed-nil check in evaluateIntelligentRouter as defense in depth against
future regressions.
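
For readers unfamiliar with the typed-nil pitfall, the following standalone program reproduces it. The type and method names echo the ones in this commit message, but the bodies are simplified stand-ins, and `isNilRouter` is a hypothetical name for the reflection-based guard.

```go
package main

import (
	"fmt"
	"reflect"
)

// IntelligentRouter is a cut-down stand-in for gateway.IntelligentRouter.
type IntelligentRouter interface {
	ShouldEvaluate(model string) bool
}

// Selector is a cut-down stand-in for intelligentrouter.Selector.
type Selector struct {
	selectors map[string]bool
}

// ShouldEvaluate dereferences s, so calling it on a nil *Selector panics.
func (s *Selector) ShouldEvaluate(model string) bool {
	return s.selectors[model]
}

// isNilRouter reports true for a nil interface and for an interface that
// wraps a nil pointer.
func isNilRouter(r IntelligentRouter) bool {
	if r == nil {
		return true
	}
	v := reflect.ValueOf(r)
	return v.Kind() == reflect.Ptr && v.IsNil()
}

func main() {
	var sel *Selector // what the constructor returns when the feature is off

	// Bug: the interface now holds (type=*Selector, value=nil).
	var bad IntelligentRouter = sel
	fmt.Println(bad == nil)       // false
	fmt.Println(isNilRouter(bad)) // true

	// Fix: only assign when the concrete pointer is non-nil.
	var good IntelligentRouter
	if sel != nil {
		good = sel
	}
	fmt.Println(good == nil) // true
}
```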

Reproduced and verified against the full integration suite.
…nd gradual context penalty

Refine the candidate ranking so the cost and context dimensions contribute
signal only when they actually discriminate between models, instead of
flooding the weighted sum with neutral noise.

- Cost abstention: when every candidate shares the same estimated cost (the
  common case for gateways whose model registry has no pricing), the cost
  dimension contributes nothing, letting quality and capability decide. The
  cost score is now computed once across all candidates in RankCandidates and
  passed into scoreCandidate.
- Free-model tier: models tagged "free", "local", or "self-hosted" (local
  Ollama/vLLM) now win the cost dimension at 1.0, while paid models are capped
  at 0.5 when a free model is present, preserving a proportional, visible gap.
- Gradual context penalty: requests that approach a model's context window
  receive a proportional penalty instead of a binary exclude. Once a request
  uses more than 80% of the window, the context score decays from 1.0 toward
  0.10; requests that exceed the window are still hard-excluded. The request
  size is now estimated from message text and passed through the
  Candidate.ContextScore field.
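
The three rules above can be approximated as follows. This is a sketch under stated assumptions, not the code in RankCandidates: the decay is assumed to be linear, an unknown window is assumed to score as neutral, paid costs are assumed to be positive, and `contextScore` and `costScores` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// contextScore returns 1.0 up to 80% of the window, decays toward 0.10 at
// 100%, and hard-excludes anything larger than the window.
func contextScore(requestTokens, window int) (score float64, excluded bool) {
	if window <= 0 {
		return 1.0, false // assumption: unknown window is treated as neutral
	}
	ratio := float64(requestTokens) / float64(window)
	switch {
	case ratio > 1.0:
		return 0, true
	case ratio <= 0.8:
		return 1.0, false
	default:
		// Assumed linear decay across the last 20% of the window.
		return 1.0 - (ratio-0.8)/0.2*0.9, false
	}
}

// costScores scores all candidates together. It abstains when no model is
// free and every cost is identical, so cost adds no signal to the ranking.
// costs and free must have the same length; paid costs must be > 0.
func costScores(costs []float64, free []bool) (scores []float64, abstain bool) {
	anyFree, allEqual := false, true
	minPaid := math.Inf(1)
	for i, c := range costs {
		if free[i] {
			anyFree = true
			continue
		}
		if c != costs[0] {
			allEqual = false
		}
		if c < minPaid {
			minPaid = c
		}
	}
	if !anyFree && allEqual {
		return nil, true
	}
	ceiling := 1.0
	if anyFree {
		ceiling = 0.5 // paid models are capped when a free model is present
	}
	scores = make([]float64, len(costs))
	for i, c := range costs {
		if free[i] {
			scores[i] = 1.0
		} else {
			scores[i] = ceiling * minPaid / c // cheapest paid model hits the ceiling
		}
	}
	return scores, false
}

func main() {
	fmt.Println(costScores([]float64{1, 1, 1}, []bool{false, false, false}))
	fmt.Println(costScores([]float64{0, 2, 4}, []bool{true, false, false}))
	fmt.Println(contextScore(500, 1000))
	fmt.Println(contextScore(1200, 1000))
}
```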

All existing tests pass. New table-driven tests cover cost abstention, the
free-model advantage, the risk-zone penalty, the hard exclude, and unknown
windows.

Allow operators to attach routing_guidance to model metadata so the analyzer
prompt can see explicit hints about when a model should be preferred. The
classifier now builds its prompt from a first-pass candidate pool so guided
models are visible before classification runs, while the final candidate list
is still rebuilt afterward to apply capability-derived filters.

Improve analyzer robustness by giving the same analyzer one repair attempt when
its first response is not valid JSON. The repair call reuses the system prompt
but asks only for the compact JSON object, without re-sending the user request.
If repair succeeds, the router keeps the same analyzer; otherwise failover
continues to the next analyzer as before.
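
The repair step might look roughly like this. The `Analyzer` signature, the `Classification` fields and JSON keys, and the repair message text are all assumptions made for illustration; only the overall shape (one retry on the same analyzer, same system prompt, user request not re-sent) follows the description above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Classification uses hypothetical field names for the analyzer output.
type Classification struct {
	Complexity string  `json:"complexity"`
	TaskType   string  `json:"task_type"`
	Confidence float64 `json:"confidence"`
}

// Analyzer is a hypothetical stand-in for a single analyzer model call.
type Analyzer func(systemPrompt, userMessage string) (string, error)

// repairMessage is illustrative wording, not the PR's actual prompt.
const repairMessage = "Your previous reply was not valid JSON. Reply with only the compact JSON object."

var ErrInvalidJSON = errors.New("analyzer returned invalid JSON after repair")

func parse(raw string) (Classification, error) {
	var c Classification
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

// classifyWithRepair gives the same analyzer one repair attempt. An error
// return tells the caller to fail over to the next analyzer.
func classifyWithRepair(analyze Analyzer, systemPrompt, userRequest string) (Classification, error) {
	raw, err := analyze(systemPrompt, userRequest)
	if err != nil {
		return Classification{}, err
	}
	if c, perr := parse(raw); perr == nil {
		return c, nil
	}
	// Repair: same system prompt, but the user request is not re-sent.
	raw, err = analyze(systemPrompt, repairMessage)
	if err != nil {
		return Classification{}, err
	}
	c, perr := parse(raw)
	if perr != nil {
		return Classification{}, ErrInvalidJSON
	}
	return c, nil
}

// flakyAnalyzer answers with prose first and valid JSON second, and records
// every user message it receives.
func flakyAnalyzer() (Analyzer, *[]string) {
	seen := []string{}
	analyze := func(_, userMessage string) (string, error) {
		seen = append(seen, userMessage)
		if len(seen) == 1 {
			return "Sure! Here is the JSON you asked for:", nil
		}
		return `{"complexity":"low","task_type":"summarization","confidence":0.9}`, nil
	}
	return analyze, &seen
}

func main() {
	analyze, seen := flakyAnalyzer()
	c, err := classifyWithRepair(analyze, "You are a request classifier.", "Summarize this contract")
	fmt.Println(c, err, len(*seen))
}
```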

Also document routing_guidance in the intelligent routing feature guide.

Add optional conversation-aware memory to the intelligent router so repeated
requests in the same conversation can keep model choice more consistent across
multi-turn sessions.

Clients can now send X-GoModel-Conversation-ID on translated chat/responses
routes. The transport metadata is propagated into the intelligent routing
selection meta, where the selector reads a short in-memory history keyed by
(user_path, conversation_id) and includes it in the analyzer prompt as
"Previous routing decisions".

When intelligent routing applies a concrete model in enforce mode, the router
records the applied qualified model back into the same in-memory store. Entries
expire automatically after one hour and each conversation is capped to the most
recent 50 routing decisions.
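
A compact sketch of such a store, keyed by (user_path, conversation_id) with the one-hour expiry and 50-entry cap stated above. The type and method names are hypothetical, and passing the current time explicitly is a choice made here to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	entryTTL   = time.Hour // entries expire after one hour
	maxEntries = 50        // most recent decisions kept per conversation
)

// epoch is a fixed reference time so the demo is deterministic.
var epoch = time.Unix(0, 0)

type convKey struct{ userPath, conversationID string }

type decision struct {
	model string
	at    time.Time
}

// Memory is a hypothetical in-memory routing history.
type Memory struct {
	mu      sync.Mutex
	history map[convKey][]decision
}

func NewMemory() *Memory {
	return &Memory{history: make(map[convKey][]decision)}
}

// fresh drops expired decisions and keeps only the newest maxEntries.
func fresh(ds []decision, now time.Time) []decision {
	kept := ds[:0]
	for _, d := range ds {
		if now.Sub(d.at) < entryTTL {
			kept = append(kept, d)
		}
	}
	if len(kept) > maxEntries {
		kept = kept[len(kept)-maxEntries:]
	}
	return kept
}

// Record stores the qualified model applied in enforce mode.
func (m *Memory) Record(userPath, conversationID, model string, now time.Time) {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := convKey{userPath, conversationID}
	m.history[k] = fresh(append(m.history[k], decision{model, now}), now)
}

// History returns unexpired models, oldest first, for the analyzer prompt.
func (m *Memory) History(userPath, conversationID string, now time.Time) []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := convKey{userPath, conversationID}
	ds := fresh(m.history[k], now)
	if len(ds) == 0 {
		delete(m.history, k)
		return nil
	}
	m.history[k] = ds
	models := make([]string, len(ds))
	for i, d := range ds {
		models[i] = d.model
	}
	return models
}

func main() {
	mem := NewMemory()
	mem.Record("/team-a", "conv-1", "openai/gpt-4o-mini", epoch)
	fmt.Println(mem.History("/team-a", "conv-1", epoch.Add(10*time.Minute)))
	fmt.Println(mem.History("/team-a", "conv-1", epoch.Add(2*time.Hour)))
}
```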

Also document the optional header in the intelligent routing feature guide and
add unit coverage for header extraction, history prompt rendering, memory
expiry/limits, and applied-model recording.

Track the outcome of every provider call in a lightweight in-memory health
tracker (exponential decay, configurable window/half-life, Bayesian smoothing).
Candidates whose recent weighted error rate crosses the circuit-breaker threshold
are hard-excluded from the intelligent-router candidate pool; others receive a
proportional penalty so a degraded-but-alive model still competes at a lower
score.
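
One way such a tracker could be written is sketched below, using the default values from this commit's config block. Several details are assumptions: the pseudo-counts are treated as prior successes, the breaker trips at "greater than or equal", and the proportional penalty is taken to be 1 minus the error rate. `Tracker` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

const (
	defaultWindow       = 20 * time.Minute
	defaultHalfLife     = 5 * time.Minute
	defaultPseudoCounts = 2.0
	defaultBreaker      = 0.9
)

// epoch is a fixed reference time so the demo is deterministic.
var epoch = time.Unix(0, 0)

type outcome struct {
	at     time.Time
	failed bool
}

// Tracker is a hypothetical in-memory health tracker.
type Tracker struct {
	mu                    sync.Mutex
	window, halfLife      time.Duration
	pseudoCounts, breaker float64
	outcomes              map[string][]outcome
}

func NewTracker(window, halfLife time.Duration, pseudoCounts, breaker float64) *Tracker {
	return &Tracker{window: window, halfLife: halfLife, pseudoCounts: pseudoCounts,
		breaker: breaker, outcomes: make(map[string][]outcome)}
}

// Record stores one provider call outcome and prunes entries outside the window.
func (t *Tracker) Record(model string, success bool, now time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	kept := t.outcomes[model][:0]
	for _, o := range t.outcomes[model] {
		if now.Sub(o.at) <= t.window {
			kept = append(kept, o)
		}
	}
	t.outcomes[model] = append(kept, outcome{at: now, failed: !success})
}

// ErrorRate is the decayed failure weight over the decayed total weight plus
// pseudo-counts. Assumption: pseudo-counts act as prior successes, so a model
// with no history reports 0.
func (t *Tracker) ErrorRate(model string, now time.Time) float64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	var failW, totalW float64
	for _, o := range t.outcomes[model] {
		age := now.Sub(o.at)
		if age > t.window {
			continue
		}
		w := math.Pow(0.5, float64(age)/float64(t.halfLife)) // halves every half-life
		totalW += w
		if o.failed {
			failW += w
		}
	}
	return failW / (totalW + t.pseudoCounts)
}

// Excluded reports whether the circuit breaker removes the model from the pool.
func (t *Tracker) Excluded(model string, now time.Time) bool {
	return t.ErrorRate(model, now) >= t.breaker
}

// HealthFactor is an assumed proportional penalty applied to the score.
func (t *Tracker) HealthFactor(model string, now time.Time) float64 {
	return 1 - t.ErrorRate(model, now)
}

func main() {
	tr := NewTracker(defaultWindow, defaultHalfLife, defaultPseudoCounts, defaultBreaker)
	for i := 0; i < 30; i++ {
		tr.Record("openai/gpt-4o", false, epoch)
	}
	fmt.Println(tr.ErrorRate("openai/gpt-4o", epoch), tr.Excluded("openai/gpt-4o", epoch))
	later := epoch.Add(defaultHalfLife)
	fmt.Println(tr.ErrorRate("openai/gpt-4o", later), tr.Excluded("openai/gpt-4o", later))
}
```

With these assumptions, 30 immediate failures give 30/32 and trip the breaker; one half-life later the same history weighs 15/17 and the model re-enters the pool with a penalty.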

Health outcomes are recorded at two points in the gateway:
- executeWithUsage: credits the model that actually completed the request, and
  debits the originally-requested model when a fallback took over.
- tryFallbackResponse: debits both the primary and each fallback attempt that
  fails before a winner is found.

The IntelligentRouter interface gains RecordExecution(qualifiedModel, success)
so the gateway can report outcomes without importing the intelligentrouter
package directly.

Config: new optional intelligent_routing.defaults.health block with defaults
  window=20m, half_life=5m, pseudo_counts=2.0, circuit_breaker=0.9.
  All values are configurable via env vars (INTELLIGENT_ROUTING_HEALTH_*).

Expose active intelligent selectors through the models endpoint and use the
configured selector set during workflow resolution so custom selectors do not
fail early as unsupported models.