feat: add manual failover management#444
Conversation
📝 WalkthroughWalkthroughAdds a complete failover rules management system: new ChangesFailover Rules Management
Sequence Diagram(s)sequenceDiagram
participant AdminUI as Dashboard UI
participant Handler as admin.Handler
participant Service as failover.Service
participant Store as failover.Store
participant Resolver as fallback.Resolver
AdminUI->>Handler: GET /admin/failover
Handler->>Service: ListViews()
Service-->>Handler: []View
Handler-->>AdminUI: JSON rule list
AdminUI->>Handler: PUT /admin/failover
Handler->>Service: Upsert(Rule)
Service->>Store: Upsert(Rule)
Store-->>Service: ok
Service->>Service: Refresh()
Handler-->>AdminUI: View / 204
AdminUI->>Handler: POST /admin/failover/generate
Handler->>Resolver: SuggestFallbacks(resolution, op)
Resolver-->>Handler: []ModelSelector
Handler-->>AdminUI: []View suggestions
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
||
| func (s *MongoDBStore) Get(ctx context.Context, source string) (*Rule, error) { | ||
| var rule Rule | ||
| err := s.collection.FindOne(ctx, bson.M{"_id": strings.TrimSpace(source)}).Decode(&rule) |
| "created_at": rule.CreatedAt, | ||
| }, | ||
| } | ||
| _, err := s.collection.UpdateOne(ctx, bson.M{"_id": strings.TrimSpace(rule.Source)}, update, options.UpdateOne().SetUpsert(true)) |
| } | ||
|
|
||
| func (s *MongoDBStore) Delete(ctx context.Context, source string) error { | ||
| result, err := s.collection.DeleteOne(ctx, bson.M{"_id": strings.TrimSpace(source)}) |
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
There was a problem hiding this comment.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/fallback/resolver_test.go (1)
55-83: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winAssert that manual targets never reappear in
SuggestFallbacks().This test now only checks the first two suggestions. If
SuggestFallbacks()starts appending the already-configured manual target ("azure/gpt-4o") later in the slice, the test still passes even thoughinternal/fallback/resolver.go:171-189is explicitly deduping manual selectors before auto suggestions are generated. Add a negative assertion over the full result set. As per coding guidelines,**/*_test.go: Add or update tests for behavior changes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/fallback/resolver_test.go` around lines 55 - 83, The fallback ranking test for Resolver currently only validates the first two suggestions, so it can miss regressions where the manual target is appended later in the slice. Update TestResolverSuggestFallbacksReturnsRankingCandidates to assert across the full result from SuggestFallbacks that the manual selector from Resolver.NewResolver config (the "azure/gpt-4o" entry) never appears, while still preserving the existing ordering checks for the top ranked auto suggestions.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/fallback.go`:
- Around line 252-263: The disabled-model JSON parsing currently accepts the
literal null because json.Unmarshal into the slice in the cfg.DisabledModelsJSON
branch succeeds with a nil result; update this parsing path to explicitly reject
null before or after unmarshaling. In the fallback config logic that builds
disabled, keep allowing only a JSON array or boolean object shape, and if the
expanded value is null return an error instead of treating it as valid input.
In `@docs/features/failover.mdx`:
- Around line 13-15: The overview text in the failover documentation has an
awkward, broken sentence that makes the runtime migration guidance unclear.
Update the affected prose in the failover overview to rephrase the runtime
failover guidance into a complete, unambiguous sentence, keeping the meaning
that runtime failover relies on manual rules only. Locate the text near the
introductory failover description and adjust the surrounding sentence flow for
clarity.
In `@docs/openapi.json`:
- Around line 6775-6782: The failover request schemas currently define source as
optional even though the delete and upsert handlers require it; update both
admin.deleteFailoverRuleRequest and admin.upsertFailoverRuleRequest in the
OpenAPI document to mark source as required. Make sure the required field list
matches the server contract so generated clients treat source as mandatory for
both request shapes.
- Around line 952-1003: The failover upsert response contract is missing the
successful 204 No Content path. Update the responses for the PUT /admin/failover
operation in docs/openapi.json to include a 204 entry alongside the existing
200/400/401/502/503 responses, matching the behavior in
internal/admin/handler_failover.go and its refreshed view handling.
In `@internal/admin/dashboard/static/js/modules/failover.js`:
- Around line 83-89: The failover edit flow in failover.js captures
failoverFormOriginalSource but never uses it when saving, so changing source
creates a new persisted rule instead of updating the existing one. Fix the save
path in the form handling logic by either making source immutable in edit mode
or by using failoverFormOriginalSource to delete/replace the old rule before
upserting the edited one; update the relevant edit/save methods around
failoverFormOriginalSource and failoverForm so the original key is handled
correctly.
- Around line 82-88: The failover UI is still using rule.managed to determine
whether a rule is config-managed, but this PR’s contract uses managed_source
instead. Update the failover state setup in the failover module to derive the
managed flag from managed_source, and make sure the template logic for the
edit/delete affordances also checks managed_source consistently so managed rules
are treated as non-editable dashboard entries.
In `@internal/admin/dashboard/templates/layout.html`:
- Around line 162-170: The failover target picker is using a datalist with a
textarea, but the `list` attribute is not supported on `<textarea>`, so the
autocomplete will never work. Update the `failover-target-options` /
`failover-targets` UI in `layout.html` to use a supported control such as an
`<input>` with `list`, or replace it with a custom autocomplete component while
keeping the `failoverForm.targets` binding and `failoverFormManaged` disabled
state intact.
In `@internal/admin/dashboard/templates/page-failover.html`:
- Around line 21-24: The failover filter input is bound to shared state but the
tables still render the unfiltered arrays, so typing has no effect and leaks
into the models page filter. Update the failover template to use a
failover-specific filtered collection in the table loops, and avoid reusing the
shared modelFilter binding by introducing a dedicated failover filter state and
derived list in the surrounding Alpine data/template logic. Ensure both table
sections that currently iterate the raw collections are switched to the new
filtered source.
In `@internal/admin/dashboard/templates/sidebar.html`:
- Around line 43-46: The sidebar Failover nav link is still shown and clickable
even when the feature is unavailable. Update the sidebar template’s Failover
entry to use the same availability check as the other failover entry points by
gating the anchor on failoverAvailable, so it is hidden when the API reports
failover is unavailable.
In `@internal/admin/handler_failover.go`:
- Around line 169-172: The `/admin/failover/generate` endpoint is documented as
returning `failover.View`, but it only fabricates draft data and should not use
the persisted-rule shape. Update the handler/Swagger annotations in
`admin/handler_failover.go` to return a dedicated draft DTO for this endpoint,
and adjust the generation path in the failover handler so it serializes only the
draft fields instead of `created_at`/`updated_at` or other stored-view-only
values.
In `@internal/failover/factory.go`:
- Around line 90-98: The failover factory currently reuses the model-cache
refresh ticker for propagation, which can leave peer nodes stale for too long.
Update the Result construction in the factory logic to use a dedicated, short
failover refresh interval or an explicit invalidation mechanism instead of
cfg.Cache.Model.RefreshInterval and the time.Hour fallback. Keep the change
localized around the service.StartBackgroundRefresh call so failover updates
propagate independently from the model cache.
In `@internal/failover/service_test.go`:
- Around line 60-89: The current test only covers the happy-path merge in
TestServiceConfigRulesOverrideDashboardRules; add regression tests for the new
write/error behavior too. Extend coverage around
Service/Refresh/UpsertRule/ResetDashboardRules to verify config-managed rules
cannot be overwritten by dashboard writes, config rules survive
ResetDashboardRules, and storage failures from refresh/upsert are surfaced as
errors instead of being swallowed. Use the existing symbols NewService, Refresh,
Rules, Get, and the managed-source constants to keep the tests aligned with the
failover contract.
In `@internal/failover/service.go`:
- Around line 183-185: The pre-read in `service.go` is swallowing unexpected
`Store.Get` errors and still allowing the write to continue, which can hide
storage failures and overwrite metadata. Update the `existing, err :=
s.store.Get(ctx, normalized.Source)` check in the rule write path to distinguish
`ErrNotFound` from other errors: keep the current `CreatedAt` preservation only
when a record exists, but immediately return any non-`ErrNotFound` error before
proceeding. Ensure the fix is applied in the same update flow that uses
`normalized.CreatedAt` and `s.store.Get`.
In `@internal/failover/store_mongodb.go`:
- Around line 19-32: NewMongoDBStore creates indexes with context.Background(),
so index creation ignores the caller’s cancellation and shutdown state. Update
the MongoDB initialization path to accept and use the caller context in
NewMongoDBStore, and pass that ctx through to coll.Indexes().CreateMany instead
of creating a new timeout context. Keep the existing error handling and index
definitions intact while aligning the MongoDB flow with the PostgreSQL init
path.
---
Outside diff comments:
In `@internal/fallback/resolver_test.go`:
- Around line 55-83: The fallback ranking test for Resolver currently only
validates the first two suggestions, so it can miss regressions where the manual
target is appended later in the slice. Update
TestResolverSuggestFallbacksReturnsRankingCandidates to assert across the full
result from SuggestFallbacks that the manual selector from Resolver.NewResolver
config (the "azure/gpt-4o" entry) never appears, while still preserving the
existing ordering checks for the top ranked auto suggestions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 849c231b-e921-4bda-ac96-a07c5211f4eb
📒 Files selected for processing (36)
.env.templateconfig/config.example.yamlconfig/config.goconfig/config_test.goconfig/fallback.godocs/features/failover.mdxdocs/openapi.jsoninternal/admin/dashboard/static/js/dashboard.jsinternal/admin/dashboard/static/js/modules/failover.jsinternal/admin/dashboard/static/js/modules/workflows-layout.test.cjsinternal/admin/dashboard/static/js/modules/workflows.jsinternal/admin/dashboard/static/js/modules/workflows.test.cjsinternal/admin/dashboard/templates/index.htmlinternal/admin/dashboard/templates/layout.htmlinternal/admin/dashboard/templates/model-table-body.htmlinternal/admin/dashboard/templates/page-failover.htmlinternal/admin/dashboard/templates/page-settings.htmlinternal/admin/dashboard/templates/sidebar.htmlinternal/admin/handler.gointernal/admin/handler_failover.gointernal/admin/handler_test.gointernal/admin/routes.gointernal/admin/routes_test.gointernal/app/app.gointernal/app/app_test.gointernal/failover/factory.gointernal/failover/service.gointernal/failover/service_test.gointernal/failover/store.gointernal/failover/store_mongodb.gointernal/failover/store_postgresql.gointernal/failover/store_sqlite.gointernal/failover/types.gointernal/fallback/resolver.gointernal/fallback/resolver_test.gointernal/server/http_test.go
| if raw := strings.TrimSpace(cfg.DisabledModelsJSON); raw != "" { | ||
| expanded := expandString(raw) | ||
| var list []string | ||
| if err := json.Unmarshal([]byte(expanded), &list); err == nil { | ||
| for _, model := range list { | ||
| model = strings.TrimSpace(model) | ||
| if model != "" { | ||
| disabled[model] = true | ||
| } | ||
| } | ||
| return nilIfEmpty(disabled), nil | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Reject null for disabled-model JSON.
json.Unmarshal("null", &list) succeeds with a nil slice, so FAILOVER_DISABLED_MODELS_JSON=null is silently accepted even though this field only allows an array or boolean object.
Proposed fix
if raw := strings.TrimSpace(cfg.DisabledModelsJSON); raw != "" {
expanded := expandString(raw)
+ if bytes.Equal(bytes.TrimSpace([]byte(expanded)), []byte("null")) {
+ return nil, fmt.Errorf("fallback.disabled_models_json: null is not allowed; must be a JSON array or boolean object")
+ }
var list []string
if err := json.Unmarshal([]byte(expanded), &list); err == nil {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if raw := strings.TrimSpace(cfg.DisabledModelsJSON); raw != "" { | |
| expanded := expandString(raw) | |
| var list []string | |
| if err := json.Unmarshal([]byte(expanded), &list); err == nil { | |
| for _, model := range list { | |
| model = strings.TrimSpace(model) | |
| if model != "" { | |
| disabled[model] = true | |
| } | |
| } | |
| return nilIfEmpty(disabled), nil | |
| } | |
| if raw := strings.TrimSpace(cfg.DisabledModelsJSON); raw != "" { | |
| expanded := expandString(raw) | |
| if bytes.Equal(bytes.TrimSpace([]byte(expanded)), []byte("null")) { | |
| return nil, fmt.Errorf("fallback.disabled_models_json: null is not allowed; must be a JSON array or boolean object") | |
| } | |
| var list []string | |
| if err := json.Unmarshal([]byte(expanded), &list); err == nil { | |
| for _, model := range list { | |
| model = strings.TrimSpace(model) | |
| if model != "" { | |
| disabled[model] = true | |
| } | |
| } | |
| return nilIfEmpty(disabled), nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@config/fallback.go` around lines 252 - 263, The disabled-model JSON parsing
currently accepts the literal null because json.Unmarshal into the slice in the
cfg.DisabledModelsJSON branch succeeds with a nil result; update this parsing
path to explicitly reject null before or after unmarshaling. In the fallback
config logic that builds disabled, keep allowing only a JSON array or boolean
object shape, and if the expanded value is null return an error instead of
treating it as valid input.
| When a request fails, GoModel can retry it against alternate models. For | ||
| predictable behavior, use manual mode. | ||
| runtime failover uses manual rules only. | ||
|
|
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Fix the broken overview sentence.
Line 14 reads awkwardly as “For runtime failover uses manual rules only.” Please rephrase it so the migration guidance is unambiguous to readers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/features/failover.mdx` around lines 13 - 15, The overview text in the
failover documentation has an awkward, broken sentence that makes the runtime
migration guidance unclear. Update the affected prose in the failover overview
to rephrase the runtime failover guidance into a complete, unambiguous sentence,
keeping the meaning that runtime failover relies on manual rules only. Locate
the text near the introductory failover description and adjust the surrounding
sentence flow for clarity.
| "responses": { | ||
| "200": { | ||
| "description": "OK", | ||
| "content": { | ||
| "application/json": { | ||
| "schema": { | ||
| "$ref": "#/components/schemas/failover.View" | ||
| } | ||
| } | ||
| } | ||
| }, | ||
| "400": { | ||
| "description": "Bad Request", | ||
| "content": { | ||
| "application/json": { | ||
| "schema": { | ||
| "$ref": "#/components/schemas/core.GatewayError" | ||
| } | ||
| } | ||
| } | ||
| }, | ||
| "401": { | ||
| "description": "Unauthorized", | ||
| "content": { | ||
| "application/json": { | ||
| "schema": { | ||
| "$ref": "#/components/schemas/core.GatewayError" | ||
| } | ||
| } | ||
| } | ||
| }, | ||
| "502": { | ||
| "description": "Bad Gateway", | ||
| "content": { | ||
| "application/json": { | ||
| "schema": { | ||
| "$ref": "#/components/schemas/core.GatewayError" | ||
| } | ||
| } | ||
| } | ||
| }, | ||
| "503": { | ||
| "description": "Service Unavailable", | ||
| "content": { | ||
| "application/json": { | ||
| "schema": { | ||
| "$ref": "#/components/schemas/core.GatewayError" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| }, |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Document the 204 No Content upsert path.
internal/admin/handler_failover.go can return 204 after a successful PUT /admin/failover when the refreshed view is unavailable, but this operation only advertises 200. That makes the published API contract narrower than the handler.
Suggested OpenAPI fix
"responses": {
"200": {
"description": "OK",
"content": {
"application/json": {
"schema": {
"$ref": "`#/components/schemas/failover.View`"
}
}
}
},
+ "204": {
+ "description": "No Content"
+ },
"400": {
"description": "Bad Request",
"content": {
"application/json": {
"schema": {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "responses": { | |
| "200": { | |
| "description": "OK", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "#/components/schemas/failover.View" | |
| } | |
| } | |
| } | |
| }, | |
| "400": { | |
| "description": "Bad Request", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "#/components/schemas/core.GatewayError" | |
| } | |
| } | |
| } | |
| }, | |
| "401": { | |
| "description": "Unauthorized", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "#/components/schemas/core.GatewayError" | |
| } | |
| } | |
| } | |
| }, | |
| "502": { | |
| "description": "Bad Gateway", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "#/components/schemas/core.GatewayError" | |
| } | |
| } | |
| } | |
| }, | |
| "503": { | |
| "description": "Service Unavailable", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "#/components/schemas/core.GatewayError" | |
| } | |
| } | |
| } | |
| } | |
| }, | |
| "responses": { | |
| "200": { | |
| "description": "OK", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "`#/components/schemas/failover.View`" | |
| } | |
| } | |
| } | |
| }, | |
| "204": { | |
| "description": "No Content" | |
| }, | |
| "400": { | |
| "description": "Bad Request", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "`#/components/schemas/core.GatewayError`" | |
| } | |
| } | |
| } | |
| }, | |
| "401": { | |
| "description": "Unauthorized", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "`#/components/schemas/core.GatewayError`" | |
| } | |
| } | |
| } | |
| }, | |
| "502": { | |
| "description": "Bad Gateway", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "`#/components/schemas/core.GatewayError`" | |
| } | |
| } | |
| } | |
| }, | |
| "503": { | |
| "description": "Service Unavailable", | |
| "content": { | |
| "application/json": { | |
| "schema": { | |
| "$ref": "`#/components/schemas/core.GatewayError`" | |
| } | |
| } | |
| } | |
| } | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/openapi.json` around lines 952 - 1003, The failover upsert response
contract is missing the successful 204 No Content path. Update the responses for
the PUT /admin/failover operation in docs/openapi.json to include a 204 entry
alongside the existing 200/400/401/502/503 responses, matching the behavior in
internal/admin/handler_failover.go and its refreshed view handling.
| "admin.deleteFailoverRuleRequest": { | ||
| "type": "object", | ||
| "properties": { | ||
| "source": { | ||
| "type": "string" | ||
| } | ||
| } | ||
| }, |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Mark source as required in both failover request schemas.
Both the delete and upsert handlers reject missing or blank source, but these schemas leave it optional. That weakens generated clients and misstates the server contract.
Suggested OpenAPI fix
"admin.deleteFailoverRuleRequest": {
"type": "object",
+ "required": [
+ "source"
+ ],
"properties": {
"source": {
"type": "string"
}
}
},
@@
"admin.upsertFailoverRuleRequest": {
"type": "object",
+ "required": [
+ "source"
+ ],
"properties": {
"description": {
"type": "string"
},Also applies to: 6957-6976
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/openapi.json` around lines 6775 - 6782, The failover request schemas
currently define source as optional even though the delete and upsert handlers
require it; update both admin.deleteFailoverRuleRequest and
admin.upsertFailoverRuleRequest in the OpenAPI document to mark source as
required. Make sure the required field list matches the server contract so
generated clients treat source as mandatory for both request shapes.
| this.failoverFormManaged = Boolean(rule.managed); | ||
| this.failoverFormOriginalSource = rule.source || ''; | ||
| this.failoverForm = { | ||
| source: rule.source || '', | ||
| targets: (Array.isArray(rule.targets) ? rule.targets : []).join('\n'), | ||
| description: rule.description || '', | ||
| enabled: rule.enabled !== false |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use managed_source consistently for managed-rule state.
This module still keys off rule.managed, but the view contract in this PR uses managed_source. As written, config-managed rules will be labeled as normal dashboard rules here, and the same mismatch leaks into the template’s edit/delete affordances. Gate the UI off managed_source instead.
Also applies to: 297-300
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 82 - 88,
The failover UI is still using rule.managed to determine whether a rule is
config-managed, but this PR’s contract uses managed_source instead. Update the
failover state setup in the failover module to derive the managed flag from
managed_source, and make sure the template logic for the edit/delete affordances
also checks managed_source consistently so managed rules are treated as
non-editable dashboard entries.
| // @Success 200 {array} failover.View | ||
| // @Failure 401 {object} core.GatewayError | ||
| // @Failure 503 {object} core.GatewayError | ||
| // @Router /admin/failover/generate [post] |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Don't return draft suggestions as failover.View.
failover.View is the persisted-rule schema, but this endpoint fabricates only a subset of those fields. The generated payload will serialize zero created_at/updated_at values and a stored-view shape for objects that have not been saved yet. Please switch /admin/failover/generate to a dedicated typed draft DTO instead of reusing failover.View.
Suggested direction
+type generateFailoverRuleResponse struct {
+ Source string `json:"source"`
+ Targets []string `json:"targets"`
+ Description string `json:"description,omitempty"`
+ Enabled bool `json:"enabled"`
+}
+
func (h *Handler) GenerateFailoverRules(c *echo.Context) error {
...
- suggestions := make([]failover.View, 0)
+ suggestions := make([]generateFailoverRuleResponse, 0)
...
- suggestions = append(suggestions, failover.View{
- Source: source,
- Targets: targets,
- Enabled: true,
- ManagedSource: failover.ManagedSourceDashboard,
- })
+ suggestions = append(suggestions, generateFailoverRuleResponse{
+ Source: source,
+ Targets: targets,
+ Enabled: true,
+ })
}
return c.JSON(http.StatusOK, suggestions)
}Also applies to: 207-212
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/admin/handler_failover.go` around lines 169 - 172, The
`/admin/failover/generate` endpoint is documented as returning `failover.View`,
but it only fabricates draft data and should not use the persisted-rule shape.
Update the handler/Swagger annotations in `admin/handler_failover.go` to return
a dedicated draft DTO for this endpoint, and adjust the generation path in the
failover handler so it serializes only the draft fields instead of
`created_at`/`updated_at` or other stored-view-only values.
| refreshInterval := time.Duration(cfg.Cache.Model.RefreshInterval) * time.Second | ||
| if refreshInterval <= 0 { | ||
| refreshInterval = time.Hour | ||
| } | ||
| return &Result{ | ||
| Service: service, | ||
| Store: store, | ||
| stopRefresh: service.StartBackgroundRefresh(refreshInterval), | ||
| }, nil |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
Don't bind failover propagation to the model-cache refresh interval.
Dashboard writes refresh only the local process. Every other instance has to wait for this ticker, and the current fallback to time.Hour means a manual failover change can stay stale on peer nodes for up to an hour. For an incident-response control, that lag is too high; give failover its own short refresh interval or an explicit invalidation path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/factory.go` around lines 90 - 98, The failover factory
currently reuses the model-cache refresh ticker for propagation, which can leave
peer nodes stale for too long. Update the Result construction in the factory
logic to use a dedicated, short failover refresh interval or an explicit
invalidation mechanism instead of cfg.Cache.Model.RefreshInterval and the
time.Hour fallback. Keep the change localized around the
service.StartBackgroundRefresh call so failover updates propagate independently
from the model cache.
| func TestServiceConfigRulesOverrideDashboardRules(t *testing.T) { | ||
| store := newMemoryStore(Rule{ | ||
| Source: "gpt-4o", | ||
| Targets: []string{"openrouter/gpt-4o"}, | ||
| Enabled: true, | ||
| ManagedSource: ManagedSourceDashboard, | ||
| }) | ||
| service, err := NewService(store, config.FallbackConfig{ | ||
| Enabled: true, | ||
| Manual: map[string][]string{ | ||
| "gpt-4o": {"azure/gpt-4o"}, | ||
| }, | ||
| }) | ||
| if err != nil { | ||
| t.Fatalf("NewService() error = %v", err) | ||
| } | ||
| if err := service.Refresh(context.Background()); err != nil { | ||
| t.Fatalf("Refresh() error = %v", err) | ||
| } | ||
|
|
||
| got := service.Rules()["gpt-4o"] | ||
| want := []string{"azure/gpt-4o"} | ||
| if !reflect.DeepEqual(got, want) { | ||
| t.Fatalf("Rules()[gpt-4o] = %v, want %v", got, want) | ||
| } | ||
| view, ok := service.Get("gpt-4o") | ||
| if !ok || view.ManagedSource != ManagedSourceConfig { | ||
| t.Fatalf("Get(gpt-4o) = %+v, %v; want config-managed rule", view, ok) | ||
| } | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Add regression coverage for the new write and error paths.
This only exercises the happy-path merge. The new contract also depends on rejecting config-managed writes, preserving config rules across ResetDashboardRules, and surfacing refresh/upsert storage failures correctly.
As per coding guidelines, **/*_test.go: “Add or update tests for behavior changes, and ensure tests cover ... error handling, default configuration ...”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/service_test.go` around lines 60 - 89, The current test
only covers the happy-path merge in
TestServiceConfigRulesOverrideDashboardRules; add regression tests for the new
write/error behavior too. Extend coverage around
Service/Refresh/UpsertRule/ResetDashboardRules to verify config-managed rules
cannot be overwritten by dashboard writes, config rules survive
ResetDashboardRules, and storage failures from refresh/upsert are surfaced as
errors instead of being swallowed. Use the existing symbols NewService, Refresh,
Rules, Get, and the managed-source constants to keep the tests aligned with the
failover contract.
Source: Coding guidelines
| if existing, err := s.store.Get(ctx, normalized.Source); err == nil && existing != nil { | ||
| normalized.CreatedAt = existing.CreatedAt | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Propagate unexpected Store.Get failures before writing.
A non-ErrNotFound read error is silently ignored here, so an update can still succeed after a failed pre-read and overwrite metadata as if the rule were new. That both hides a storage fault from the admin API and can reset CreatedAt on existing rows.
Suggested fix
import (
"context"
+ "errors"
"fmt"
"log/slog"
"sort"
@@
- if existing, err := s.store.Get(ctx, normalized.Source); err == nil && existing != nil {
+ existing, err := s.store.Get(ctx, normalized.Source)
+ if err != nil && !errors.Is(err, ErrNotFound) {
+ return fmt.Errorf("get failover rule: %w", err)
+ }
+ if err == nil && existing != nil {
normalized.CreatedAt = existing.CreatedAt
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if existing, err := s.store.Get(ctx, normalized.Source); err == nil && existing != nil { | |
| normalized.CreatedAt = existing.CreatedAt | |
| } | |
| import ( | |
| "context" | |
| "errors" | |
| "fmt" | |
| "log/slog" | |
| "sort" | |
| ) | |
| existing, err := s.store.Get(ctx, normalized.Source) | |
| if err != nil && !errors.Is(err, ErrNotFound) { | |
| return fmt.Errorf("get failover rule: %w", err) | |
| } | |
| if err == nil && existing != nil { | |
| normalized.CreatedAt = existing.CreatedAt | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/service.go` around lines 183 - 185, The pre-read in
`service.go` is swallowing unexpected `Store.Get` errors and still allowing the
write to continue, which can hide storage failures and overwrite metadata.
Update the `existing, err := s.store.Get(ctx, normalized.Source)` check in the
rule write path to distinguish `ErrNotFound` from other errors: keep the current
`CreatedAt` preservation only when a record exists, but immediately return any
non-`ErrNotFound` error before proceeding. Ensure the fix is applied in the same
update flow that uses `normalized.CreatedAt` and `s.store.Get`.
Confidence Score: 3/5Two behavioral issues exist in the changed runtime paths: one affects traffic routing for existing deployments that previously disabled fallback via deprecated The backward-compatibility gap —
Reviews (1): Last reviewed commit: "feat: add manual failover management" | Re-trigger Greptile |
| } | ||
|
|
||
| cfg.DefaultMode = ResolveFallbackDefaultMode(cfg.DefaultMode) | ||
| if !cfg.DefaultMode.Valid() { | ||
| return fmt.Errorf("fallback.default_mode must be one of: auto, manual, off") | ||
| } | ||
|
|
||
| if len(cfg.Overrides) > 0 { |
There was a problem hiding this comment.
default_mode: off no longer disables failover — silent behavior change on upgrade
FEATURE_FALLBACK_MODE=off / default_mode: off was the documented way to globally disable fallback. The old resolver gate was if mode == FallbackModeOff && len(cfg.Manual) == 0 && len(cfg.Overrides) == 0 { return nil }, so with manual rules present, the resolver existed but modeFor() returned FallbackModeOff for every model that lacked an explicit mode: manual override — those models got no fallbacks.
After this PR, Enabled defaults to true and DefaultMode is silently ignored at runtime. Any operator who had default_mode: off (or FEATURE_FALLBACK_MODE=off) together with a populated manual_rules_path (or FALLBACK_MANUAL_RULES_PATH) will silently receive fallbacks for all those models on the next deploy.
loadFallbackConfig processes the deprecated DefaultMode field but never maps FallbackModeOff → Enabled=false. Either add that migration or update the deprecation notice and migration docs to call out this specific case explicitly.
| requiredCategory := requiredCategoryForOperation(op) | ||
| if requiredCategory == core.CategoryEmbedding { | ||
| return nil | ||
| } | ||
| source := r.sourceModelInfo(resolution) | ||
| if r.disabledFor(resolution, source) { | ||
| return nil | ||
| } | ||
| sourceKey := r.sourceKey(resolution, source) | ||
| seen := make(map[string]struct{}) | ||
| for _, selector := range r.manualSelectorsFor(resolution, source, sourceKey, seen) { | ||
| seen[selector.QualifiedModel()] = struct{}{} | ||
| } | ||
| return r.autoSelectorsFor(source, sourceKey, requiredCategory, seen) | ||
| } | ||
|
|
||
| func (r *Resolver) effectiveManualRules() map[string][]string { | ||
| if r.ruleProvider == nil { | ||
| return r.manual | ||
| } | ||
| dynamic := r.ruleProvider.Rules() |
There was a problem hiding this comment.
Dynamic rules silently override static config rules in
effectiveManualRules()
In effectiveManualRules(), r.manual is written first then dynamic overwrites every shared key. If the service's in-memory state is stale during a background refresh, the dynamic map may not yet carry the correct config values and would overwrite the authoritative static snapshot in r.manual. Consider reversing the merge order so r.manual wins for any key it already owns.
| source = strings.TrimSpace(source) | ||
| for _, row := range s.List() { | ||
| if row.Source == source { | ||
| return &row, true | ||
| } | ||
| } | ||
| return nil, false | ||
| } | ||
|
|
There was a problem hiding this comment.
Silent error swallowing in
Upsert when preserving CreatedAt
When s.store.Get returns an error other than ErrNotFound, the condition is false and CreatedAt is silently left at zero. stampUpsert then sets it to now, effectively resetting created_at on every transient DB failure. Distinguish ErrNotFound from other errors and propagate the latter.
| } | ||
| suggestions = append(suggestions, failover.View{ | ||
| Source: source, | ||
| Targets: targets, | ||
| Enabled: true, | ||
| ManagedSource: failover.ManagedSourceDashboard, | ||
| }) | ||
| } | ||
| return c.JSON(http.StatusOK, suggestions) | ||
| } | ||
|
|
||
| func modelSupportsCategory(meta *core.ModelMetadata, category core.ModelCategory) bool { | ||
| if meta == nil || len(meta.Categories) == 0 { | ||
| return true | ||
| } | ||
| for _, candidate := range meta.Categories { | ||
| if candidate == category { | ||
| return true | ||
| } | ||
| } | ||
| return false | ||
| } | ||
|
|
||
| func (h *Handler) findFailoverView(source string) (failover.View, bool) { | ||
| for _, view := range h.failoverRules.ListViews() { | ||
| if view.Source == source { | ||
| return view, true | ||
| } | ||
| } | ||
| return failover.View{}, false | ||
| } | ||
|
|
||
| func failoverWriteError(err error) error { | ||
| if errors.Is(err, failover.ErrManaged) { | ||
| return core.NewInvalidRequestError("failover rule is managed by configuration and cannot be changed in the dashboard", err) |
There was a problem hiding this comment.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
internal/admin/dashboard/static/js/modules/failover.js (1)
34-37: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winClear generated drafts when the failover feature becomes unavailable.
On the 503 and fetch-error paths, this only clears
failoverRules.page-failover.htmlstill rendersfailoverGeneratedRulesindependently, so stale generated drafts remain visible and reopenable after the backend reports the feature as unavailable. ResetfailoverGeneratedRulesin these early-return paths too.Also applies to: 49-53
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 34 - 37, The failover unavailable paths in failover.js only clear failoverRules, leaving stale drafts in failoverGeneratedRules visible in page-failover.html. Update the early-return branches in the fetch response handling (including the 503 and fetch-error paths) to also reset failoverGeneratedRules alongside failoverAvailable and failoverRules. Use the failoverGeneratedRules state in the failover.js module as the target to clear so the UI no longer shows reopenable generated drafts after the feature becomes unavailable.internal/admin/handler_failover.go (1)
69-83: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winNormalize
fallback_modelsbefore storing them.
primary_modelis trimmed and validated, but each fallback selector is persisted verbatim. A payload like[" azure/gpt-4o ", ""]will survive this PUT and store selectors that will never match later. Trim each entry and reject blank targets before building the rule.Suggested fix
source := strings.TrimSpace(req.PrimaryModel) if source == "" { return handleError(c, core.NewInvalidRequestError("primary_model is required", nil)) } + targets := make([]string, 0, len(req.FallbackModels)) + for _, raw := range req.FallbackModels { + target := strings.TrimSpace(raw) + if target == "" { + return handleError(c, core.NewInvalidRequestError("fallback_models must not contain blank entries", nil)) + } + targets = append(targets, target) + } enabled := true if existing, ok := h.failoverRules.Get(source); ok && existing != nil { enabled = existing.Enabled } @@ rule := failover.Rule{ Source: source, - Targets: req.FallbackModels, + Targets: targets, Enabled: enabled, }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/handler_failover.go` around lines 69 - 83, The failover rule creation path in handler_failover.go trims and validates primary_model, but stores req.FallbackModels unchanged in failover.Rule.Targets. Normalize each fallback model before constructing the rule by trimming whitespace in the same handler logic, and reject any blank entries with an invalid request error so only valid selectors are persisted.docs/features/failover.mdx (1)
42-52: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winDocument the disable-list controls too.
This page explains manual rule sources, but it never mentions
fallback.disabled_models/FAILOVER_DISABLED_MODELS_JSON, even though both are now part of the exposed config surface inconfig/config.example.yamland.env.template. Without that, operators have no docs path for opting specific selectors out of failover.Suggested addition
For env-only deployments, use inline JSON: @@ `fallback.default_mode` and `FEATURE_FALLBACK_MODE` are deprecated compatibility inputs. They are accepted but ignored by runtime failover. + +To disable failover for specific primary model selectors, use +`fallback.disabled_models` in YAML or `FAILOVER_DISABLED_MODELS_JSON` in env-only +deployments. Example: + +```bash +FAILOVER_DISABLED_MODELS_JSON='["claude-sonnet-4"]' +``` The Dashboard can generate suggested mappings from model metadata, but suggestions must be saved as manual mappings before they affect traffic.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/features/failover.mdx` around lines 42 - 52, The failover docs in the failover page are missing the disable-list configuration, so add coverage for fallback.disabled_models and FAILOVER_DISABLED_MODELS_JSON alongside the existing rule source examples. Update the documentation near the inline JSON / environment variable examples to show how operators can opt specific selectors out of failover, and mention that this is part of the supported config surface just like the manual mappings described in the page.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/failover/store_postgresql.go`:
- Around line 26-29: The PostgreSQL failover store DDL only covers fresh
installs, so upgraded databases with the old source/targets columns will still
break when FailoverStore reads or upserts failover_rules. Update the store
initialization/migration path in FailoverStore or its schema setup to detect the
legacy layout and run a real ALTER TABLE migration that renames source to
primary_model and targets to fallback_models before any queries use the new
names.
In `@internal/failover/store_sqlite.go`:
- Around line 21-24: The failover_rules initializer in store_sqlite.go only
creates the new table shape for fresh databases, so existing SQLite installs
will keep the old source/targets schema and break later queries. Update the
initialization/migration path in the failover store setup to detect and migrate
the existing failover_rules table before any SELECT or upsert logic runs,
backfilling/renaming source and targets into primary_model and fallback_models
so the new column names are always available.
In `@internal/failover/types.go`:
- Line 17: The BSON tag change on Targets in the failover types breaks decoding
of legacy failover_rules documents that still persist targets, so
MongoDBStore.List/Get will return empty Targets and upserts may rewrite them
under fallback_models. Add backward-compatible read handling in the failover
decoding path (or a migration in MongoDBStore) so both targets and
fallback_models are accepted, and only switch the persisted tag after existing
documents can be read reliably.
---
Outside diff comments:
In `@docs/features/failover.mdx`:
- Around line 42-52: The failover docs in the failover page are missing the
disable-list configuration, so add coverage for fallback.disabled_models and
FAILOVER_DISABLED_MODELS_JSON alongside the existing rule source examples.
Update the documentation near the inline JSON / environment variable examples to
show how operators can opt specific selectors out of failover, and mention that
this is part of the supported config surface just like the manual mappings
described in the page.
In `@internal/admin/dashboard/static/js/modules/failover.js`:
- Around line 34-37: The failover unavailable paths in failover.js only clear
failoverRules, leaving stale drafts in failoverGeneratedRules visible in
page-failover.html. Update the early-return branches in the fetch response
handling (including the 503 and fetch-error paths) to also reset
failoverGeneratedRules alongside failoverAvailable and failoverRules. Use the
failoverGeneratedRules state in the failover.js module as the target to clear so
the UI no longer shows reopenable generated drafts after the feature becomes
unavailable.
In `@internal/admin/handler_failover.go`:
- Around line 69-83: The failover rule creation path in handler_failover.go
trims and validates primary_model, but stores req.FallbackModels unchanged in
failover.Rule.Targets. Normalize each fallback model before constructing the
rule by trimming whitespace in the same handler logic, and reject any blank
entries with an invalid request error so only valid selectors are persisted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 111eae64-2273-418b-ba43-1ec136d2ce31
📒 Files selected for processing (14)
.env.templateconfig/config.example.yamldocs/features/failover.mdxdocs/openapi.jsoninternal/admin/dashboard/static/js/modules/failover.jsinternal/admin/dashboard/templates/layout.htmlinternal/admin/dashboard/templates/page-failover.htmlinternal/admin/handler_failover.gointernal/failover/service.gointernal/failover/store.gointernal/failover/store_mongodb.gointernal/failover/store_postgresql.gointernal/failover/store_sqlite.gointernal/failover/types.go
| // Rule is one manual failover mapping for a primary model selector. | ||
| type Rule struct { | ||
| Source string `json:"primary_model" bson:"_id"` | ||
| Targets []string `json:"fallback_models" bson:"fallback_models"` |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Keep backward-compatible BSON decoding for existing failover documents.
Changing Targets to bson:"fallback_models" means legacy failover_rules documents that still store targets will decode with empty Targets in MongoDBStore.List/Get, and the next upsert will rewrite them under the new field name. Add a migration or dual-read compatibility path before switching the persisted tag.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/types.go` at line 17, The BSON tag change on Targets in the
failover types breaks decoding of legacy failover_rules documents that still
persist targets, so MongoDBStore.List/Get will return empty Targets and upserts
may rewrite them under fallback_models. Add backward-compatible read handling in
the failover decoding path (or a migration in MongoDBStore) so both targets and
fallback_models are accepted, and only switch the persisted tag after existing
documents can be read reliably.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/failover/store_postgresql.go`:
- Around line 40-63: Scope the migration checks in the failover PostgreSQL
migration so they only inspect the active schema. In the migration logic that
renames failover_rules columns in store_postgresql.go, update each
information_schema.columns existence check to include table_schema =
current_schema() alongside table_name and column_name, so the ALTER TABLE
failover_rules statements only depend on columns in the schema being migrated.
In `@internal/failover/store_sqlite_test.go`:
- Around line 51-64: The test in store_sqlite_test.go currently verifies only
Source and Targets, so add assertions for the migrated metadata fields that the
rule maps as well. Update the existing List() result checks in the relevant test
case to validate enabled, managed_source, created_at, and updated_at on the
returned row from store.List, using the same row variable already referenced in
the test so regressions in metadata preservation fail here too.
In `@internal/failover/store_sqlite.go`:
- Around line 104-113: The SQLite migration in the failover store is selecting
the legacy primary key value without normalizing it, so rows can be inserted
with leading/trailing spaces even though lookups trim inputs. Update the
migration query in the failover store logic (the code building the INSERT/SELECT
for failover_rules in store_sqlite.go) so the primary key expression is TRIMmed
before being stored, matching the normalization used by Get and Delete and the
TRIM filter already applied in the source query.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0dd8c12f-4b04-4ba5-a223-ce262e5afac1
📒 Files selected for processing (6)
internal/admin/dashboard/templates/model-table-body.htmlinternal/admin/dashboard/templates/shuffle-icon.htmlinternal/failover/store_mongodb.gointernal/failover/store_postgresql.gointernal/failover/store_sqlite.gointernal/failover/store_sqlite_test.go
| IF EXISTS ( | ||
| SELECT 1 FROM information_schema.columns | ||
| WHERE table_name = 'failover_rules' AND column_name = 'source' | ||
| ) AND NOT EXISTS ( | ||
| SELECT 1 FROM information_schema.columns | ||
| WHERE table_name = 'failover_rules' AND column_name = 'primary_model' | ||
| ) THEN | ||
| ALTER TABLE failover_rules RENAME COLUMN source TO primary_model; | ||
| END IF; | ||
| IF EXISTS ( | ||
| SELECT 1 FROM information_schema.columns | ||
| WHERE table_name = 'failover_rules' AND column_name = 'targets' | ||
| ) AND NOT EXISTS ( | ||
| SELECT 1 FROM information_schema.columns | ||
| WHERE table_name = 'failover_rules' AND column_name = 'fallback_models' | ||
| ) THEN | ||
| ALTER TABLE failover_rules RENAME COLUMN targets TO fallback_models; | ||
| END IF; | ||
| IF EXISTS ( | ||
| SELECT 1 FROM information_schema.columns | ||
| WHERE table_name = 'failover_rules' AND column_name = 'description' | ||
| ) THEN | ||
| ALTER TABLE failover_rules DROP COLUMN description; | ||
| END IF; |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
git ls-files internal/failover/store_postgresql.go && wc -l internal/failover/store_postgresql.go && sed -n '1,140p' internal/failover/store_postgresql.goRepository: ENTERPILOT/GoModel
Length of output: 4263
🏁 Script executed:
rg -n "failover_rules|search_path|current_schema|table_schema" -S .Repository: ENTERPILOT/GoModel
Length of output: 6112
Scope the migration checks to the active schema. The information_schema.columns probes should filter on table_schema = current_schema() so another schema’s failover_rules table can’t make this migration skip renaming source/targets on the table ALTER TABLE failover_rules will actually touch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/store_postgresql.go` around lines 40 - 63, Scope the
migration checks in the failover PostgreSQL migration so they only inspect the
active schema. In the migration logic that renames failover_rules columns in
store_postgresql.go, update each information_schema.columns existence check to
include table_schema = current_schema() alongside table_name and column_name, so
the ALTER TABLE failover_rules statements only depend on columns in the schema
being migrated.
| rows, err := store.List(context.Background()) | ||
| if err != nil { | ||
| t.Fatalf("List() error = %v", err) | ||
| } | ||
| if len(rows) != 1 { | ||
| t.Fatalf("List() returned %d rows, want 1", len(rows)) | ||
| } | ||
| if rows[0].Source != "gpt-4o" { | ||
| t.Fatalf("row source = %q, want gpt-4o", rows[0].Source) | ||
| } | ||
| wantTargets := []string{"azure/gpt-4o", "gemini/gemini-2.5-pro"} | ||
| if !reflect.DeepEqual(rows[0].Targets, wantTargets) { | ||
| t.Fatalf("row targets = %v, want %v", rows[0].Targets, wantTargets) | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Assert the metadata fields migrated with the rule.
This test catches source/targets regressions, but not enabled, managed_source, created_at, or updated_at, even though the migration explicitly maps those columns. Add assertions so metadata-preservation regressions fail here. As per coding guidelines, **/*_test.go: Add or update tests for behavior changes.
Proposed test assertions
if !reflect.DeepEqual(rows[0].Targets, wantTargets) {
t.Fatalf("row targets = %v, want %v", rows[0].Targets, wantTargets)
}
+ if !rows[0].Enabled {
+ t.Fatalf("row enabled = false, want true")
+ }
+ if rows[0].ManagedSource != ManagedSourceDashboard {
+ t.Fatalf("row managed source = %q, want %q", rows[0].ManagedSource, ManagedSourceDashboard)
+ }
+ if got := rows[0].CreatedAt.Unix(); got != 100 {
+ t.Fatalf("row created_at = %d, want 100", got)
+ }
+ if got := rows[0].UpdatedAt.Unix(); got != 200 {
+ t.Fatalf("row updated_at = %d, want 200", got)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| rows, err := store.List(context.Background()) | |
| if err != nil { | |
| t.Fatalf("List() error = %v", err) | |
| } | |
| if len(rows) != 1 { | |
| t.Fatalf("List() returned %d rows, want 1", len(rows)) | |
| } | |
| if rows[0].Source != "gpt-4o" { | |
| t.Fatalf("row source = %q, want gpt-4o", rows[0].Source) | |
| } | |
| wantTargets := []string{"azure/gpt-4o", "gemini/gemini-2.5-pro"} | |
| if !reflect.DeepEqual(rows[0].Targets, wantTargets) { | |
| t.Fatalf("row targets = %v, want %v", rows[0].Targets, wantTargets) | |
| } | |
| rows, err := store.List(context.Background()) | |
| if err != nil { | |
| t.Fatalf("List() error = %v", err) | |
| } | |
| if len(rows) != 1 { | |
| t.Fatalf("List() returned %d rows, want 1", len(rows)) | |
| } | |
| if rows[0].Source != "gpt-4o" { | |
| t.Fatalf("row source = %q, want gpt-4o", rows[0].Source) | |
| } | |
| wantTargets := []string{"azure/gpt-4o", "gemini/gemini-2.5-pro"} | |
| if !reflect.DeepEqual(rows[0].Targets, wantTargets) { | |
| t.Fatalf("row targets = %v, want %v", rows[0].Targets, wantTargets) | |
| } | |
| if !rows[0].Enabled { | |
| t.Fatalf("row enabled = false, want true") | |
| } | |
| if rows[0].ManagedSource != ManagedSourceDashboard { | |
| t.Fatalf("row managed source = %q, want %q", rows[0].ManagedSource, ManagedSourceDashboard) | |
| } | |
| if got := rows[0].CreatedAt.Unix(); got != 100 { | |
| t.Fatalf("row created_at = %d, want 100", got) | |
| } | |
| if got := rows[0].UpdatedAt.Unix(); got != 200 { | |
| t.Fatalf("row updated_at = %d, want 200", got) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/store_sqlite_test.go` around lines 51 - 64, The test in
store_sqlite_test.go currently verifies only Source and Targets, so add
assertions for the migrated metadata fields that the rule maps as well. Update
the existing List() result checks in the relevant test case to validate enabled,
managed_source, created_at, and updated_at on the returned row from store.List,
using the same row variable already referenced in the test so regressions in
metadata preservation fail here too.
Source: Coding guidelines
| SELECT | ||
| %s, | ||
| %s, | ||
| %s, | ||
| %s, | ||
| %s, | ||
| %s | ||
| FROM failover_rules | ||
| WHERE TRIM(COALESCE(%s, '')) <> '' | ||
| `, primaryExpr, targetsExpr, enabledExpr, managedSourceExpr, createdAtExpr, updatedAtExpr, primaryExpr) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Trim the migrated primary key value before storing it.
Line 112 filters with TRIM(...), but Line 105 inserts the raw legacy value. A legacy source like " gpt-4o " will be migrated with spaces, while Get/Delete trim input and then miss the row.
Proposed fix
SELECT
- %s,
+ TRIM(%s),
%s,
%s,
%s,
%s,
%s📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| SELECT | |
| %s, | |
| %s, | |
| %s, | |
| %s, | |
| %s, | |
| %s | |
| FROM failover_rules | |
| WHERE TRIM(COALESCE(%s, '')) <> '' | |
| `, primaryExpr, targetsExpr, enabledExpr, managedSourceExpr, createdAtExpr, updatedAtExpr, primaryExpr) | |
| SELECT | |
| TRIM(%s), | |
| %s, | |
| %s, | |
| %s, | |
| %s, | |
| %s | |
| FROM failover_rules | |
| WHERE TRIM(COALESCE(%s, '')) <> '' | |
| `, primaryExpr, targetsExpr, enabledExpr, managedSourceExpr, createdAtExpr, updatedAtExpr, primaryExpr) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/failover/store_sqlite.go` around lines 104 - 113, The SQLite
migration in the failover store is selecting the legacy primary key value
without normalizing it, so rows can be inserted with leading/trailing spaces
even though lookups trim inputs. Update the migration query in the failover
store logic (the code building the INSERT/SELECT for failover_rules in
store_sqlite.go) so the primary key expression is TRIMmed before being stored,
matching the normalization used by Get and Delete and the TRIM filter already
applied in the source query.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
internal/admin/dashboard/static/js/modules/failover.js (2)
77-82: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winUse
managed_sourcefor managed-rule UI state.The API/view contract here exposes
managed_source, notmanaged, so config-managed mappings are still treated as editable dashboard rules and shown asOninstead ofConfig. Normalize the managed flag frommanaged_sourcebeforeopenFailoverEdit()andfailoverRuleStatus()consume it.Also applies to: 363-375
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 77 - 82, The failover rule UI is reading the wrong field for managed state, so config-managed mappings are treated like editable dashboard rules. Update the failover flow to normalize and use managed_source instead of managed before openFailoverEdit() sets form state and before failoverRuleStatus() renders the status. Make sure the rule object consistently derives failoverFormManaged and the displayed label from managed_source so managed rules show as Config rather than On.
77-85: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winRenaming the primary model in edit mode creates a second rule.
failoverFormOriginalSourceis captured but never used on save. The admin API upserts byprimary_model, so editingsourceleaves the old mapping active and persists a new one. Makesourceimmutable while editing, or delete/replace the original key before the PUT.Also applies to: 154-192
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/admin/dashboard/static/js/modules/failover.js` around lines 77 - 85, The edit flow in openFailoverEdit and the save path in failover.js are allowing primary model renames to create a second rule because failoverFormOriginalSource is captured but never applied. Update the edit/save logic so the original primary_model key is preserved or explicitly deleted/replaced before the PUT, and make the source field immutable while failoverFormMode is edit. Ensure the code paths that build the payload and submit the form use failoverFormOriginalSource consistently when editing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@internal/admin/dashboard/static/js/modules/failover.js`:
- Around line 77-82: The failover rule UI is reading the wrong field for managed
state, so config-managed mappings are treated like editable dashboard rules.
Update the failover flow to normalize and use managed_source instead of managed
before openFailoverEdit() sets form state and before failoverRuleStatus()
renders the status. Make sure the rule object consistently derives
failoverFormManaged and the displayed label from managed_source so managed rules
show as Config rather than On.
- Around line 77-85: The edit flow in openFailoverEdit and the save path in
failover.js are allowing primary model renames to create a second rule because
failoverFormOriginalSource is captured but never applied. Update the edit/save
logic so the original primary_model key is preserved or explicitly
deleted/replaced before the PUT, and make the source field immutable while
failoverFormMode is edit. Ensure the code paths that build the payload and
submit the form use failoverFormOriginalSource consistently when editing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 998194ca-8d28-4db5-8420-b3e20e6a23fe
📒 Files selected for processing (5)
internal/admin/dashboard/static/css/dashboard.cssinternal/admin/dashboard/static/js/modules/failover.jsinternal/admin/dashboard/static/js/modules/failover.test.cjsinternal/admin/dashboard/templates/failover-icon.htmlinternal/admin/dashboard/templates/model-table-body.html
Summary
Verification
gomodel/internal/admin/dashboard
gomodel/internal/anthropicapi
gomodel/internal/app
gomodel/internal/auditlog
gomodel/internal/authkeys
gomodel/internal/batch
gomodel/internal/batchrewrite
gomodel/internal/budget
gomodel/internal/cache
gomodel/internal/cache/modelcache
gomodel/internal/conversationstore
gomodel/internal/core
gomodel/internal/embedding
gomodel/internal/failover
gomodel/internal/fallback
gomodel/internal/filestore
gomodel/internal/gateway
gomodel/internal/guardrails
gomodel/internal/httpclient
gomodel/internal/live
gomodel/internal/llmclient
gomodel/internal/modeldata
gomodel/internal/modelselectors
gomodel/internal/observability
gomodel/internal/pricingoverrides
gomodel/internal/providers
gomodel/internal/providers/anthropic
gomodel/internal/providers/azure
gomodel/internal/providers/bailian
gomodel/internal/providers/bedrock
gomodel/internal/providers/deepseek
gomodel/internal/providers/gemini
gomodel/internal/providers/googlecommon
gomodel/internal/providers/groq
gomodel/internal/providers/minimax
gomodel/internal/providers/ollama
gomodel/internal/providers/openai
gomodel/internal/providers/opencodego
gomodel/internal/providers/openrouter
gomodel/internal/providers/oracle
gomodel/internal/providers/vertex
gomodel/internal/providers/vllm
gomodel/internal/providers/xai
gomodel/internal/providers/xiaomi
gomodel/internal/providers/zai
gomodel/internal/realtime
gomodel/internal/responsecache
gomodel/internal/responsestore
gomodel/internal/server
gomodel/internal/storage
gomodel/internal/storage/sqlutil
gomodel/internal/streaming
gomodel/internal/usage
gomodel/internal/validation
gomodel/internal/version
gomodel/internal/virtualmodels
gomodel/internal/workflows ./config/...
Summary by CodeRabbit
New Features
Bug Fixes