Description
Suggested threat: Silent message loss on degraded WebSocket connections
Attack scenario
This is an availability/reliability threat rather than a confidentiality or integrity one. Under degraded network conditions, completed agent responses can silently fail to render in the Control UI — with no error indication to the user.
Exploitation is passive: no attacker action is required. Any condition that makes the loadChatHistory() RPC slow, fail, or lose a race with another state update can trigger it, such as flaky WiFi, a high-latency gateway, a backgrounded browser tab (reduced timer resolution), or a gateway under load.
How it works
- Agent completes a run successfully (gateway logs run done, aborted=false)
- Gateway sends a state="final" event over WebSocket to the browser
- Browser receives the event and clears streaming state (chatStream = null), so the typing indicator disappears
- But the final message is never committed to chatMessages: the state="final" code path in handleChatEvent() does not append the message synchronously
- The message only appears after an async loadChatHistory() RPC completes and replaces the entire chatMessages array
- If that RPC is slow, fails, or races with another state update, the message is silently lost until the next event triggers a fresh loadChatHistory() call
The user sees: typing indicator disappears, no message appears, no error shown. The response exists server-side but is invisible client-side.
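The sequence above can be modelled in a few lines. This is a simplified sketch, not the actual contents of chat.ts: only chatStream, chatMessages, handleChatEvent(), and the state values come from this report. The type names, the "delta" state, the boolean return value, and what the aborted path commits are assumptions made for illustration.

```typescript
// Hypothetical, simplified model of the Control UI chat state.
type ChatMessage = { role: "assistant" | "user"; text: string };
type ChatEvent = { state: "delta" | "final" | "aborted"; text: string };
type ChatState = {
  chatStream: string | null;   // in-progress text; null hides the typing indicator
  chatMessages: ChatMessage[]; // what the UI actually renders
};

// Returns true when the reply will only become visible after an async
// loadChatHistory() call (the return value is an assumption of this sketch).
function handleChatEvent(ui: ChatState, event: ChatEvent): boolean {
  if (event.state === "delta") {
    ui.chatStream = event.text;
    return false;
  }
  if (event.state === "aborted") {
    // Synchronous commit: the text is appended before streaming state is cleared.
    ui.chatMessages.push({ role: "assistant", text: ui.chatStream ?? event.text });
    ui.chatStream = null;
    return false;
  }
  // state === "final": streaming state is cleared but nothing is appended,
  // so the reply depends entirely on the history reload succeeding.
  ui.chatStream = null;
  return true;
}

const ui: ChatState = { chatStream: null, chatMessages: [] };
handleChatEvent(ui, { state: "delta", text: "Hello, wor" });
const needsReload = handleChatEvent(ui, { state: "final", text: "Hello, world" });
console.log(needsReload, ui.chatStream, ui.chatMessages.length); // true null 0
```

After the final event the model has no typing indicator and no message. This is the window in which the user sees nothing if the reload is slow or fails.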
Affected components
- Control UI (ui/src/ui/controllers/chat.ts, ui/src/ui/app-gateway.ts)
- WebSocket event handling: the asymmetry between the state="final" (no commit) and state="aborted" (synchronous commit) paths
Why the server can't compensate
The gateway's WebSocket keepalive is application-level only (30-second tick broadcasts). The tick watchdog runs on the client side — if the browser's network drops, the server has no mechanism to detect a stale connection. The server continues broadcasting into a TCP socket that hasn't been closed yet (OS-level TCP keepalive timeout is typically 2+ hours).
This means the server cannot detect when a client has missed a state="final" event and proactively resend it.
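Since detection has to happen on the client, a tick watchdog reduces to a timestamp comparison. The sketch below is illustrative only: the 30-second interval comes from this report, while the function name and the two-missed-ticks threshold are assumptions, not the gateway's or the Control UI's actual values.

```typescript
// From this report: the gateway broadcasts an application-level tick every 30 seconds.
const TICK_INTERVAL_MS = 30_000;

// Hypothetical threshold: how many tick intervals may elapse before the
// client treats the connection as stale.
const MISSED_TICKS_ALLOWED = 2;

// Pure check a client-side watchdog could run on a timer.
// lastTickAtMs is the time the most recent tick was received.
function isConnectionStale(lastTickAtMs: number, nowMs: number): boolean {
  return nowMs - lastTickAtMs > TICK_INTERVAL_MS * MISSED_TICKS_ALLOWED;
}
```

A client that finds the connection stale would reconnect and reload history. The server has no equivalent signal, because nothing in the tick protocol described here flows back from the client.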
Severity assessment
Medium. No data is lost (the response exists server-side and appears on next full history reload), no confidentiality or integrity impact, but the user experience failure is complete — the user believes the agent didn't respond when it did. In time-sensitive interactions this could cause cascading confusion (user sends follow-ups, agent responds to those first, messages appear out of order).
Real-world evidence
- Reproduced in a containerized deployment during intermittent connectivity drops — 59-second gap between successful agent response and message rendering
- Existing issue: [Bug]: Control UI fails to render new messages after successful chat.history WebSocket response openclaw#9183
- Existing fix PR: fix: auto-resync webchat on reconnect and prevent message flicker on stream complete openclaw#16767 (synchronous optimistic commit + async fallback)
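The "synchronous optimistic commit + async fallback" approach named in the fix PR can be sketched as follows. This is not the PR's code. It assumes each message carries a stable id (here the run id) so that the later history reload can be reconciled without duplicates. Msg, UiState, commitFinal, reconcileHistory, resync, and fetchHistory are all hypothetical names.

```typescript
type Msg = { id: string; text: string };
type UiState = { chatStream: string | null; chatMessages: Msg[] };

// Optimistic commit: append the final text synchronously, then clear the stream,
// so the reply is visible even if the history reload never completes.
function commitFinal(ui: UiState, runId: string, text: string): void {
  if (!ui.chatMessages.some((m) => m.id === runId)) {
    ui.chatMessages.push({ id: runId, text });
  }
  ui.chatStream = null;
}

// Fallback reconciliation: take the server history as authoritative, but keep
// any optimistic message the server copy does not contain yet.
function reconcileHistory(ui: UiState, history: Msg[]): void {
  const known = new Set(history.map((m) => m.id));
  const pending = ui.chatMessages.filter((m) => !known.has(m.id));
  ui.chatMessages = [...history, ...pending];
}

// Async fallback: a failed reload leaves the optimistic state untouched.
// fetchHistory stands in for the loadChatHistory() RPC.
async function resync(ui: UiState, fetchHistory: () => Promise<Msg[]>): Promise<boolean> {
  try {
    reconcileHistory(ui, await fetchHistory());
    return true;
  } catch {
    return false; // caller may retry, for example on reconnect
  }
}
```

With this shape, a slow or failed reload delays only the reconciliation. The reply is already visible when the typing indicator disappears.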
Suggested MITRE ATLAS mapping
Closest fit: Impact category — service disruption / degraded user experience. Not a traditional attack vector but a reliability gap that the threat model's availability coverage should address.
Analysis assisted by Claude.