Suggested threat: Silent message loss on degraded WebSocket connections

## Suggested threat: Silent message loss on degraded WebSocket connections

### Attack scenario

This is an availability/reliability threat rather than a confidentiality or integrity one. Under degraded network conditions, a completed agent response can fail to render in the Control UI, and the user is shown no error.

**Exploitation is passive**: no attacker action is required. Any condition that makes the `loadChatHistory()` RPC slow, makes it fail, or lets it race with another state update can trigger the problem: flaky WiFi, a high-latency gateway, a backgrounded browser tab (reduced timer resolution), or a gateway under load.

### How it works

1. Agent completes a run successfully (gateway logs `run done`, `aborted=false`)
2. Gateway sends `state="final"` event over WebSocket to the browser
3. Browser receives the event and clears streaming state (`chatStream = null`), so the typing indicator disappears
4. **But the final message is never committed to `chatMessages`** — the `state="final"` code path in `handleChatEvent()` does not append the message synchronously
5. The message only appears after an async `loadChatHistory()` RPC completes and replaces the entire `chatMessages` array
6. If that RPC is slow, fails, or races with another state update, the message stays invisible until a later event triggers a fresh `loadChatHistory()` call

The user sees: typing indicator disappears, no message appears, no error shown. The response exists server-side but is invisible client-side.
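
The sequence above can be modeled in a few lines. This is a minimal sketch, not the actual Control UI code: `handleChatEventModel`, `ChatState`, `ChatEvent`, `HistoryLoader`, and the shape of the event payload are all hypothetical names and assumptions chosen for illustration. Only the behavior (the `final` path clears `chatStream` without appending, the `aborted` path commits synchronously) follows the description in this issue.

```typescript
// Hypothetical, simplified model of the client state. Not the real Control UI code.
type ChatMessage = { role: "user" | "assistant"; text: string };
type ChatEvent = { state: "delta" | "final" | "aborted"; text: string };

interface ChatState {
  chatStream: string | null; // in-progress text; non-null means the typing indicator is shown
  chatMessages: ChatMessage[]; // committed messages that the UI renders
}

// Stand-in for the history RPC. It may be slow, fail, or never settle.
type HistoryLoader = () => Promise<ChatMessage[]>;

function handleChatEventModel(
  state: ChatState,
  ev: ChatEvent,
  loadHistory: HistoryLoader,
): void {
  if (ev.state === "delta") {
    state.chatStream = (state.chatStream ?? "") + ev.text;
    return;
  }
  if (ev.state === "aborted") {
    // Aborted path: the partial text is committed synchronously (assumed shape).
    const text = state.chatStream ?? ev.text;
    state.chatMessages = [...state.chatMessages, { role: "assistant", text }];
    state.chatStream = null;
    return;
  }
  // Final path: the streaming state is cleared, but nothing is appended here.
  state.chatStream = null;
  // The message only shows up if this async reload succeeds.
  loadHistory()
    .then((messages) => {
      state.chatMessages = messages;
    })
    .catch(() => {
      // Failure is swallowed in this model, so the user sees no error.
    });
}

// Simulates a degraded connection: the history RPC never completes.
function demoStalledHistory(): ChatState {
  const state: ChatState = { chatStream: null, chatMessages: [] };
  const stalled: HistoryLoader = () => new Promise<ChatMessage[]>(() => {});
  handleChatEventModel(state, { state: "delta", text: "Hello" }, stalled);
  handleChatEventModel(state, { state: "final", text: "Hello" }, stalled);
  return state;
}
```

After `demoStalledHistory()` returns, `chatStream` is `null` (typing indicator gone) and `chatMessages` is still empty, which matches what the user observes.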

### Affected components

- **Control UI** (`ui/src/ui/controllers/chat.ts`, `ui/src/ui/app-gateway.ts`)
- **WebSocket event handling** — the asymmetry between `state="final"` (no commit) and `state="aborted"` (synchronous commit) paths

### Why the server can't compensate

The gateway's WebSocket keepalive is application-level only (30-second tick broadcasts). The tick watchdog runs on the **client side**, so if the browser's network drops, the server has no mechanism to detect a stale connection. The server continues broadcasting into a TCP socket that hasn't been closed yet (the default OS-level TCP keepalive idle time is typically two hours, and it applies only when keepalive is enabled on the socket).

This means the server cannot detect when a client has missed a `state="final"` event and proactively resend it.
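
To illustrate why detection is one-sided, here is a minimal sketch of a tick watchdog as it might run on the client. `TickWatchdog`, its parameters, and the "missed ticks allowed" threshold are assumptions for illustration and are not taken from the gateway or Control UI source. Only the 30-second tick interval comes from the description above.

```typescript
// Hypothetical client-side watchdog: only the side that receives ticks
// can notice that they have stopped arriving.
class TickWatchdog {
  private readonly tickIntervalMs: number;
  private readonly missedTicksAllowed: number;
  private lastTickMs: number;

  constructor(tickIntervalMs: number, missedTicksAllowed: number, nowMs: number) {
    this.tickIntervalMs = tickIntervalMs;
    this.missedTicksAllowed = missedTicksAllowed;
    this.lastTickMs = nowMs;
  }

  // Call whenever a tick broadcast arrives from the gateway.
  onTick(nowMs: number): void {
    this.lastTickMs = nowMs;
  }

  // True once more than `missedTicksAllowed` tick intervals have passed
  // without a tick, meaning the connection is probably dead.
  isStale(nowMs: number): boolean {
    return nowMs - this.lastTickMs > this.tickIntervalMs * this.missedTicksAllowed;
  }
}

// Example: 30-second ticks, tolerate two missed ticks before declaring staleness.
const exampleWatchdog = new TickWatchdog(30_000, 2, Date.now());
```

The server has no equivalent signal in this design: it sends ticks but receives nothing in return, so a silent client and a healthy idle client look the same to it.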

### Severity assessment

**Medium.** No data is lost (the response exists server-side and appears on the next full history reload), and there is no confidentiality or integrity impact. The user experience failure is nonetheless complete: the user believes the agent didn't respond when it did. In time-sensitive interactions this could cause cascading confusion (the user sends follow-ups, the agent responds to those first, and messages appear out of order).

### Real-world evidence

- Reproduced in a containerized deployment during intermittent connectivity drops — 59-second gap between successful agent response and message rendering
- Existing issue: openclaw/openclaw#9183
- Existing fix PR: openclaw/openclaw#16767 (synchronous optimistic commit + async fallback)
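
The "synchronous optimistic commit + async fallback" approach named in the fix PR can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the code from that PR: `commitFinalOptimistically`, `UiState`, `Msg`, and `LoadHistory` are hypothetical names, and the sketch assumes the final text is available either in the event or in the accumulated stream.

```typescript
// Hypothetical types for illustration. Not taken from the PR.
type Msg = { role: "user" | "assistant"; text: string };

interface UiState {
  chatStream: string | null;
  chatMessages: Msg[];
}

type LoadHistory = () => Promise<Msg[]>;

function commitFinalOptimistically(
  state: UiState,
  finalText: string | undefined,
  loadHistory: LoadHistory,
): void {
  // Prefer the text carried by the final event, else the streamed text so far.
  const text = finalText ?? state.chatStream ?? "";
  state.chatStream = null;

  // Synchronous optimistic commit: the message is visible immediately.
  if (text !== "") {
    state.chatMessages = [...state.chatMessages, { role: "assistant", text }];
  }

  // Async fallback: reconcile with server history when (and if) it arrives.
  loadHistory()
    .then((history) => {
      state.chatMessages = history;
    })
    .catch(() => {
      // Reload failed: keep the optimistic message instead of showing nothing.
    });
}

// Simulates the degraded case: the history RPC never completes.
function demoOptimisticCommit(): UiState {
  const state: UiState = { chatStream: "Hello", chatMessages: [] };
  const stalled: LoadHistory = () => new Promise<Msg[]>(() => {});
  commitFinalOptimistically(state, undefined, stalled);
  return state;
}
```

With this ordering, a slow or failed `loadHistory()` no longer determines whether the user sees the response. It only determines when the local copy is reconciled with the server's.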

### Suggested MITRE ATLAS mapping

Closest fit: **Impact** category — service disruption / degraded user experience. Not a traditional attack vector but a reliability gap that the threat model's availability coverage should address.

---
*Analysis assisted by Claude.*

Suggested threat: Silent message loss on degraded WebSocket connections #8