Suggested threat: Silent message loss on degraded WebSocket connections #8

@kleddbot

Attack scenario

This is an availability/reliability threat rather than a confidentiality or integrity one. Under degraded network conditions, completed agent responses can silently fail to render in the Control UI — with no error indication to the user.

Exploitation is passive: no attacker action is required. Any condition that makes the loadChatHistory() call slow, makes it fail, or lets it race with another state update can trigger it: flaky WiFi, a high-latency gateway, a backgrounded browser tab (reduced timer resolution), or a gateway under load.

How it works

  1. Agent completes a run successfully (gateway logs run done, aborted=false)
  2. Gateway sends state="final" event over WebSocket to the browser
  3. Browser receives the event and clears streaming state (chatStream = null), so the typing indicator disappears
  4. At this point the final message has not been committed to chatMessages: the state="final" code path in handleChatEvent() does not append the message synchronously
  5. The message only appears after an async loadChatHistory() RPC completes and replaces the entire chatMessages array
  6. If that RPC is slow, fails, or races with another state update, the message stays invisible, with no error shown, until a later event triggers a fresh loadChatHistory() call

The user sees: typing indicator disappears, no message appears, no error shown. The response exists server-side but is invisible client-side.
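
The sequence above can be modeled with a minimal sketch. The names handleChatEvent, loadChatHistory, chatStream and chatMessages come from this issue, but the function bodies, the types, and the silent catch are assumptions made for illustration; this is not the actual code in ui/src/ui/controllers/chat.ts.

```typescript
// Sketch only: names mirror the issue text, bodies are assumptions.
type ChatMessage = { role: "user" | "assistant"; text: string };
type ChatEvent = { state: "delta" | "final"; text: string };

interface ChatState {
  chatStream: string | null; // non-null while the typing indicator is shown
  chatMessages: ChatMessage[];
}

// Hypothetical stand-in for the loadChatHistory() RPC.
type LoadChatHistory = () => Promise<ChatMessage[]>;

async function handleChatEvent(
  state: ChatState,
  event: ChatEvent,
  loadChatHistory: LoadChatHistory,
): Promise<void> {
  if (event.state === "delta") {
    state.chatStream = (state.chatStream ?? "") + event.text;
    return;
  }
  // state === "final": streaming state is cleared synchronously (step 3)...
  state.chatStream = null;
  try {
    // ...but the message only arrives via the async history reload (step 5).
    state.chatMessages = await loadChatHistory();
  } catch {
    // Assumed behavior: the failure is not surfaced, so nothing renders (step 6).
  }
}

// A history RPC that never settles, standing in for a stalled connection.
const stalledHistory: LoadChatHistory = () =>
  new Promise<ChatMessage[]>(() => {});

const uiState: ChatState = { chatStream: "The answer is", chatMessages: [] };
void handleChatEvent(
  uiState,
  { state: "final", text: "The answer is 42." },
  stalledHistory,
);
// Here uiState.chatStream is already null while uiState.chatMessages is
// still empty: the indicator is gone and no message is shown.
```

Because an async function runs synchronously up to its first await, the indicator is cleared immediately, while the message depends entirely on the pending RPC.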

Affected components

  • Control UI (ui/src/ui/controllers/chat.ts, ui/src/ui/app-gateway.ts)
  • WebSocket event handling — the asymmetry between state="final" (no commit) and state="aborted" (synchronous commit) paths
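
The asymmetry between the two terminal paths can be illustrated with a pair of pure reducers. This is a hedged sketch: reduceAsDescribed, reduceSymmetric and commit are hypothetical names, and the assumption that the message text is available either on the event or in the accumulated chatStream buffer is not confirmed by the source.

```typescript
type ChatMessage = { role: "user" | "assistant"; text: string };
type TerminalEvent = { state: "final" | "aborted"; text?: string };

interface ChatState {
  chatStream: string | null;
  chatMessages: ChatMessage[];
}

// Appends the assistant message (event text, else the streamed buffer)
// and clears the streaming state in a single synchronous step.
function commit(state: ChatState, event: TerminalEvent): ChatState {
  const text = event.text ?? state.chatStream ?? "";
  const chatMessages: ChatMessage[] = text
    ? [...state.chatMessages, { role: "assistant", text }]
    : state.chatMessages;
  return { chatStream: null, chatMessages };
}

// Behavior as described in this issue: only "aborted" commits synchronously;
// "final" clears the stream and relies on a later history reload.
function reduceAsDescribed(state: ChatState, event: TerminalEvent): ChatState {
  if (event.state === "aborted") return commit(state, event);
  return { ...state, chatStream: null };
}

// Hypothetical symmetric variant: both terminal states commit synchronously,
// and a later history reload would only reconcile the list.
function reduceSymmetric(state: ChatState, event: TerminalEvent): ChatState {
  return commit(state, event);
}

const streaming: ChatState = { chatStream: "partial reply", chatMessages: [] };
const afterFinal = reduceAsDescribed(streaming, { state: "final" });
const afterAbort = reduceAsDescribed(streaming, { state: "aborted" });
const afterSymmetricFinal = reduceSymmetric(streaming, {
  state: "final",
  text: "full reply",
});
```

In the symmetric variant the message is visible even if the history RPC never completes; whether that fits the real controller depends on details not shown in this issue.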

Why the server can't compensate

The gateway's WebSocket keepalive is application-level only (30-second tick broadcasts), and the tick watchdog runs on the client side. If the browser's network drops, the server has no mechanism to detect the stale connection: it keeps broadcasting into a TCP socket that has not been closed yet (OS-level TCP keepalive, where enabled, typically waits about 2 hours of idle time before sending its first probe).

This means the server cannot detect when a client has missed a state="final" event and proactively resend it.
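
To make the one-sidedness concrete, here is a minimal sketch of a client-side tick watchdog. The 30-second tick interval comes from this issue; the class name TickWatchdog, the two-interval timeout, and the injected timestamps are assumptions for illustration, not the project's implementation.

```typescript
// Tracks when the last application-level tick arrived and reports staleness.
// Timestamps are passed in explicitly so the logic is easy to test.
class TickWatchdog {
  private lastTickAt: number;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastTickAt = now;
  }

  // Call whenever a tick broadcast is received from the gateway.
  onTick(now: number): void {
    this.lastTickAt = now;
  }

  // True once more than timeoutMs has passed without a tick.
  isStale(now: number): boolean {
    return now - this.lastTickAt > this.timeoutMs;
  }
}

const TICK_INTERVAL_MS = 30_000; // from the issue: 30-second tick broadcasts
const watchdog = new TickWatchdog(2 * TICK_INTERVAL_MS, 0); // timeout is assumed

watchdog.onTick(30_000); // a tick arrives on schedule
const staleAtDeadline = watchdog.isStale(90_000); // exactly 60 s since last tick
const staleAfterDeadline = watchdog.isStale(90_001); // just past the timeout
```

Only the client holds this state. The server side of the same protocol just emits ticks, so nothing tells it that a given client stopped receiving them or missed a state="final" event.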

Severity assessment

Medium. No data is lost (the response exists server-side and appears on next full history reload), no confidentiality or integrity impact, but the user experience failure is complete — the user believes the agent didn't respond when it did. In time-sensitive interactions this could cause cascading confusion (user sends follow-ups, agent responds to those first, messages appear out of order).

Real-world evidence

Suggested MITRE ATLAS mapping

Closest fit: Impact category — service disruption / degraded user experience. Not a traditional attack vector but a reliability gap that the threat model's availability coverage should address.


Analysis assisted by Claude.
