
testmd run reports overall_status=passed + exit 0 while a step FAILED (false green) — readonly_fallback and pure-replay #93

@komal-lt

Description


Summary

testmd run reports overall_status: "passed" and exit code 0 while a step actually FAILED (steps.failed >= 1, steps.passed = 0). A genuinely failing test is reported as green. For a CI gate this is the worst failure mode, because a real regression passes silently. Present in 0.4.1; the readonly_fallback path is still present in 0.4.4.

Mechanism

After a checkpoint analyzer fails on replay (see #85), kane hits a readonly lock conflict and falls back to readonly mode (readonly_fallback). On that path the failed run is mislabeled as passed:

"reason":"analyzer_failed: @ step 7"      # a checkpoint flakes on replay (#85)
[lock] concurrent session — running in readonly mode (no commit)
replay_decisions:2  author_decisions:0     # PURE REPLAY — no re-author, no commit attempted
"reason":"readonly_fallback"
=> {"type":"test_md_summary","overall_status":"passed","steps":{"passed":0,"failed":1,"skipped":1}}, exit 0

Important (corrected from an earlier version of this report): this is not caused by concurrent CI jobs. Serializing the matrix to max-parallel: 1 reproduces it identically, and the trace shows author_decisions:0, i.e. a pure, single-threaded replay with no commit attempted still hits the "concurrent session" readonly lock. The lock detection therefore appears spurious or stale (there is no real second session), and a pure replay should not be touching a write/session lock at all.

Repro

  1. A *_test.md with a visual checkpoint that flakes on replay (#85: "Multi-checkpoint step (multiple ANALYZE + combining ASSERT) authors green but fails deterministically on replay with null assert values").
  2. kane-cli testmd run <file> --agent --headless (single run, no concurrency).
  3. When the checkpoint fails on replay, the run logs readonly_fallback, emits overall_status:"passed" with steps.failed:1 and steps.passed:0, and exits with code 0.

Expected

If any non-optional step fails (steps.failed > 0) or no step passes (steps.passed < 1), overall_status must be "failed" and the exit code must be non-zero. A readonly_fallback must never upgrade a failed run to passed. (Separately: a pure replay shouldn't acquire or conflict on a write lock, and the "concurrent session" detection shouldn't fire when no second session is active.)
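The expected behaviour can be sketched as a small status function. This is a hypothetical illustration, not kane's actual code: StepCounts, RunOutcome and finalize are invented names, every counted step is treated as non-optional, and the idea that readonly mode only controls whether anything is committed is inferred from the "(no commit)" log line rather than from kane's source.

```python
from dataclasses import dataclass


@dataclass
class StepCounts:
    passed: int
    failed: int
    skipped: int


@dataclass
class RunOutcome:
    overall_status: str  # "passed" or "failed"
    exit_code: int
    committed: bool


def finalize(steps: StepCounts, readonly_fallback: bool) -> RunOutcome:
    """Hypothetical: derive the final run outcome from step results only."""
    # Any failed step, or no passing step at all, makes the whole run fail.
    run_failed = steps.failed > 0 or steps.passed < 1
    return RunOutcome(
        overall_status="failed" if run_failed else "passed",
        exit_code=1 if run_failed else 0,
        # readonly_fallback only suppresses the commit; it never feeds into
        # overall_status or the exit code.
        committed=not readonly_fallback,
    )


if __name__ == "__main__":
    # Counts from the test_md_summary in the trace, with readonly_fallback hit.
    print(finalize(StepCounts(passed=0, failed=1, skipped=1), readonly_fallback=True))
```

With the counts from the trace above (passed 0, failed 1, skipped 1) and readonly_fallback set, this yields overall_status "failed" and exit code 1; the fallback changes only whether a commit happens.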

Impact

High: overall_status and the exit code are what CI keys on. As a workaround we have had to parse test_md_summary ourselves and fail the job whenever steps.failed > 0 || steps.passed < 1, ignoring kane's own status.
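The workaround can be sketched as a small CI wrapper. It assumes that with --agent, kane-cli prints each event, including test_md_summary, as one JSON object per line on stdout; that format is inferred from the trace above, not from documented behaviour. The script name kane_gate.py and its function names are hypothetical, and treating a missing summary as a failure is our own choice.

```python
import json
import subprocess
import sys
from typing import Iterable, Optional


def find_summary(lines: Iterable[str]) -> Optional[dict]:
    """Return the last test_md_summary event in the output, or None."""
    summary = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON log lines such as "[lock] ..."
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("type") == "test_md_summary":
            summary = event
    return summary


def gate_exit_code(lines: Iterable[str]) -> int:
    """CI exit code judged by step counts, ignoring kane's overall_status."""
    summary = find_summary(lines)
    if summary is None:
        return 1  # no summary at all: treat the run as failed
    steps = summary.get("steps", {})
    failed = steps.get("failed", 0)
    passed = steps.get("passed", 0)
    return 1 if failed > 0 or passed < 1 else 0


def main(argv: list) -> int:
    if not argv:
        print("usage: kane_gate.py <kane-cli command...>", file=sys.stderr)
        return 2
    proc = subprocess.run(argv, capture_output=True, text=True)
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    gate = gate_exit_code(proc.stdout.splitlines())
    # Fail if either the step-count check or kane's own exit code says failed.
    return gate if gate != 0 else proc.returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Usage would look like `python kane_gate.py kane-cli testmd run <file> --agent --headless`. Kane's own non-zero exit code is still propagated, so the wrapper can only turn a green run red, never the reverse.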

Env

kane-cli 0.4.1; testmd run --agent --headless [--retry]; reproduced both as a single run and in a CI matrix. Related: #85 (the analyzer flake that triggers this).

Metadata


Assignees: none
Labels: bug (Something isn't working)
Type: none
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
