Building NodeRoom — a live room where humans and AI agents do high-trust research together, without clobbering each other.
A career, compiled: banking/finance → data engineering → agentic AI, converging on human-agent collaboration systems where the agent leaves receipts.
Meta · agentic QA (PQX) · JPMorgan · 3.5 yrs, credit + agentic-RAG over 100k+ docs · Ideaflow · Founder, NodeBench AI · UC Santa Barbara · full history ↗
| Repo | Layer | What it is |
|---|---|---|
| noderoom ⭐ | Current flagship | Live multi-panel room where humans + NodeAgents edit a shared spreadsheet, note, and post-it wall through one versioned concurrency model — lock → draft → smart-merge, no-clobber, per-element CAS. |
| nodebench-ai | Research engine | Entity intelligence for any company, market, or question — searches + synthesizes with sources, turns each run into a reusable artifact, and ships a hosted public-research MCP. |
| NodeAgent | Agent kernel | The distilled core of NodeBench — one loop, four tool UIs: live context, grounded/cited search, a versioned spreadsheet delta, and a TipTap notebook memo. |
| feature-walkthrough-gif | Proof / media | Playwright → Remotion → ffmpeg turns any feature into an annotated walkthrough GIF — and because it's scripted, the GIFs double as an integration smoke-test. |
| parity-studio | Visual QA | Image (or live app route) → verified componentized ui_kit, self-judged on a 16-check deterministic rubric with honest score drift before any agent touches production. |
| LLM-Prior-Authorization… | Regulated workflow | LLMs auto-fill prior-auth forms from patient notes — structured extraction, a validation pass, and an LLM-as-a-Judge eval that scores on clinical knowledge, not string match. |
Productivity infra: gmail-workspace-public (large inbox → one queue, one decision; private data stays local, public research delegated to NodeBench) · agent-workspace-template (reusable Convex/Next agent-workspace runtime).
```mermaid
flowchart LR
  NB["NodeBench AI<br/>research / diligence engine<br/>sourced dossiers · MCP"]
  NA["NodeAgent<br/>distilled agent kernel<br/>one loop · four tool UIs"]
  NR["NodeRoom<br/>CURRENT FLAGSHIP<br/>live room · lock→draft→merge"]
  PF["Proof<br/>reproducible walkthroughs<br/>that double as smoke-tests"]
  NB -->|distill the core| NA
  NA -->|put humans + agents in one room| NR
  NR -->|ship review-ready artifacts| PF
  style NR fill:#111,color:#fff,stroke:#111
```
Five capability buckets, each load-bearing in the work above:
- Banking & diligence — 3.5 yrs at JPMorgan: credit analysis (72 deals, ~$800M, 270 models) plus "LLMsuite," an agentic-RAG diligence tool over 100k+ documents. Turning messy research into structured, cited sheets and risk models — the reason NodeRoom is a War Room, not a toy.
- Data engineering — pipelines, schemas, reactive runtimes (Convex), durable streaming. The plumbing under every live room and report.
- Agentic AI & evals — agentic QA at Meta (PQX) and eval pipelines at Ideaflow: grounded search, tool loops, versioned model deltas, LLM-as-a-Judge scoring, scenario-based tests. Agents that get checked, not trusted — the harness matters more than the model.
- Healthcare / regulated workflows — prior-auth auto-fill with validation + eval: structured extraction where being wrong has consequences.
- Product engineering — Next.js / React / TS surfaces, UI parity harnesses, reproducible demos. The artifacts people actually click.
Multiple humans and multiple NodeAgents research companies in one live room and enrich a shared diligence sheet together:
- Agents claim an affected-range lock (still readable as context), a blocked agent drafts around it, and on unlock the draft smart-merges — committed human edits are never clobbered. Every edit carries a per-element version (CAS).
- Findings stream into the sheet, the note panel, and the post-it wall with no refresh; server-led agent work reaches every client (e.g. the live `Q3DEMO` room, `/ask reconcile Q3 revenue` filling a variance column).
- Runs two modes from the same code: a deterministic no-key in-memory engine + scripted agents (`npm run demo`), and Live with a real Convex reactive backend + a model-routed LLM agent (routes promoted by ladder evidence, not provider brand).
- Ends with downstream-ready review artifacts: company brief, runway chart, open-questions list.
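The lock → draft → merge flow described above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not NodeRoom's actual code: the `Sheet` class, its method names, the cell ids, and the merge policy (apply a draft only if the cell's version is unchanged since the draft was started) are all hypothetical stand-ins for whatever the real smart-merge does.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not NodeRoom's API.
type Actor = string;

interface Cell { value: string; version: number } // version bumps on every commit

interface Draft {
  cellId: string;
  value: string;
  baseVersion: number; // version the drafter saw when it started drafting
  author: Actor;
}

type WriteResult =
  | { ok: true; version: number }
  | { ok: false; reason: "locked" | "stale" };

class Sheet {
  private cells = new Map<string, Cell>();
  private locks = new Map<string, Actor>();

  // Reads ignore locks, so a locked range stays readable as context.
  read(cellId: string): Cell {
    return this.cells.get(cellId) ?? { value: "", version: 0 };
  }

  // All-or-nothing lock over a range of cells.
  lock(cellIds: string[], actor: Actor): boolean {
    const taken = cellIds.some((id) => {
      const holder = this.locks.get(id);
      return holder !== undefined && holder !== actor;
    });
    if (taken) return false;
    cellIds.forEach((id) => this.locks.set(id, actor));
    return true;
  }

  unlock(cellIds: string[], actor: Actor): void {
    for (const id of cellIds) {
      if (this.locks.get(id) === actor) this.locks.delete(id);
    }
  }

  // Per-element compare-and-set: commit only if the caller saw the latest version.
  write(cellId: string, value: string, expectedVersion: number, actor: Actor): WriteResult {
    const holder = this.locks.get(cellId);
    if (holder !== undefined && holder !== actor) return { ok: false, reason: "locked" };
    const current = this.read(cellId);
    if (current.version !== expectedVersion) return { ok: false, reason: "stale" };
    const version = current.version + 1;
    this.cells.set(cellId, { value, version });
    return { ok: true, version };
  }

  // Assumed merge policy: a draft lands only where nothing was committed since
  // it was started; everything else is returned as a conflict, never overwritten.
  merge(drafts: Draft[]): { applied: Draft[]; conflicts: Draft[] } {
    const applied: Draft[] = [];
    const conflicts: Draft[] = [];
    for (const d of drafts) {
      const r = this.write(d.cellId, d.value, d.baseVersion, d.author);
      (r.ok ? applied : conflicts).push(d);
    }
    return { applied, conflicts };
  }
}

// Scenario: agent-1 locks B2:B3, agent-2 is blocked and drafts instead.
const sheet = new Sheet();
sheet.lock(["B2", "B3"], "agent-1");
const blocked = sheet.write("B2", "4.1M", 0, "agent-2"); // rejected: locked

const drafts: Draft[] = ["B2", "B3", "C2"].map((cellId) => ({
  cellId,
  value: `draft for ${cellId}`,
  baseVersion: sheet.read(cellId).version, // 0 for every cell at this point
  author: "agent-2",
}));

sheet.write("B2", "4.2M", 0, "agent-1");   // lock holder commits B2
sheet.write("C2", "human note", 0, "ana"); // a human commits an unlocked cell
sheet.unlock(["B2", "B3"], "agent-1");

const result = sheet.merge(drafts);
console.log(result.applied.map((d) => d.cellId));   // [ 'B3' ]
console.log(result.conflicts.map((d) => d.cellId)); // [ 'B2', 'C2' ]
```

In this sketch, only the `B3` draft lands after unlock. The drafts for `B2` and `C2` fail the version check because something was committed to those cells in the meantime, so they come back as conflicts for review instead of overwriting the committed values.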
People + agents + artifacts + evidence + review + shareability.
📚 Selected earlier work — the arc that compiled into the systems above
| Project | Signal |
|---|---|
| Banking assistant | Finance-document assistant for company/PDF analysis — the diligence reflex, pre-NodeBench. |
| openai-agent-eval-framework | Agent evaluation for classification, context verification, and pruning — the eval discipline, early. |
| CosmaNeura med billing | ICD/CPT recommendation from physician dictation — regulated extraction before the prior-auth system. |
| FluencyMed | Early healthcare AI workflow prototype. |
| voice_email_agent | Email ingestion, summarization, embeddings, voice query — the seed of the Gmail workspace. |
The agent should leave receipts. Sources on every claim, a version on every edit, an eval on every answer, and a demo anyone can reproduce. High-trust work doesn't get faster by trusting the model more — it gets faster by making the model checkable.
📫 LinkedIn · hshum2018@gmail.com




