Homen Shum HomenShum

👋 Hi, I'm Homen Shum

Building NodeRoom — a live room where humans and AI agents do high-trust research together, without clobbering each other.

A career, compiled: banking/finance → data engineering → agentic AI, converging on human-agent collaboration systems where the agent leaves receipts.

Meta · agentic QA (PQX) · JPMorgan · 3.5 yrs, credit + agentic-RAG over 100k+ docs · Ideaflow · Founder, NodeBench AI · UC Santa Barbara · full history ↗

🗺️ The system map — one lineage, not scattered repos

Repo	Layer	What it is
noderoom ⭐	Current flagship	Live multi-panel room where humans + NodeAgents edit a shared spreadsheet, note, and post-it wall through one versioned concurrency model — `lock → draft → smart-merge`, no-clobber, per-element CAS.
nodebench-ai	Research engine	Entity intelligence for any company, market, or question — searches + synthesizes with sources, turns each run into a reusable artifact, and ships a hosted public-research MCP.
NodeAgent	Agent kernel	The distilled core of NodeBench — one loop, four tool UIs: live context, grounded/cited search, a versioned spreadsheet delta, and a TipTap notebook memo.
feature-walkthrough-gif	Proof / media	Playwright → Remotion → ffmpeg turns any feature into an annotated walkthrough GIF — and because it's scripted, the GIFs double as an integration smoke-test.
parity-studio	Visual QA	Image (or live app route) → verified componentized `ui_kit`, self-judged on a 16-check deterministic rubric with honest score drift before any agent touches production.
LLM-Prior-Authorization…	Regulated workflow	LLMs auto-fill prior-auth forms from patient notes — structured extraction, a validation pass, and an LLM-as-a-Judge eval that scores on clinical knowledge, not string match.

Productivity infra: gmail-workspace-public (large inbox → one queue, one decision; private data stays local, public research delegated to NodeBench) · agent-workspace-template (reusable Convex/Next agent-workspace runtime).

🧬 The lineage

flowchart LR
    NB["NodeBench AI<br/>research / diligence engine<br/>sourced dossiers · MCP"]
    NA["NodeAgent<br/>distilled agent kernel<br/>one loop · four tool UIs"]
    NR["NodeRoom<br/>CURRENT FLAGSHIP<br/>live room · lock→draft→merge"]
    PF["Proof<br/>reproducible walkthroughs<br/>that double as smoke-tests"]

    NB -->|distill the core| NA
    NA -->|put humans + agents in one room| NR
    NR -->|ship review-ready artifacts| PF

    style NR fill:#111,color:#fff,stroke:#111

🛠️ A career, compiled

Five capability buckets, each load-bearing in the work above:

Banking & diligence — 3.5 yrs at JPMorgan: credit analysis (72 deals, ~$800M, 270 models) plus "LLMsuite," an agentic-RAG diligence tool over 100k+ documents. Turning messy research into structured, cited sheets and risk models — the reason NodeRoom is a War Room, not a toy.
Data engineering — pipelines, schemas, reactive runtimes (Convex), durable streaming. The plumbing under every live room and report.
Agentic AI & evals — agentic QA at Meta (PQX) and eval pipelines at Ideaflow: grounded search, tool loops, versioned model deltas, LLM-as-a-Judge scoring, scenario-based tests. Agents that get checked, not trusted — the harness matters more than the model.
Healthcare / regulated workflows — prior-auth auto-fill with validation + eval: structured extraction where being wrong has consequences.
Product engineering — Next.js / React / TS surfaces, UI parity harnesses, reproducible demos. The artifacts people actually click.

🎯 Current flagship demo — NodeRoom: Live Startup Diligence War Room

Multiple humans and multiple NodeAgents research companies in one live room and enrich a shared diligence sheet together:

An agent claims a lock on the affected range, which stays readable as context. A blocked agent drafts around the lock, and on unlock its draft smart-merges, so committed human edits are never clobbered. Every edit carries a per-element version and commits by compare-and-swap (CAS).
Findings stream into the sheet, the note panel, and the post-it wall with no refresh; server-led agent work reaches every client (e.g. in the live Q3DEMO room, `/ask reconcile Q3 revenue` fills a variance column).
Runs in two modes from the same code: a deterministic, no-key demo with an in-memory engine and scripted agents (`npm run demo`), and Live, with a real Convex reactive backend and a model-routed LLM agent (routes promoted by ladder evidence, not provider brand).
Ends with downstream-ready review artifacts: company brief, runway chart, open-questions list.
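
The `lock → draft → smart-merge` flow with per-element CAS can be pictured with a small in-memory sketch. This is not NodeRoom's code: `Room`, `Cell`, `Draft`, `WriteResult`, and the actor names are hypothetical, and the merge rule here (apply a draft only if its base version is still current, otherwise hand it back for review) is the simplest stand-in for whatever "smart-merge" does in the real system.

```typescript
// Hypothetical sketch: none of these names come from the NodeRoom codebase.
type CellId = string;

interface Cell {
  value: string;
  version: number; // bumped on every committed write
}

interface Draft {
  cellId: CellId;
  value: string;
  baseVersion: number; // version the drafting agent last read
}

type WriteResult =
  | { ok: true; version: number }
  | { ok: false; reason: "locked" | "stale"; current: Cell };

class Room {
  private cells = new Map<CellId, Cell>();
  private locks = new Map<CellId, string>(); // cellId -> lock owner

  // Locked cells stay readable as context.
  read(id: CellId): Cell {
    return this.cells.get(id) ?? { value: "", version: 0 };
  }

  // All-or-nothing claim over the affected range.
  lock(ids: CellId[], owner: string): boolean {
    const taken = ids.some((id) => this.locks.has(id) && this.locks.get(id) !== owner);
    if (taken) return false;
    for (const id of ids) this.locks.set(id, owner);
    return true;
  }

  unlock(ids: CellId[], owner: string): void {
    for (const id of ids) {
      if (this.locks.get(id) === owner) this.locks.delete(id);
    }
  }

  // Compare-and-swap: commit only if the caller saw the latest version.
  write(id: CellId, value: string, expectedVersion: number, actor: string): WriteResult {
    const current = this.read(id);
    const holder = this.locks.get(id);
    if (holder !== undefined && holder !== actor) return { ok: false, reason: "locked", current };
    if (current.version !== expectedVersion) return { ok: false, reason: "stale", current };
    const next: Cell = { value, version: current.version + 1 };
    this.cells.set(id, next);
    return { ok: true, version: next.version };
  }

  // Assumed merge rule: apply drafts whose base version still matches,
  // and return the rest as conflicts instead of overwriting newer commits.
  merge(drafts: Draft[], actor: string): Draft[] {
    return drafts.filter((d) => !this.write(d.cellId, d.value, d.baseVersion, actor).ok);
  }
}

// Usage: agent:a holds the range, agent:b drafts around it, a human commits first.
const room = new Room();
room.lock(["B2", "B3"], "agent:a");

const blocked = room.write("B2", "agent estimate", 0, "agent:b"); // rejected while locked
const drafts: Draft[] = [
  { cellId: "B2", value: "agent estimate", baseVersion: room.read("B2").version },
  { cellId: "B3", value: "agent note", baseVersion: room.read("B3").version },
];

room.unlock(["B2", "B3"], "agent:a");
room.write("B2", "human figure", 0, "human:ana"); // human commit lands before the merge

const conflicts = room.merge(drafts, "agent:b");
console.log(conflicts.map((d) => d.cellId), room.read("B2").value, room.read("B3").value);
```

In this sketch the draft for `B2` comes back as a conflict because the human commit already advanced that cell's version, while the draft for `B3` applies cleanly. Versioning each element separately is what lets one stale cell be rejected without discarding the rest of the draft.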

People + agents + artifacts + evidence + review + shareability.

📚 Selected earlier work — the arc that compiled into the systems above

Project	Signal
Banking assistant	Finance-document assistant for company/PDF analysis — the diligence reflex, pre-NodeBench.
openai-agent-eval-framework	Agent evaluation for classification, context verification, and pruning — the eval discipline, early.
CosmaNeura med billing	ICD/CPT recommendation from physician dictation — regulated extraction before the prior-auth system.
FluencyMed	Early healthcare AI workflow prototype.
voice_email_agent	Email ingestion, summarization, embeddings, voice query — the seed of the Gmail workspace.

💡 What I care about

The agent should leave receipts. Sources on every claim, a version on every edit, an eval on every answer, and a demo anyone can reproduce. High-trust work doesn't get faster by trusting the model more — it gets faster by making the model checkable.

📫 LinkedIn · hshum2018@gmail.com
