Shard-orchestrated agentic coding for VS Code: run capable coding agents on modest hardware, with any OpenAI-compatible LLM and full operational control.
NeuroCode is a VS Code extension that thinks before it calls the model. A local Node.js sidecar indexes your project, builds ranked context shards, and runs multi-step agent loops — while the extension host handles UI, editor integration, and safe file application.
Use any OpenAI-compatible gateway (LiteLLM, vLLM, RunPod proxy, OpenRouter, OpenAI) or Ollama locally. Embeddings stay on-device via Ollama (nomic-embed-text) for semantic search and drift detection.
- Why NeuroCode
- How it works
- Features
- Requirements
- Quick start
- Configuration
- Chat & agent modes
- Architecture
- Commands
- Project layout
- Development
- Troubleshooting
- Documentation
- Contributing
- License
Most assistants optimize for giant context windows and proprietary indexing. NeuroCode optimizes for small, explainable context and deployability:
| Principle | What it means |
|---|---|
| Shard-first | Only relevant files reach the LLM, ranked by priority and capped by token budget |
| Local brain | Indexing, dependency graph, embeddings, and orchestration run in a sidecar you control |
| Pluggable LLM | One config surface (apiBaseUrl, apiKey, model) — no vendor lock-in |
| Bounded agent memory | Agent steps rebuild a fixed-size prompt from session state — tokens stay flat across steps |
| Safe writes | Tool-loop edits go through staged write_file / search_replace; tool JSON is blocked from disk |
| Air-gap ready | Enforce Ollama-only operation for regulated environments |
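The shard-first principle can be pictured as a greedy selection under a token budget. The sketch below is illustrative only: the `Shard` fields, the `selectShards` name, and the skip-if-too-large rule are assumptions, not the extension's actual ShardManager logic.

```typescript
// Hypothetical shape of a context shard; the field names are assumptions.
interface Shard {
  path: string;
  priority: number; // higher means more relevant
  tokens: number;   // estimated token cost of including the file
  reason: string;   // why the file was picked, e.g. "import-graph"
}

// Greedy selection: walk shards in descending priority and skip any shard
// that would push the running total past the budget.
function selectShards(candidates: Shard[], maxTokens: number): Shard[] {
  const ranked = candidates.slice().sort((a, b) => b.priority - a.priority);
  const selected: Shard[] = [];
  let used = 0;
  for (const shard of ranked) {
    if (used + shard.tokens > maxTokens) continue;
    selected.push(shard);
    used += shard.tokens;
  }
  return selected;
}

const exampleShards: Shard[] = [
  { path: "src/api.ts", priority: 0.9, tokens: 1800, reason: "open-file" },
  { path: "src/db.ts", priority: 0.7, tokens: 2500, reason: "import-graph" },
  { path: "src/util.ts", priority: 0.4, tokens: 900, reason: "semantic-match" },
];

// With a 3500-token budget, src/db.ts does not fit after src/api.ts,
// so the smaller src/util.ts is taken instead.
const pickedShards = selectShards(exampleShards, 3500);
```

Skipping an oversized shard rather than stopping at it keeps the budget well used, at the cost of occasionally including a lower-priority file.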
```text
You send a message (Auto / Ask / Plan / Edit / Agent)
        │
        ▼
Sidecar indexes & assembles shards (or seeds paths from stack traces)
        │
        ▼
LLM router (Auto) or mode pill picks execution path
        │
        ├── Ask / Plan / Edit  →  single-stream chat with shard context
        │
        └── Auto / Agent       →  AgentToolLoop
                                      │
                                      ├── read_file / search_code  (sidecar disk + vectors)
                                      ├── search_replace           (minimal line-level edits)
                                      ├── write_file               (full file, when necessary)
                                      └── reply                    (done)
                                      │
                                      ▼
Extension applies staged writes (with validation) or shows diff review
```
Agent session model: Full file contents live in the sidecar session cache. Each LLM step receives a rebuilt prompt — task, short session log, and up to three file previews (~1.2k chars each) — so input tokens stay ~2–4k per step instead of growing linearly.
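A minimal sketch of that rebuild step, assuming a simple in-memory session object. The `AgentSession` shape, the `rebuildPrompt` name, and the six-line log cap are hypothetical; only the three-preview limit and the ~1.2k-character preview size come from the description above.

```typescript
// Hypothetical session state; the real sidecar structures may differ.
interface AgentSession {
  task: string;
  log: string[];                 // one short line per completed step
  files: Record<string, string>; // full contents, kept out of the prompt
}

const MAX_PREVIEWS = 3;     // "up to three file previews"
const PREVIEW_CHARS = 1200; // "~1.2k chars each"
const MAX_LOG_LINES = 6;    // assumed cap for the "short session log"

// Builds the prompt from scratch on every step, so its size depends on the
// caps above and not on how many steps have already run.
function rebuildPrompt(session: AgentSession, focusPaths: string[]): string {
  const previews: string[] = [];
  for (const path of focusPaths) {
    if (previews.length === MAX_PREVIEWS) break;
    const content = session.files[path];
    if (typeof content !== "string") continue; // path not in the cache: skip
    previews.push(`--- ${path} ---\n${content.slice(0, PREVIEW_CHARS)}`);
  }
  const recentLog = session.log.slice(-MAX_LOG_LINES);
  return [`Task: ${session.task}`, "Session log:"]
    .concat(recentLog, previews)
    .join("\n");
}
```

Because older log lines and full file bodies never enter the prompt, step 50 costs roughly the same input tokens as step 2.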
| Area | Capability |
|---|---|
| Chat | Cursor-style sidebar: Auto, Ask, Plan, Edit, Agent; model Auto/Manual; file attachments |
| Agent loop | read_file, search_code, search_replace, write_file, reply via fenced neurocode-tool blocks |
| Intent routing | LLM-first router in Auto mode (neurocode.chat.intentRouter: llm) with seed_paths from errors |
| Shards | Visualizer shows every file sent, reason tag, and token count |
| Change review | Accept / Reject per file, diff editor, Accept All |
| Planning | Multi-step DAG stored in SQLite; optional agent execution |
| Code review | Four parallel specialists (architect, security, performance, test) |
| Memory & drift | Project memory graph; semantic drift alerts after commits |
| Debug | Causal stack-trace analysis; attention heatmap in the gutter |
| Analytics | Per-response tokens, latency, thumbs up/down |
| Enterprise | Air-gap mode, optional RunPod lifecycle, Docker/K8s — see README-ENTERPRISE.md |
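The intent-routing row mentions `seed_paths` derived from errors. One way to picture that step is a regular expression that pulls file paths out of a pasted stack trace. This is a naive sketch under stated assumptions: the `seedPathsFromTrace` name, the extension list, and the `node_modules` filter are all hypothetical, and real traces vary by runtime.

```typescript
// Naive pattern for "path/to/file.ext:line" fragments (assumed extension list).
const FRAME_PATH = /([\w@.\/\\-]+\.(?:ts|tsx|js|jsx|py|go|rs|java)):(\d+)/g;

// Returns unique project file paths in the order they first appear.
function seedPathsFromTrace(trace: string): string[] {
  const seen: string[] = [];
  FRAME_PATH.lastIndex = 0; // the regex is global, so reset its cursor
  let match: RegExpExecArray | null;
  while ((match = FRAME_PATH.exec(trace)) !== null) {
    const path = match[1];
    if (!path || path.indexOf("node_modules") !== -1) continue; // skip dependency frames
    if (seen.indexOf(path) === -1) seen.push(path);
  }
  return seen;
}
```

Seeding the shard set with these paths lets the first agent step start from the failing files instead of a broad search.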
| Dependency | Version | Purpose |
|---|---|---|
| VS Code | ≥ 1.120 | Extension host |
| Node.js | ≥ 22.5 | Sidecar child process |
| Ollama | latest | Embeddings + local LLM mode |
| OpenAI-compatible gateway | optional | Cloud or self-hosted chat models |
```bash
ollama pull qwen2.5-coder:7b
ollama pull nomic-embed-text
```

```bash
git clone https://github.com/ShahjahanAli/neurocode.git
cd neurocode
npm install
npm run compile
```

- Open the repo in VS Code and press F5 (Extension Development Host).
- Open a project folder in the new window.
- Run NeuroCode: Index Project (`Ctrl+Shift+P`).
- Open NeuroCode → Chat on the right sidebar, or press Ctrl+Shift+A.

Configure your LLM under Settings → neurocode (see Configuration).
Local Ollama:

```json
{
  "neurocode.llm.mode": "ollama",
  "neurocode.llm.ollamaUrl": "http://localhost:11434",
  "neurocode.llm.ollamaModel": "qwen2.5-coder:7b"
}
```

OpenAI-compatible gateway:

```json
{
  "neurocode.llm.mode": "gateway",
  "neurocode.llm.apiBaseUrl": "https://your-gateway.example.com/v1",
  "neurocode.llm.apiKey": "your-bearer-token",
  "neurocode.llm.model": "qwen/qwen3-coder",
  "neurocode.llm.modelSelection": "auto",
  "neurocode.llm.maxOutputTokens": 4096,
  "neurocode.llm.fallbackToOllama": false
}
```

Chat, shard, and indexing behavior:

```json
{
  "neurocode.ui.chatLocation": "right",
  "neurocode.chat.mode": "auto",
  "neurocode.chat.intentRouter": "llm",
  "neurocode.chat.autoApply": true,
  "neurocode.chat.autoContinue": true,
  "neurocode.chat.fixOnCheck": true,
  "neurocode.chat.agentToolMaxSteps": 8,
  "neurocode.chat.maxAttachments": 5,
  "neurocode.shard.maxTokens": 0,
  "neurocode.indexing.autoIndex": true
}
```

| Setting | Default | Description |
|---|---|---|
| `neurocode.llm.mode` | `gateway` | `gateway` or `ollama` |
| `neurocode.llm.maxOutputTokens` | `2048` | Max completion tokens per LLM call; use 4096 for agent write_file |
| `neurocode.chat.intentRouter` | `llm` | Auto mode routing: llm (recommended), hybrid, or heuristic |
| `neurocode.chat.autoApply` | `true` | Apply staged agent writes and implement-mode code blocks |
| `neurocode.chat.agentToolMaxSteps` | `10` | Max tool iterations per agent request |
| `neurocode.shard.maxTokens` | `0` | 0 = auto budget (3500 Ollama / 6000 gateway) |
Legacy keys (vllmUrl, openaiUrl, provider) map to the unified LLM settings automatically.
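The auto budget for `neurocode.shard.maxTokens` can be expressed as a small lookup. This is a sketch of the rule as documented in the settings table, not the extension's code: the `resolveShardBudget` name is hypothetical, and treating negative values like 0 is an assumption.

```typescript
type LlmMode = "gateway" | "ollama";

// Auto budgets quoted in the settings table.
const AUTO_BUDGET: Record<LlmMode, number> = { ollama: 3500, gateway: 6000 };

function resolveShardBudget(mode: LlmMode, configuredMaxTokens: number): number {
  // 0 means "auto"; handling negative values the same way is an assumption.
  return configuredMaxTokens > 0 ? configuredMaxTokens : AUTO_BUDGET[mode];
}
```

For example, `resolveShardBudget("ollama", 0)` yields 3500, while an explicit value such as 2000 is used as given in either mode.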
| Mode | Behavior | Writes files? |
|---|---|---|
| Auto | LLM router → agent tool loop; reads stack traces; search_replace for small fixes | When router allows writes |
| Ask | Explain / review; investigate loop (read-only tools) | No |
| Plan | JSON task DAG in SQLite | No (until you execute steps) |
| Edit | Shard-aware implement stream + optional auto-continue | Yes if autoApply |
| Agent | Full tool loop with higher step budget | Yes if autoApply |
Model picker (Auto vs Manual) is separate from chat mode Auto — it selects which model id the gateway uses.
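Read as a decision rule, the "Writes files?" column of the mode table looks roughly like the function below. It is a hedged sketch: `mayWriteFiles` and `WriteContext` are hypothetical names, the router decision is assumed to be a boolean, and the table does not say whether `autoApply` also gates Auto mode.

```typescript
type ChatMode = "auto" | "ask" | "plan" | "edit" | "agent";

interface WriteContext {
  autoApply: boolean;          // neurocode.chat.autoApply
  routerAllowsWrites: boolean; // Auto-mode router decision (assumed to be a boolean)
}

function mayWriteFiles(mode: ChatMode, ctx: WriteContext): boolean {
  switch (mode) {
    case "ask":
    case "plan":
      return false;                  // read-only modes
    case "edit":
    case "agent":
      return ctx.autoApply;          // "Yes if autoApply"
    case "auto":
      return ctx.routerAllowsWrites; // "When router allows writes"
  }
}
```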
Respond with one fenced block per turn:

```neurocode-tool
{"tool":"search_replace","args":{"path":"app/page.tsx","old_text":"...","new_text":"..."}}
```
| Tool | Purpose |
|---|---|
| `read_file` | Read project file (max ~6k chars per call) |
| `search_code` | Semantic + path search via local index |
| `search_replace` | Preferred for line-level fixes (low token cost) |
| `write_file` | Full file rewrite when most of the file changes |
| `reply` | End loop with a user-facing summary |
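Parsing one tool call out of a model response might look like the sketch below. The tool names and the `neurocode-tool` fence label come from this README; the `parseToolCall` function, its return shape, and the decision to return `null` on any malformed input are assumptions for illustration.

```typescript
const TOOL_NAMES = ["read_file", "search_code", "search_replace", "write_file", "reply"] as const;
type ToolName = (typeof TOOL_NAMES)[number];

interface ToolCall {
  tool: ToolName;
  args: Record<string, unknown>;
}

// Three backticks, built from escapes so this example can sit inside a fence itself.
const TICKS = "\u0060\u0060\u0060";
// Matches the first fenced block whose info string is "neurocode-tool".
const TOOL_FENCE = new RegExp(TICKS + "neurocode-tool\\s*\\n([\\s\\S]*?)" + TICKS);

function parseToolCall(modelOutput: string): ToolCall | null {
  const match = TOOL_FENCE.exec(modelOutput);
  if (!match || !match[1]) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(match[1]);
  } catch {
    return null; // malformed JSON: the caller can ask the model to retry
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as { tool?: unknown; args?: unknown };
  const tool = candidate.tool;
  const args = candidate.args;
  if (typeof tool !== "string" || (TOOL_NAMES as readonly string[]).indexOf(tool) === -1) return null;
  if (typeof args !== "object" || args === null || Array.isArray(args)) return null;
  return { tool: tool as ToolName, args: args as Record<string, unknown> };
}
```

Rejecting unknown tool names at the parsing boundary keeps the rest of the loop working with a closed set of five actions.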
- Agent tool-loop responses are not parsed as markdown code blocks for auto-apply.
- Staged content is validated — tool-call JSON and truncated files are rejected before disk write.
- Use git to recover if a file was corrupted by an older build: `git checkout -- path/to/file`.
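The tool-JSON half of that validation could be approximated as follows. This is a heuristic sketch, not the extension's validator: `rejectStagedContent` is a hypothetical name, and truncation detection is left out because the README does not describe how it is done.

```typescript
// Returns a rejection reason, or null when the content looks like file text.
function rejectStagedContent(content: string): string | null {
  const trimmed = content.trim();
  if (trimmed.charAt(0) === "{") {
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null && "tool" in parsed && "args" in parsed) {
        return "content is a tool call, not file text";
      }
    } catch {
      // Not valid JSON as a whole; fall through to the prefix check.
    }
  }
  // Also catches a tool call that was cut off before its closing brace.
  if (/^\{\s*"tool"\s*:\s*"(read_file|search_code|search_replace|write_file|reply)"/.test(trimmed)) {
    return "content starts with a tool call";
  }
  return null;
}
```

Ordinary JSON files such as a `package.json` pass, because they lack the `tool` and `args` pair.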
```mermaid
flowchart TB
  subgraph ext [VS Code Extension]
    UI[React WebViews]
    CP[ChatPanel]
    SM[SidecarManager]
  end
  subgraph sc [Sidecar :39291]
    IDX[Indexer + CodeGraph]
    SH[ShardManager]
    VEC[VectorStore]
    ORCH[ChatOrchestrator]
    AGT[AgentToolLoop + SessionState]
  end
  subgraph llm [LLM]
    GW[OpenAI-compatible /v1]
    OL[Ollama]
  end
  UI --> CP
  CP -->|HTTP SSE| sc
  sc -->|chat| GW
  sc -->|ollama mode| OL
  sc -->|embeddings| OL
  SM --> sc
```
Three processes: extension host (UI + vscode.workspace.fs), sidecar (heavy logic + SQLite + vectra), LLM backend (pluggable).
| Command | Key | Description |
|---|---|---|
| NeuroCode: Ask Agent | `Ctrl+Shift+A` | Shard-aware single-turn ask |
| NeuroCode: Review Code | `Ctrl+Shift+R` | Four-agent parallel review |
| NeuroCode: Find Root Cause | `Ctrl+Shift+D` | Causal debug from stack trace |
| NeuroCode: Index Project | — | Build index for shards & search |
| NeuroCode: Plan Multi-Step Task | — | Create task DAG |
| NeuroCode: Toggle Air-Gap Mode | — | Ollama-only enforced |
Sidebar tabs: Overview · Chat · Analytics · Tasks · Shards · Review · Memory · Drift · Genome · Debug
```text
neurocode/
├── src/                 Extension host (TypeScript)
├── webview-ui/          React sidebar (Vite)
├── sidecar/             Node agent server (:39291)
│   └── core/            ShardManager, ChatOrchestrator, AgentToolLoop, IntentRouter, …
├── BLUEPRINT.md         Architecture & API contract
├── QUICK_REFERENCE.md
└── CHANGELOG.md
```
Per-project index data: .neurocode/ (local, gitignored by default).
```bash
npm run watch            # Extension watch build
npm run build:webview    # Webview only
npm run lint && npm run check-types
curl http://127.0.0.1:39291/health
```

Press F5 to launch the Extension Development Host. After sidecar changes, reload the window so the child process restarts.
| Problem | What to do |
|---|---|
| Sidecar failed | Node 22.5+ on PATH; check Output → NeuroCode |
| Empty shards / search | Ollama running; ollama pull nomic-embed-text |
| Gateway errors | apiBaseUrl must end with /v1; test GET /v1/models |
| Agent hits output limit | Raise neurocode.llm.maxOutputTokens to 4096; prefer search_replace |
| High input tokens per step | Reload after update — session rebuild should keep ~2–4k flat |
| File contains `{"tool":"write_file"...` | Corrupted by older build; run `git checkout -- <file>`, then retry |
| Chat stream ended without response | Reload extension; verify gateway key and sidecar health |
| Port in use | Change neurocode.sidecar.port (default 39291) |
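For the gateway row, a quick pre-flight check of `apiBaseUrl` can catch the most common misconfiguration before any request is sent. The helper names below are hypothetical, and the `http://` or `https://` scheme check is an added assumption; only the `/v1` suffix rule and the `GET /v1/models` probe come from the table.

```typescript
// Returns a list of human-readable problems; an empty list means the URL looks usable.
function checkApiBaseUrl(apiBaseUrl: string): string[] {
  const problems: string[] = [];
  const url = apiBaseUrl.trim();
  if (!/^https?:\/\//.test(url)) problems.push("URL should start with http:// or https://");
  if (!/\/v1$/.test(url)) problems.push("URL must end with /v1");
  return problems;
}

// Endpoint to probe with GET once the base URL passes the check.
function modelsUrl(apiBaseUrl: string): string {
  return apiBaseUrl.trim() + "/models";
}
```

With the example gateway from the configuration section, the probe target becomes `https://your-gateway.example.com/v1/models`.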
| Document | Contents |
|---|---|
| BLUEPRINT.md | Full architecture, API contract, build order |
| QUICK_REFERENCE.md | Modes, tools, settings, token budgets |
| CHANGELOG.md | Release history |
| README-ENTERPRISE.md | Docker, Helm, air-gap deployment |
| CURSOR_PROMPTS.md | Phased implementation guide |
| .cursorrules | Contributor & AI agent conventions |
- Fork the repository and create a feature branch.
- Follow conventions in `.cursorrules`.
- Run `npm run compile` before opening a PR.
- Open an issue before large architectural changes.
Shahjahan Ali · ZMS Digital Solutions · Dhaka, Bangladesh
Related: HyperZ — AI-native enterprise SaaS framework
MIT © 2026 Shahjahan Ali / ZMS Digital Solutions — see LICENSE.