A FastAPI-based search service and unified CLI for the NCT Precision Oncology Thesaurus (NCT-POT) Drugs dataset. It provides searchable endpoints for drugs, molecular targets, biomarkers, drug classes, and approved indications — with automatic data download on first run.
- 🔍 Drug Search – Find drugs by name or synonym
- 🧬 Target Search – Discover all drugs targeting a specific gene/protein (e.g., `KRAS`, `EGFR`, `ALK`)
- 💊 Drug Class Search – Browse by therapeutic class (e.g., checkpoint inhibitor, tyrosine kinase inhibitor)
- 🧪 Biomarker Search – Query approved indications by biomarker (e.g., `KRAS G12C`, `BRAF V600E`) with optional cancer-type filtering
- ⚗️ Combination Search – Identify drug pairs for clinical-trial design
- 📋 Full Drug Details – Complete relational data for any drug ID
- ⬇️ Auto-Download – TSV data is fetched automatically from the upstream repository on first startup
- 📖 Interactive Docs – Auto-generated OpenAPI/Swagger UI at `/docs`
- 🖥️ Unified CLI – One command-line tool for server, search, tests, notebooks, and data management
git clone <your-repo-url>
cd nct-pot
docker-compose up --build -d
# Wait ~20 s for data download, then open interactive docs
open http://localhost:8000/docs

# 1. Install dependencies
pip install -r requirements.txt
# Or install the package in editable mode
pip install -e ".[notebook]"
# 2. Download data
nct-pot download --output-dir ./data
# 3. Start the API server
nct-pot serve
# Or run tests
nct-pot test

The `nct-pot` command is the primary interface for everything in this project.

nct-pot --help

| Command | Description | Example |
|---|---|---|
| `serve` | Start the FastAPI server | `nct-pot serve --port 8000` |
| `search drug <query>` | Search drugs by name or synonym | `nct-pot search drug sotorasib` |
| `search target <sym>` | Search by target gene/protein | `nct-pot search target KRAS` |
| `search class <name>` | Search by drug class | `nct-pot search class "kinase inhibitor"` |
| `search biomarker <name>` | Search by biomarker | `nct-pot search biomarker "KRAS G12C" --cancer-type NSCLC` |
| `drug <id>` | Get full details for a drug ID | `nct-pot drug DRUG-000001` |
| `download` | Download NCT-POT TSV data | `nct-pot download --output-dir ./data` |
| `test` | Run the pytest suite | `nct-pot test -v` |
| `notebook` | Launch Jupyter notebook | `nct-pot notebook` |
| `stats` | Show database statistics | `nct-pot stats` |
| `mcp` | Start the MCP server | `nct-pot mcp` |

All commands also work via `python -m nct_pot.cli <command> ...`.
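For scripted use, the module form can be driven from Python with `subprocess`. The following is a minimal sketch: `build_cli_argv` and `run_cli` are hypothetical helpers (not part of the package), and the subcommands and the `--cancer-type` flag are taken from the command table above.

```python
import subprocess
import sys


def build_cli_argv(*args: str) -> list[str]:
    """Return an argv list that runs the CLI through `python -m nct_pot.cli`.

    Hypothetical helper for illustration; it is not part of the nct_pot package.
    """
    return [sys.executable, "-m", "nct_pot.cli", *args]


def run_cli(*args: str) -> subprocess.CompletedProcess:
    """Run one CLI command and capture its output (requires nct_pot to be installed)."""
    return subprocess.run(
        build_cli_argv(*args), capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    # Passing arguments as a list avoids shell quoting problems for values with spaces.
    argv = build_cli_argv("search", "biomarker", "KRAS G12C", "--cancer-type", "NSCLC")
    print(argv[1:])
```

Because the arguments are passed as a list, a value such as `KRAS G12C` reaches the CLI as a single argument without any extra quoting.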
Two transport modes are supported:
The FastAPI server (nct-pot serve) exposes an MCP endpoint at /mcp. Use any MCP HTTP client (e.g., mcp-remote) to connect.
# Terminal 1 — start the API server
nct-pot serve

LM Studio config (using mcp-remote):

{
  "mcpServers": {
    "nct-pot": {
      "command": "mcp-remote",
      "args": [
        "http://localhost:8000/mcp",
        "--allow-http"
      ]
    }
  }
}

A dedicated stdio MCP server that AI assistants can spawn directly.

nct-pot mcp

Claude Desktop config:
{
  "mcpServers": {
    "nct-pot": {
      "command": "/Users/rahul/miniforge3/bin/nct-pot",
      "args": ["mcp"]
    }
  }
}

Exposed tools:
- `search_drugs(query, limit)` – Search drugs by name or synonym
- `search_by_target(target_symbol)` – Find drugs targeting a gene/protein
- `search_by_drug_class(class_name)` – Find drugs by therapeutic class
- `search_by_biomarker(biomarker, cancer_type)` – Search by biomarker
- `search_combinations(class1, class2)` – Find drug combinations
- `get_drug_details(drug_id)` – Get full drug details
- `get_stats()` – Database statistics
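The Claude Desktop config above uses an absolute path that is specific to one machine. As a rough sketch, the same structure can be generated with whatever `nct-pot` executable is on the local `PATH`; `build_mcp_config` is a hypothetical helper, and where the resulting JSON should be saved depends on the MCP client.

```python
import json
import shutil
from typing import Optional


def build_mcp_config(command_path: Optional[str] = None) -> dict:
    """Build an `mcpServers` entry for the stdio transport.

    If no path is given, look up `nct-pot` on PATH and fall back to the bare
    command name when it is not found. Hypothetical helper, not part of nct_pot.
    """
    command = command_path or shutil.which("nct-pot") or "nct-pot"
    return {"mcpServers": {"nct-pot": {"command": command, "args": ["mcp"]}}}


if __name__ == "__main__":
    # Print the config so it can be pasted into the client's settings file.
    print(json.dumps(build_mcp_config(), indent=2))
```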
# Start server on custom port
nct-pot serve --host 0.0.0.0 --port 8080
# Search for a drug
nct-pot search drug "nivolumab"
# Find all drugs targeting EGFR
nct-pot search target EGFR
# Find drugs by class
nct-pot search class "checkpoint inhibitor"
# Biomarker-driven query with cancer filter
nct-pot search biomarker "BRAF V600E" --cancer-type "Melanoma"
# Get complete drug record
nct-pot drug DRUG-000001
# Check what is loaded in the database
nct-pot stats
# Run the full test suite
nct-pot test
# Launch the exploratory notebook
nct-pot notebook

Once the server is running (via `nct-pot serve` or Docker), the following REST endpoints are available:

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| `/` | GET | none | API info + link to `/docs` |
| `/search/drugs` | GET | `q` (required), `limit` (default 20) | Fuzzy search by drug name or synonym |
| `/search/target/{target_symbol}` | GET | `target_symbol` (path) | Drugs targeting a gene/protein |
| `/search/drug-class/{class_name}` | GET | `class_name` (path) | Drugs belonging to a class |
| `/search/biomarker` | GET | `biomarker` (required), `cancer_type` (optional) | Approved indications by biomarker |
| `/search/combinations` | GET | `class1` (required), `class2` (required) | Drug pairs for combination trials |
| `/drug/{drug_id}` | GET | `drug_id` (path) | Full relational details for a drug |
| `/stats` | GET | none | Row counts for all loaded tables |
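The endpoints in the table can also be called from Python using only the standard library. The sketch below assumes the server listens on `http://localhost:8000` and that responses are JSON; `build_url` and `get_json` are hypothetical helper names, and the response shape is not specified here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: default host and port from this README


def build_url(path: str, **params: str) -> str:
    """Join the base URL, a path, and percent-encoded query parameters."""
    query = urlencode(params, quote_via=quote)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def get_json(path: str, **params: str):
    """GET an endpoint and parse the JSON body (needs a running server)."""
    with urlopen(build_url(path, **params), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Query parameters are encoded by build_url.
    print(build_url("/search/biomarker", biomarker="KRAS G12C", cancer_type="NSCLC"))
    # Path parameters containing spaces must be quoted by the caller.
    print(build_url("/search/drug-class/" + quote("checkpoint inhibitor", safe="")))
```

With the server running, `get_json("/search/drugs", q="sotorasib")` would issue the same request as the first `curl` example in this README.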
# Drug search
curl "http://localhost:8000/search/drugs?q=sotorasib"
# Target landscape
curl "http://localhost:8000/search/target/KRAS"
# Biomarker + cancer type
curl "http://localhost:8000/search/biomarker?biomarker=KRAS%20G12C&cancer_type=NSCLC"
# Combination trial design
curl "http://localhost:8000/search/combinations?class1=checkpoint%20inhibitor&class2=FGFR2%20inhibitor"
# Full drug record
curl "http://localhost:8000/drug/DRUG-000001"

You can import the data store directly into your own scripts:
from nct_pot.datastore import NCTPOTDataStore
# Initialize store
store = NCTPOTDataStore(data_dir="./data")
# Search drugs
results = store.search_drugs("sotorasib", limit=10)
# Search by target
target_results = store.search_by_target("KRAS")
# Search by biomarker
biomarker_results = store.search_by_biomarker("BRAF V600E", cancer_type="Melanoma")
# Full details
details = store.get_drug_details("DRUG-000001")

An exploratory Jupyter notebook is included in notebooks/:

# Launch via CLI
nct-pot notebook

Generated plots and exports are saved to images/.
The relational schema mirrors the NCT-POT publication:
drug ──┬── drug_drugClass ── drugClass ── drugClass_drugTarget ──┐
       │                                                         drugTarget
       ├── drug_drugTarget ──────────────────────────────────────┘
       │
       └── drug_drugProduct ── drugProduct ── drugProduct_manufacturer ── manufacturer

approval ──┬── approval_oncotree ── oncotree (cancer types)
           └── approval_approvalBiomarker ── approvalBiomarker
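To illustrate how the link tables in this schema are traversed, here is a small in-memory sketch of the `drug` / `drug_drugTarget` / `drugTarget` path. The column names (`id_drug`, `id_drugTarget`, `symbol`, `name`), the IDs, and the sample rows are assumptions made for illustration; they are not copied from the TSV files, and this is not the `NCTPOTDataStore` implementation.

```python
# Illustrative rows only; real column names and IDs in the TSV files may differ.
drug = [
    {"id_drug": "D1", "name": "sotorasib"},
    {"id_drug": "D2", "name": "nivolumab"},
]
drug_target = [
    {"id_drugTarget": "T1", "symbol": "KRAS"},
    {"id_drugTarget": "T2", "symbol": "PDCD1"},
]
# Link table: one row per (drug, target) pair, mirroring drug_drugTarget.tsv.
drug_drug_target = [
    {"id_drug": "D1", "id_drugTarget": "T1"},
    {"id_drug": "D2", "id_drugTarget": "T2"},
]


def drugs_for_target(symbol: str) -> list[str]:
    """Resolve target symbol -> target IDs -> linked drug IDs -> drug names."""
    target_ids = {t["id_drugTarget"] for t in drug_target if t["symbol"] == symbol}
    drug_ids = {
        link["id_drug"]
        for link in drug_drug_target
        if link["id_drugTarget"] in target_ids
    }
    return sorted(d["name"] for d in drug if d["id_drug"] in drug_ids)


if __name__ == "__main__":
    print(drugs_for_target("KRAS"))
```

The same two-step lookup applies to the other link tables, for example going from `approval` through `approval_approvalBiomarker` to `approvalBiomarker`.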
| File | Rows (typical) | Description |
|---|---|---|
| `drug.tsv` | ~881 | Drug entries (names, synonyms, NCIT IDs) |
| `drugClass.tsv` | ~367 | Hierarchical drug classification |
| `drugTarget.tsv` | ~706 | Molecular targets (HGNC genes, proteins) |
| `drugProduct.tsv` | ~31 | Commercial products (trade names) |
| `approval.tsv` | ~18 | FDA/EMA approval records |
| `approvalBiomarker.tsv` | ~8 | Approval-relevant biomarkers |
| `manufacturer.tsv` | ~50 | Pharmaceutical manufacturers |
| `drug_drugClass.tsv` | ~1,109 | Drug-to-class mappings |
| `drug_drugTarget.tsv` | ~1,074 | Drug-to-target mappings |
| `drug_drugProduct.tsv` | ~31 | Drug-to-product mappings |
| `drugClass_drugTarget.tsv` | ~474 | Class-to-target mappings |
| `approval_oncotree.tsv` | ~20 | Approval-to-cancer-type mappings |
| `approval_approvalBiomarker.tsv` | ~10 | Approval-to-biomarker mappings |
| `drugProduct_manufacturer.tsv` | ~31 | Product-to-manufacturer mappings |
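After a download, the row counts in the table can be used as a rough sanity check. The sketch below counts data rows per TSV file with the standard `csv` module; it assumes each file starts with a single header line and lives in `./data`, and the function names are hypothetical rather than part of the package.

```python
import csv
from pathlib import Path


def count_rows(tsv_path: Path) -> int:
    """Count data rows in a TSV file, assuming the first line is a header."""
    with tsv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row, if present
        return sum(1 for _ in reader)


def summarize(data_dir: Path) -> dict[str, int]:
    """Map each *.tsv file name in data_dir to its data-row count."""
    return {path.name: count_rows(path) for path in sorted(data_dir.glob("*.tsv"))}


if __name__ == "__main__":
    # Prints nothing if ./data does not exist or holds no TSV files.
    for name, rows in summarize(Path("./data")).items():
        print(f"{name}\t{rows}")
```

Counts that are far below the typical values, or fewer than 14 files, would suggest an incomplete download.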
nct-pot/
├── nct_pot/ # Python package
│ ├── __init__.py
│ ├── models.py # Dataclasses (Drug, DrugClass, ...)
│ ├── datastore.py # NCTPOTDataStore loader + search
│ ├── api.py # FastAPI application
│ ├── cli.py # Unified CLI entrypoint
│ └── export/
│ ├── extension.py # BulkExporter (CSV/TSV/JSON)
│ └── bulk_cli.py # Standalone bulk-export CLI
├── scripts/
│ ├── download_data.sh # Automatic TSV downloader
│ ├── entrypoint.sh # Docker startup script
│ ├── setup.sh # Local setup helper
│ └── merge_api.sh # Utility script
├── notebooks/
│ └── NCT_POT_Drugs_Analysis.ipynb
├── images/ # Generated plots (.gitkeep)
├── data/ # TSV data directory (gitignored)
├── docker/
│ └── Dockerfile
├── tests/ # Pytest suite (46 tests)
│ ├── conftest.py
│ ├── test_models.py
│ ├── test_datastore.py
│ ├── test_api.py
│ └── fixtures/ # Minimal test dataset
├── docker-compose.yml
├── requirements.txt
├── pyproject.toml
├── README.md
└── .gitignore
The Dockerfile is based on `python:3.11-slim` and includes:
- System deps: `gcc`, `curl`, `wget`, `git`
- Python deps: FastAPI, Uvicorn, Pandas, Pydantic
- Entrypoint: `scripts/entrypoint.sh` checks for `./data/drug.tsv`; if missing, it runs `scripts/download_data.sh` to fetch all 14 TSV files before starting Uvicorn.
- Health check: `GET /stats` every 30 s
- Port: `8000`
# Start
docker-compose up --build -d
# Stop
docker-compose down

Data downloaded into ./data/ persists across restarts via a bind mount.
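Since the first start includes a data download, the "~20 s" wait is only an estimate. One option is to poll the same `GET /stats` endpoint that the container health check uses until it answers. This is a sketch under assumptions: the URL uses the default port, the timeout and interval values are arbitrary, and `stats_is_up` and `wait_until_ready` are hypothetical helpers.

```python
import time
from typing import Callable
from urllib.request import urlopen


def stats_is_up(url: str = "http://localhost:8000/stats") -> bool:
    """Return True if GET /stats answers with HTTP 200."""
    try:
        with urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: the server is not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = stats_is_up,
    timeout: float = 60.0,
    interval: float = 2.0,
) -> bool:
    """Call probe repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` after `docker-compose up --build -d` returns `True` once the API responds, or `False` if it has not come up within the timeout.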
The underlying dataset is the NCT Precision Oncology Thesaurus Drugs from the National Center for Tumor Diseases (NCT) Heidelberg.
- Original source: TMO-HD/NCT-POT
- Download mirror used by this project: Takshan/NCT-POT
Important: The data is released under CC-BY-NC 4.0 International. You must:
- Use it for non-commercial purposes only
- Email the TMO Heidelberg team before using the content
- Not use it for direct clinical decision-making without expert verification
If you use this data or API in research, cite:
Kreutzfeldt S, Knurr A, Hübschmann D, Horak P, Fröhling S. NCT Precision Oncology Thesaurus Drugs — a Curated Database for Drugs, Drug Classes, and Drug Targets in Precision Cancer Medicine. medRxiv 2022.09.11.22279783. doi:10.1101/2022.09.11.22279783
| Symptom | Cause | Fix |
|---|---|---|
| `SyntaxError: illegal target for annotation` | Invalid `@` in dataclass field names | Already patched; pull latest code |
| Container restarts with `unexpected EOF` | Malformed inline `entrypoint.sh` in Dockerfile | Already patched; `entrypoint.sh` is now a standalone file |
| `Read-only file system` during download | `docker-compose.yml` had `:ro` mount | Already patched; volume is now writable |
| `KeyError: 'id_oncotree'` | Wrong column name in biomarker search | Already patched; uses `code_oncotree` |
| `✗ (download failed)` for all files | Wrong GitHub org or missing `/tsv/` path | Already patched; points to Takshan/NCT-POT (mirror of TMO-HD/NCT-POT) |
| Only `drug.tsv` downloads, then script exits | `set -e` + bash arithmetic `((0))` returns 1 | Already patched; `set -e` removed from `download_data.sh` |
- API Code: MIT
- NCT-POT Data: CC-BY-NC 4.0 International (non-commercial use only)
- Original data repository: TMO-HD/NCT-POT
- Download mirror: Takshan/NCT-POT
- API issues: Open an issue in this repository