Takshan/NCT-POT-API


NCT-POT Drugs Search API

A FastAPI-based search service and unified CLI for the NCT Precision Oncology Thesaurus (NCT-POT) Drugs dataset. It provides searchable endpoints for drugs, molecular targets, biomarkers, drug classes, and approved indications — with automatic data download on first run.

Features

  • 🔍 Drug Search – Find drugs by name or synonym
  • 🧬 Target Search – Discover all drugs targeting a specific gene/protein (e.g., KRAS, EGFR, ALK)
  • 💊 Drug Class Search – Browse by therapeutic class (e.g., checkpoint inhibitor, tyrosine kinase inhibitor)
  • 🧪 Biomarker Search – Query approved indications by biomarker (e.g., KRAS G12C, BRAF V600E) with optional cancer-type filtering
  • ⚗️ Combination Search – Identify drug pairs for clinical-trial design
  • 📋 Full Drug Details – Complete relational data for any drug ID
  • ⬇️ Auto-Download – TSV data is fetched automatically from the upstream repository on first startup
  • 📖 Interactive Docs – Auto-generated OpenAPI/Swagger UI at /docs
  • 🖥️ Unified CLI – One command-line tool for server, search, tests, notebooks, and data management

Quick Start

Docker (Recommended)

git clone <your-repo-url>
cd nct-pot
docker-compose up --build -d

# Wait ~20 s for the data download, then open the interactive docs
# (macOS: open; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:8000/docs
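
Rather than waiting a fixed 20 seconds, a script can poll the server until it answers. The sketch below is one possible approach, not part of the project: the helper names `default_probe` and `wait_for_api` are hypothetical, and it assumes that `/stats` returns HTTP 200 once the data has loaded.

```python
import time
import urllib.error
import urllib.request


def default_probe(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_for_api(url="http://localhost:8000/stats", timeout=60.0,
                 interval=2.0, probe=default_probe, sleep=time.sleep):
    """Poll `url` until the probe succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_for_api()` after `docker-compose up` returns `True` as soon as the endpoint responds and `False` if the timeout expires first. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.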

Local Installation

# 1. Install dependencies
pip install -r requirements.txt

# Or install the package in editable mode
pip install -e ".[notebook]"

# 2. Download data
nct-pot download --output-dir ./data

# 3. Start the API server
nct-pot serve

# Or run tests
nct-pot test

CLI Reference

The nct-pot command is the primary interface for everything in this project.

nct-pot --help

Commands

| Command | Description | Example |
|---|---|---|
| serve | Start the FastAPI server | nct-pot serve --port 8000 |
| search drug <query> | Search drugs by name or synonym | nct-pot search drug sotorasib |
| search target <sym> | Search by target gene/protein | nct-pot search target KRAS |
| search class <name> | Search by drug class | nct-pot search class "kinase inhibitor" |
| search biomarker <name> | Search by biomarker | nct-pot search biomarker "KRAS G12C" --cancer-type NSCLC |
| drug <id> | Get full details for a drug ID | nct-pot drug DRUG-000001 |
| download | Download NCT-POT TSV data | nct-pot download --output-dir ./data |
| test | Run the pytest suite | nct-pot test -v |
| notebook | Launch Jupyter notebook | nct-pot notebook |
| stats | Show database statistics | nct-pot stats |
| mcp | Start the MCP server | nct-pot mcp |

All commands can also be run as python -m nct_pot.cli <command> [options].

MCP Server

Two transport modes are supported:

1. HTTP/SSE (recommended for LM Studio)

The FastAPI server (nct-pot serve) exposes an MCP endpoint at /mcp. Use any MCP HTTP client (e.g., mcp-remote) to connect.

# Terminal 1 — start the API server
nct-pot serve

LM Studio config (using mcp-remote):

{
  "mcpServers": {
    "nct-pot": {
      "command": "mcp-remote",
      "args": [
        "http://localhost:8000/mcp",
        "--allow-http"
      ]
    }
  }
}

2. stdio (recommended for Claude Desktop)

A dedicated stdio MCP server that AI assistants can spawn directly.

nct-pot mcp

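Claude Desktop config (the command path below is specific to one machine; replace it with the absolute path to your own nct-pot executable, for example the output of which nct-pot):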

{
  "mcpServers": {
    "nct-pot": {
      "command": "/Users/rahul/miniforge3/bin/nct-pot",
      "args": ["mcp"]
    }
  }
}

Exposed tools:

  • search_drugs(query, limit) — Search drugs by name or synonym
  • search_by_target(target_symbol) — Find drugs targeting a gene/protein
  • search_by_drug_class(class_name) — Find drugs by therapeutic class
  • search_by_biomarker(biomarker, cancer_type) — Search by biomarker
  • search_combinations(class1, class2) — Find drug combinations
  • get_drug_details(drug_id) — Get full drug details
  • get_stats() — Database statistics
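
For orientation, MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. Clients such as Claude Desktop or mcp-remote build this message for you, and only after an initialization handshake, so the sketch below just illustrates the shape of the payload. The helper name `make_tool_call` is hypothetical. The tool and argument names are taken from the list above.

```python
import json


def make_tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Example: ask the search_drugs tool for up to five matches.
request = make_tool_call("search_drugs", {"query": "sotorasib", "limit": 5})
print(json.dumps(request, indent=2))
```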

CLI Examples

# Start server on custom port
nct-pot serve --host 0.0.0.0 --port 8080

# Search for a drug
nct-pot search drug "nivolumab"

# Find all drugs targeting EGFR
nct-pot search target EGFR

# Find drugs by class
nct-pot search class "checkpoint inhibitor"

# Biomarker-driven query with cancer filter
nct-pot search biomarker "BRAF V600E" --cancer-type "Melanoma"

# Get complete drug record
nct-pot drug DRUG-000001

# Check what is loaded in the database
nct-pot stats

# Run the full test suite
nct-pot test

# Launch the exploratory notebook
nct-pot notebook

API Endpoints

Once the server is running (via nct-pot serve or Docker), the following REST endpoints are available:

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| / | GET | (none) | API info + link to /docs |
| /search/drugs | GET | q (required), limit (default 20) | Fuzzy search by drug name or synonym |
| /search/target/{target_symbol} | GET | target_symbol (path) | Drugs targeting a gene/protein |
| /search/drug-class/{class_name} | GET | class_name (path) | Drugs belonging to a class |
| /search/biomarker | GET | biomarker (required), cancer_type (optional) | Approved indications by biomarker |
| /search/combinations | GET | class1 (required), class2 (required) | Drug pairs for combination trials |
| /drug/{drug_id} | GET | drug_id (path) | Full relational details for a drug |
| /stats | GET | (none) | Row counts for all loaded tables |

Example API Queries

# Drug search
curl "http://localhost:8000/search/drugs?q=sotorasib"

# Target landscape
curl "http://localhost:8000/search/target/KRAS"

# Biomarker + cancer type
curl "http://localhost:8000/search/biomarker?biomarker=KRAS%20G12C&cancer_type=NSCLC"

# Combination trial design
curl "http://localhost:8000/search/combinations?class1=checkpoint%20inhibitor&class2=FGFR2%20inhibitor"

# Full drug record
curl "http://localhost:8000/drug/DRUG-000001"
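
When calling these endpoints from a script, values containing spaces (such as "KRAS G12C" or "checkpoint inhibitor") need percent-encoding, which the curl examples do by hand with %20. The sketch below shows one way to build the same URLs with the Python standard library. `build_url` and `BASE_URL` are hypothetical names introduced for illustration; the paths and parameter names come from the endpoint list above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumes the default host and port


def build_url(path_template, path_params=None, query=None, base=BASE_URL):
    """Fill path parameters and append a percent-encoded query string."""
    path_params = path_params or {}
    # Encode each path value so spaces and slashes cannot break the route.
    safe = {k: quote(str(v), safe="") for k, v in path_params.items()}
    url = base + path_template.format(**safe)
    # Drop optional parameters that were not supplied.
    query = {k: v for k, v in (query or {}).items() if v is not None}
    if query:
        url += "?" + urlencode(query, quote_via=quote)
    return url


print(build_url("/search/biomarker",
                query={"biomarker": "KRAS G12C", "cancer_type": "NSCLC"}))
# -> http://localhost:8000/search/biomarker?biomarker=KRAS%20G12C&cancer_type=NSCLC

print(build_url("/search/drug-class/{class_name}",
                {"class_name": "checkpoint inhibitor"}))
# -> http://localhost:8000/search/drug-class/checkpoint%20inhibitor
```

The resulting string can be passed to `urllib.request.urlopen` or any HTTP client.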

Python SDK

You can import the data store directly into your own scripts:

from nct_pot.datastore import NCTPOTDataStore

# Initialize store
store = NCTPOTDataStore(data_dir="./data")

# Search drugs
results = store.search_drugs("sotorasib", limit=10)

# Search by target
target_results = store.search_by_target("KRAS")

# Search by biomarker
biomarker_results = store.search_by_biomarker("BRAF V600E", cancer_type="Melanoma")

# Full details
details = store.get_drug_details("DRUG-000001")

Notebooks

An exploratory Jupyter notebook is included in notebooks/:

# Launch via CLI
nct-pot notebook

Generated plots and exports are saved to images/.


Data Model

The relational schema mirrors the NCT-POT publication:

drug ──┬── drug_drugClass ── drugClass ── drugClass_drugTarget ──┐
       │                                                        drugTarget
       ├── drug_drugTarget ──────────────────────────────────────┘
       │
       └── drug_drugProduct ── drugProduct ── drugProduct_manufacturer ── manufacturer

approval ──┬── approval_oncotree ── oncotree (cancer types)
            └── approval_approvalBiomarker ── approvalBiomarker
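
In the diagram, tables named with an underscore (for example drug_drugTarget) are junction tables that link two entity tables. The sketch below shows how a target lookup can walk such a link using only the standard library. It is an illustration under assumptions, not the project's datastore code: the column names (`id`, `name`, `symbol`, `id_drug`, `id_drugTarget`) and the IDs are hypothetical stand-ins, and the three rows per table are toy data rather than an excerpt from the dataset.

```python
import csv
import io

# Toy TSV snippets; column names and IDs are hypothetical, not the real headers.
DRUG_TSV = "id\tname\nD1\tsotorasib\nD2\tosimertinib\n"
TARGET_TSV = "id\tsymbol\nT1\tKRAS\nT2\tEGFR\n"
DRUG_TARGET_TSV = "id_drug\tid_drugTarget\nD1\tT1\nD2\tT2\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def drugs_for_target(symbol, drugs, targets, links):
    """Resolve target symbol -> target IDs -> linked drug IDs -> drug names."""
    target_ids = {t["id"] for t in targets
                  if t["symbol"].upper() == symbol.upper()}
    drug_ids = {l["id_drug"] for l in links
                if l["id_drugTarget"] in target_ids}
    return sorted(d["name"] for d in drugs if d["id"] in drug_ids)


drugs = read_tsv(DRUG_TSV)
targets = read_tsv(TARGET_TSV)
links = read_tsv(DRUG_TARGET_TSV)
print(drugs_for_target("KRAS", drugs, targets, links))  # ['sotorasib']
```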

TSV Files Loaded

| File | Rows (typical) | Description |
|---|---|---|
| drug.tsv | ~881 | Drug entries (names, synonyms, NCIT IDs) |
| drugClass.tsv | ~367 | Hierarchical drug classification |
| drugTarget.tsv | ~706 | Molecular targets (HGNC genes, proteins) |
| drugProduct.tsv | ~31 | Commercial products (trade names) |
| approval.tsv | ~18 | FDA/EMA approval records |
| approvalBiomarker.tsv | ~8 | Approval-relevant biomarkers |
| manufacturer.tsv | ~50 | Pharmaceutical manufacturers |
| drug_drugClass.tsv | ~1,109 | Drug-to-class mappings |
| drug_drugTarget.tsv | ~1,074 | Drug-to-target mappings |
| drug_drugProduct.tsv | ~31 | Drug-to-product mappings |
| drugClass_drugTarget.tsv | ~474 | Class-to-target mappings |
| approval_oncotree.tsv | ~20 | Approval-to-cancer-type mappings |
| approval_approvalBiomarker.tsv | ~10 | Approval-to-biomarker mappings |
| drugProduct_manufacturer.tsv | ~31 | Product-to-manufacturer mappings |
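
To check a local download against the typical row counts, the files can be counted without starting the server. This is a minimal sketch that assumes each TSV file starts with a single header row. `count_rows` is a hypothetical helper, not the implementation behind nct-pot stats.

```python
import csv
from pathlib import Path


def count_rows(data_dir):
    """Return {file name: number of data rows} for every *.tsv in data_dir."""
    counts = {}
    for path in sorted(Path(data_dir).glob("*.tsv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader, None)  # skip the assumed header row
            # Ignore blank lines so a trailing newline is not counted.
            counts[path.name] = sum(1 for row in reader if row)
    return counts
```

For example, `count_rows("./data")` returns a dictionary keyed by file name that can be compared with the typical row counts listed for each TSV file.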

Project Structure

nct-pot/
├── nct_pot/                    # Python package
│   ├── __init__.py
│   ├── models.py               # Dataclasses (Drug, DrugClass, ...)
│   ├── datastore.py            # NCTPOTDataStore loader + search
│   ├── api.py                  # FastAPI application
│   ├── cli.py                  # Unified CLI entrypoint
│   └── export/
│       ├── extension.py        # BulkExporter (CSV/TSV/JSON)
│       └── bulk_cli.py         # Standalone bulk-export CLI
├── scripts/
│   ├── download_data.sh        # Automatic TSV downloader
│   ├── entrypoint.sh           # Docker startup script
│   ├── setup.sh                # Local setup helper
│   └── merge_api.sh            # Utility script
├── notebooks/
│   └── NCT_POT_Drugs_Analysis.ipynb
├── images/                     # Generated plots (.gitkeep)
├── data/                       # TSV data directory (gitignored)
├── docker/
│   └── Dockerfile
├── tests/                      # Pytest suite (46 tests)
│   ├── conftest.py
│   ├── test_models.py
│   ├── test_datastore.py
│   ├── test_api.py
│   └── fixtures/               # Minimal test dataset
├── docker-compose.yml
├── requirements.txt
├── pyproject.toml
├── README.md
└── .gitignore

Docker Details

The Dockerfile is based on python:3.11-slim and includes:

  • System deps: gcc, curl, wget, git
  • Python deps: FastAPI, Uvicorn, Pandas, Pydantic
  • Entrypoint: scripts/entrypoint.sh checks for ./data/drug.tsv; if missing, it runs scripts/download_data.sh to fetch all 14 TSV files before starting Uvicorn.
  • Health check: GET /stats every 30 s
  • Port: 8000
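
The entrypoint's check-then-download behaviour can be expressed in a few lines. The sketch below mirrors the logic described above in Python for readers who want the same guard in their own scripts. It is not a translation of scripts/entrypoint.sh: `ensure_data` is a hypothetical helper and the download step is passed in as a callable.

```python
import tempfile
from pathlib import Path


def ensure_data(data_dir, download, sentinel="drug.tsv"):
    """Run `download(data_dir)` only when the sentinel file is missing."""
    data_dir = Path(data_dir)
    if (data_dir / sentinel).is_file():
        return False  # data already present, nothing to do
    data_dir.mkdir(parents=True, exist_ok=True)
    download(data_dir)
    return True


def fake_download(target):
    """Stand-in for the real downloader: writes a header-only drug.tsv."""
    (target / "drug.tsv").write_text("id\tname\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    print(ensure_data(tmp, fake_download))  # True: file was missing
    print(ensure_data(tmp, fake_download))  # False: file now exists
```

Using a single sentinel file keeps the check cheap, but it cannot detect a partial download in which drug.tsv arrived and later files did not.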

Docker Compose

# Start
docker-compose up --build -d

# Stop
docker-compose down

Data downloaded into ./data/ persists across restarts via a bind mount.


Data Source & License

The underlying dataset is the NCT Precision Oncology Thesaurus Drugs from the National Center for Tumor Diseases (NCT) Heidelberg.

Important: The data is released under CC-BY-NC 4.0 International. You must:

  • Use it for non-commercial purposes only
  • Email the TMO Heidelberg team before using the content
  • Not use it for direct clinical decision-making without expert verification

Citation

If you use this data or API in research, cite:

Kreutzfeldt S, Knurr A, Hübschmann D, Horak P, Fröhling S. NCT Precision Oncology Thesaurus Drugs — a Curated Database for Drugs, Drug Classes, and Drug Targets in Precision Cancer Medicine. medRxiv 2022.09.11.22279783. doi:10.1101/2022.09.11.22279783


Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| SyntaxError: illegal target for annotation | Invalid @ in dataclass field names | Already patched; pull latest code |
| Container restarts with unexpected EOF | Malformed inline entrypoint.sh in Dockerfile | Already patched; entrypoint.sh is now a standalone file |
| Read-only file system during download | docker-compose.yml had :ro mount | Already patched; volume is now writable |
| KeyError: 'id_oncotree' | Wrong column name in biomarker search | Already patched; uses code_oncotree |
| ✗ (download failed) for all files | Wrong GitHub org or missing /tsv/ path | Already patched; points to Takshan/NCT-POT (mirror of TMO-HD/NCT-POT) |
| Only drug.tsv downloads, then script exits | set -e combined with bash arithmetic: ((0)) returns exit status 1 | Already patched; set -e removed from download_data.sh |

License

  • API Code: MIT
  • NCT-POT Data: CC-BY-NC 4.0 International (non-commercial use only)
