Data upload
Oncology studies combine molecular datasets with clinical cohort metadata. Register samples and endpoints in the metadata service before dispatching analysis jobs.
Samples
Register samples via the study Data page or the metadata API:
curl -X POST http://localhost:8000/oncology/studies/{study_id}/samples \
-H "Content-Type: application/json" \
-d '{
"external_id": "S1",
"patient_id": "P1",
"timepoint": "baseline",
"tumor_site": "lung",
"treatment": "anti-pd1",
"response_status": "partial_response",
"group_label": "responder"
}'
Key fields:
| Field | Purpose |
|---|---|
| external_id | Sample identifier used across pipelines |
| patient_id | Patient key for clinical endpoint joins |
| timepoint | Longitudinal position (baseline, on-treatment, relapse) |
| group_label | Cohort comparison group (responder, non-responder, arm A/B) |
| response_status | Clinical response category |
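For scripted registration, the same request can be assembled in Python. The following is a minimal sketch using only the standard library. It assumes every field in the Key fields table must be present (the source does not say which are mandatory), and `demo-study` is a placeholder study ID. The request is built but not sent.

```python
import json
from urllib.request import Request

# Fields from the "Key fields" table. Treating all of them as required
# is an assumption made for this sketch.
KEY_FIELDS = ("external_id", "patient_id", "timepoint", "group_label", "response_status")


def build_sample_request(base_url: str, study_id: str, sample: dict) -> Request:
    """Build (but do not send) the POST request that registers one sample."""
    missing = [field for field in KEY_FIELDS if not sample.get(field)]
    if missing:
        raise ValueError(f"sample is missing key fields: {missing}")
    return Request(
        url=f"{base_url}/oncology/studies/{study_id}/samples",
        data=json.dumps(sample).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sample = {
    "external_id": "S1",
    "patient_id": "P1",
    "timepoint": "baseline",
    "tumor_site": "lung",
    "treatment": "anti-pd1",
    "response_status": "partial_response",
    "group_label": "responder",
}

# "demo-study" is a hypothetical study ID.
request = build_sample_request("http://localhost:8000", "demo-study", sample)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would submit it to a running metadata service.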
Clinical endpoints
Add survival and outcome records per patient:
curl -X POST http://localhost:8000/oncology/studies/{study_id}/endpoints \
-H "Content-Type: application/json" \
-d '{
"patient_id": "P1",
"endpoint_type": "overall_survival",
"time_to_event_days": 900,
"event_observed": false,
"response_status": "partial_response"
}'
Survival analysis reads clinical CSV files with patient_id, time_to_event_days, and event_observed columns. Endpoint records in the metadata service are the structured source for exporting these files.
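One way to turn endpoint records into a clinical CSV is sketched below. It is an illustration under stated assumptions, not the service's export routine. The function name, the 1/0 encoding of event_observed, and the second patient record are hypothetical. A record with event_observed set to false is treated as right-censored at time_to_event_days.

```python
import csv
import io

# The three columns the survival analysis expects.
CLINICAL_COLUMNS = ["patient_id", "time_to_event_days", "event_observed"]


def endpoints_to_clinical_csv(endpoints: list[dict], endpoint_type: str = "overall_survival") -> str:
    """Render endpoint records of one endpoint type as clinical CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=CLINICAL_COLUMNS,
        extrasaction="ignore",  # drop fields such as response_status
        lineterminator="\n",
    )
    writer.writeheader()
    for record in endpoints:
        if record.get("endpoint_type") != endpoint_type:
            continue
        row = dict(record)
        # Assumption: the CSV encodes the event flag as 1 (event) or 0 (censored).
        row["event_observed"] = int(bool(record["event_observed"]))
        writer.writerow(row)
    return buffer.getvalue()


endpoints = [
    {"patient_id": "P1", "endpoint_type": "overall_survival",
     "time_to_event_days": 900, "event_observed": False,
     "response_status": "partial_response"},
    # Hypothetical second record, for illustration only.
    {"patient_id": "P2", "endpoint_type": "overall_survival",
     "time_to_event_days": 412, "event_observed": True},
]

print(endpoints_to_clinical_csv(endpoints))
```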
Mutation datasets
Register MAF or VCF files linked to the study. The mutation landscape pipeline takes the following job parameters; only the mutation file path is required:
| Parameter | Required | Purpose |
|---|---|---|
| mutation_path | Yes | Path to MAF-style CSV on the analysis server |
| copy_number_path | No | Copy number alteration table |
| panel_size_mb | No | Sequencing panel size for TMB calculation (default 38.0) |
| top_genes | No | Number of genes in oncoprint (default 20) |
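The parameter table above can be mirrored in a small Python structure that enforces the required path and fills in the documented defaults. This is a sketch: the class name, the helper names, and the example file path are hypothetical. The TMB helper uses the common definition of mutations per megabase. Which mutations the pipeline actually counts is not stated here, so the count is taken as given.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MutationLandscapeParams:
    """Job parameters for the mutation landscape pipeline (hypothetical wrapper)."""
    mutation_path: str                      # required
    copy_number_path: Optional[str] = None  # optional
    panel_size_mb: float = 38.0             # documented default
    top_genes: int = 20                     # documented default

    def __post_init__(self) -> None:
        if not self.mutation_path:
            raise ValueError("mutation_path is required")
        if self.panel_size_mb <= 0:
            raise ValueError("panel_size_mb must be positive")

    def to_job_parameters(self) -> dict:
        """Return a plain dict, omitting optional values that were not set."""
        return {key: value for key, value in asdict(self).items() if value is not None}


def tumor_mutational_burden(mutation_count: int, panel_size_mb: float = 38.0) -> float:
    """Mutations per megabase of sequenced territory."""
    return mutation_count / panel_size_mb


# Hypothetical path, for illustration only.
params = MutationLandscapeParams(mutation_path="data/oncology/demo/mutations.csv")
print(params.to_job_parameters())
print(tumor_mutational_burden(190))  # 190 mutations over a 38.0 Mb panel
```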
Repertoire datasets
Register TCR/BCR sequencing files for immune profile analysis. The expression matrix path is required in job parameters; the repertoire path is optional:
| Parameter | Required | Purpose |
|---|---|---|
| expression_path | Yes | Bulk or pseudo-bulk expression matrix |
| repertoire_path | No | TCR/BCR clonotype table |
Expression data for communication and immune pipelines
Cell-cell communication requires per-cell expression and metadata CSV files:
| File | Required columns |
|---|---|
| Expression CSV | cell_id plus gene columns |
| Metadata CSV | cell_id, cell_type; optional condition for comparison |
The immune profile pipeline requires a bulk expression matrix CSV with one row per sample and one column per gene.
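Before dispatching a cell-cell communication job, the two per-cell CSV files can be checked against the column requirements listed above. The sketch below works on CSV text so it stays self-contained. The function names are hypothetical, and the rule that every cell in the expression file needs a metadata row is an assumption rather than documented pipeline behavior.

```python
import csv
import io


def _read_csv(csv_text: str) -> tuple[list[str], list[dict]]:
    """Return (column names, rows) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def check_communication_inputs(expression_csv: str, metadata_csv: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    expr_cols, expr_rows = _read_csv(expression_csv)
    meta_cols, meta_rows = _read_csv(metadata_csv)

    if "cell_id" not in expr_cols:
        problems.append("expression CSV lacks a cell_id column")
    elif len(expr_cols) < 2:
        problems.append("expression CSV has no gene columns")

    for column in ("cell_id", "cell_type"):
        if column not in meta_cols:
            problems.append(f"metadata CSV lacks a {column} column")

    if not problems:
        # Assumption: each expression cell must have a matching metadata row.
        expr_ids = {row["cell_id"] for row in expr_rows}
        meta_ids = {row["cell_id"] for row in meta_rows}
        unmatched = sorted(expr_ids - meta_ids)
        if unmatched:
            problems.append(f"cells without metadata: {unmatched}")
    return problems


# Tiny illustrative inputs; gene names and values are placeholders.
expression = "cell_id,CD8A,PDCD1\nc1,5,2\nc2,0,1\n"
metadata = "cell_id,cell_type,condition\nc1,T cell,responder\n"
print(check_communication_inputs(expression, metadata))
```

To check files on disk, read them with `open(path, newline="")` and pass the text to the same function.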
File storage
Uploads and registered dataset paths live under data/oncology/. The metadata API records file locations in biochem_onco_* tables in Supabase. Files under data/oncology/ are gitignored.
Next steps
- File formats — MAF, clinical CSV, expression matrix schemas
- Sample data — test fixtures for local development
- Study workflow