File formats
MAF / somatic mutation tables
The mutation landscape analysis accepts MAF-style CSV files. Required columns:
| Column | Purpose |
|---|---|
| Hugo_Symbol | Gene name |
| Tumor_Sample_Barcode | Sample identifier |
| Variant_Classification | Missense, nonsense, frameshift, splice site, silent, etc. |
| Chromosome | Chromosome label |
| Start_Position | Variant start coordinate |
| Reference_Allele | Reference base |
| Tumor_Seq_Allele2 | Alternate allele |
VCF files can be registered as mutation datasets, but for analysis the pipeline reads MAF-style tabular exports.
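A minimal sketch of a header check for the required columns listed above, using only the Python standard library. The helper name `missing_maf_columns` and the example rows are hypothetical and not part of the pipeline; the column names come from the table.

```python
import csv
import io

# Required columns, as listed in the table above.
REQUIRED_MAF_COLUMNS = [
    "Hugo_Symbol",
    "Tumor_Sample_Barcode",
    "Variant_Classification",
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
]


def missing_maf_columns(handle):
    """Return the required columns that are absent from the CSV header."""
    header = csv.DictReader(handle).fieldnames or []
    return [name for name in REQUIRED_MAF_COLUMNS if name not in header]


# Placeholder data for illustration only (hypothetical gene and sample names).
example = io.StringIO(
    "Hugo_Symbol,Tumor_Sample_Barcode,Variant_Classification,Chromosome,Start_Position\n"
    "GENE_A,SAMPLE_1,missense,1,100\n"
)
missing = missing_maf_columns(example)
print(missing)
```

Running a check like this before registering a file reports every missing column at once rather than failing on the first one.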
Copy number alterations
Optional CSV for mutation landscape copy number summary:
| Column | Required | Purpose |
|---|---|---|
| sample_id | Yes | Sample identifier |
| gene | Yes | Gene symbol |
| chromosome | No | Chromosome label |
| alteration | Yes | amplification or deletion |
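The copy number table above can be tallied per gene with a short script. This is a sketch under the assumption that `alteration` must match `amplification` or `deletion` exactly; whether the pipeline is case-sensitive is not stated here. `summarize_alterations` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Allowed values for the alteration column, per the table above.
ALLOWED_ALTERATIONS = {"amplification", "deletion"}


def summarize_alterations(handle):
    """Count (gene, alteration) pairs, rejecting unexpected alteration values."""
    counts = Counter()
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(csv.DictReader(handle), start=2):
        alteration = row["alteration"]
        if alteration not in ALLOWED_ALTERATIONS:
            raise ValueError(
                f"line {line_number}: unexpected alteration {alteration!r}"
            )
        counts[(row["gene"], alteration)] += 1
    return counts


# Placeholder data for illustration only; the optional chromosome column is omitted.
example = io.StringIO(
    "sample_id,gene,alteration\n"
    "SAMPLE_1,GENE_A,amplification\n"
    "SAMPLE_2,GENE_A,amplification\n"
    "SAMPLE_2,GENE_B,deletion\n"
)
cna_counts = summarize_alterations(example)
print(cna_counts)
```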
Clinical endpoints
Survival analysis reads clinical CSV files:
| Column | Required | Purpose |
|---|---|---|
| patient_id | Yes | Patient identifier |
| time_to_event_days | Yes | Survival or PFS time in days |
| event_observed | Yes | 1 = event, 0 = censored |
| treatment_arm | No | Treatment group for stratification |
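As an illustration of the clinical table above, the following sketch parses rows into typed records and rejects `event_observed` values other than `0` or `1`. The `ClinicalRecord` class and `read_clinical` function are hypothetical names, and treating an empty `treatment_arm` as missing is an assumption.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalRecord:
    patient_id: str
    time_to_event_days: float
    event_observed: bool  # True = event, False = censored
    treatment_arm: Optional[str] = None


def read_clinical(handle):
    """Parse a clinical CSV into a list of ClinicalRecord objects."""
    records = []
    for row in csv.DictReader(handle):
        event = row["event_observed"].strip()
        if event not in ("0", "1"):
            raise ValueError(f"event_observed must be 0 or 1, got {event!r}")
        # treatment_arm is optional: the column may be absent or the cell empty.
        arm = (row.get("treatment_arm") or "").strip() or None
        records.append(
            ClinicalRecord(
                patient_id=row["patient_id"],
                time_to_event_days=float(row["time_to_event_days"]),
                event_observed=(event == "1"),
                treatment_arm=arm,
            )
        )
    return records


# Placeholder data for illustration only.
example = io.StringIO(
    "patient_id,time_to_event_days,event_observed,treatment_arm\n"
    "P1,120,1,arm_a\n"
    "P2,365,0,\n"
)
records = read_clinical(example)
print(records)
```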
Molecular feature matrix
Optional CSV for survival stratification and Cox regression:
| Column | Required | Purpose |
|---|---|---|
| patient_id | Yes | Join key to clinical table |
| Feature columns | No | TMB, immune scores, gene signatures, etc. |
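The join on `patient_id` can be sketched as follows. This assumes a left join that keeps every clinical row even when no feature row exists; the pipeline's actual join behavior is not specified here. `join_features` and the `tmb` column are hypothetical names used for illustration.

```python
import csv
import io


def join_features(clinical_rows, feature_rows):
    """Left-join feature columns onto clinical rows by patient_id."""
    features_by_patient = {row["patient_id"]: row for row in feature_rows}
    joined = []
    for clinical in clinical_rows:
        merged = dict(clinical)
        features = features_by_patient.get(clinical["patient_id"], {})
        # Copy every feature column except the join key itself.
        merged.update({k: v for k, v in features.items() if k != "patient_id"})
        joined.append(merged)
    return joined


# Placeholder data for illustration only. Values stay as strings here;
# convert them to numbers before modelling.
clinical = list(csv.DictReader(io.StringIO(
    "patient_id,time_to_event_days,event_observed\n"
    "P1,120,1\n"
    "P2,365,0\n"
)))
features = list(csv.DictReader(io.StringIO(
    "patient_id,tmb\n"
    "P1,4.5\n"
)))
joined = join_features(clinical, features)
print(joined)
```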
Longitudinal trajectories
Optional CSV for treatment timepoint analysis:
| Column | Required | Purpose |
|---|---|---|
| patient_id | Yes | Patient identifier |
| timepoint | Yes | baseline, on_treatment, response, relapse |
| feature | Yes | Feature name |
| value | Yes | Numeric measurement |
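Because the longitudinal table is in long format, one common first step is to group rows into per-patient, per-feature trajectories. The sketch below assumes the timepoints are ordered as listed in the table (baseline, on_treatment, response, relapse); that ordering and the `build_trajectories` name are assumptions rather than documented pipeline behavior.

```python
import csv
import io

# Assumed ordering, taken from the order in which the table lists the values.
TIMEPOINT_ORDER = ["baseline", "on_treatment", "response", "relapse"]


def build_trajectories(handle):
    """Map (patient_id, feature) to a list of (timepoint, value) in order."""
    grouped = {}
    for row in csv.DictReader(handle):
        timepoint = row["timepoint"]
        if timepoint not in TIMEPOINT_ORDER:
            raise ValueError(f"unknown timepoint {timepoint!r}")
        key = (row["patient_id"], row["feature"])
        grouped.setdefault(key, {})[timepoint] = float(row["value"])
    return {
        key: [(tp, values[tp]) for tp in TIMEPOINT_ORDER if tp in values]
        for key, values in grouped.items()
    }


# Placeholder data for illustration only; rows are deliberately out of order.
example = io.StringIO(
    "patient_id,timepoint,feature,value\n"
    "P1,relapse,score,0.9\n"
    "P1,baseline,score,0.2\n"
    "P1,on_treatment,score,0.5\n"
)
trajectories = build_trajectories(example)
print(trajectories)
```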
Per-cell expression (communication)
| Column | Required | Purpose |
|---|---|---|
| cell_id | Yes | Cell identifier |
| Gene columns | Yes | Numeric expression values |
Cell metadata (communication)
| Column | Required | Purpose |
|---|---|---|
| cell_id | Yes | Cell identifier |
| cell_type | Yes | Annotated cell type label |
| condition | No | Treatment or response group for comparison |
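The per-cell expression and cell metadata tables share `cell_id`, so they can be combined to compute, for example, mean expression per cell type. This is an illustrative sketch only, not the communication pipeline's method; `mean_expression_by_cell_type` is a hypothetical helper, and skipping cells without metadata is an assumption.

```python
import csv
import io
from collections import Counter, defaultdict


def mean_expression_by_cell_type(expression_handle, metadata_handle):
    """Average each gene column over the cells of each annotated cell type."""
    cell_types = {
        row["cell_id"]: row["cell_type"] for row in csv.DictReader(metadata_handle)
    }
    sums = defaultdict(lambda: defaultdict(float))
    cell_counts = Counter()
    for row in csv.DictReader(expression_handle):
        cell_type = cell_types.get(row["cell_id"])
        if cell_type is None:
            continue  # Assumption: cells without metadata are skipped.
        cell_counts[cell_type] += 1
        for gene, value in row.items():
            if gene != "cell_id":
                sums[cell_type][gene] += float(value)
    return {
        cell_type: {gene: total / cell_counts[cell_type] for gene, total in genes.items()}
        for cell_type, genes in sums.items()
    }


# Placeholder data for illustration only.
expression = io.StringIO(
    "cell_id,GENE_A,GENE_B\n"
    "c1,1.0,2.0\n"
    "c2,3.0,4.0\n"
    "c3,5.0,6.0\n"
)
metadata = io.StringIO(
    "cell_id,cell_type\n"
    "c1,type_x\n"
    "c2,type_x\n"
    "c3,type_y\n"
)
means = mean_expression_by_cell_type(expression, metadata)
print(means)
```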
Ligand-receptor database (optional)
Custom CSV for cell-cell communication:
| Column | Required | Purpose |
|---|---|---|
| ligand | Yes | Ligand gene symbol |
| receptor | Yes | Receptor gene symbol |
| pathway | No | Signaling pathway label |
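A custom ligand-receptor file with the columns above could be loaded as shown in this sketch. The function name `load_lr_pairs` and the placeholder symbols are hypothetical; an empty or absent `pathway` is represented as `None` by assumption.

```python
import csv
import io


def load_lr_pairs(handle):
    """Return (ligand, receptor, pathway) tuples; pathway may be None."""
    pairs = []
    for row in csv.DictReader(handle):
        # pathway is optional: the column may be absent or the cell empty.
        pathway = (row.get("pathway") or "").strip() or None
        pairs.append((row["ligand"], row["receptor"], pathway))
    return pairs


# Placeholder symbols for illustration only.
example = io.StringIO(
    "ligand,receptor,pathway\n"
    "LIG_A,REC_A,PATHWAY_1\n"
    "LIG_B,REC_B,\n"
)
lr_pairs = load_lr_pairs(example)
print(lr_pairs)
```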
Bulk expression (immune profile)
Samples-as-rows, genes-as-columns CSV. First column should identify samples.
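A minimal sketch of reading this layout with the standard library, treating the first column as the sample identifier whatever its header is called. `read_bulk_expression` is a hypothetical helper, and the assumption that every remaining cell is numeric is not confirmed by the pipeline.

```python
import csv
import io


def read_bulk_expression(handle):
    """Map each sample identifier to a {gene: value} dictionary."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = header[1:]  # Every column after the first is a gene.
    matrix = {}
    for row in reader:
        if not row:
            continue  # Skip blank lines.
        matrix[row[0]] = dict(zip(genes, (float(v) for v in row[1:])))
    return matrix


# Placeholder data for illustration only.
example = io.StringIO(
    "sample,GENE_A,GENE_B\n"
    "S1,1.5,2.0\n"
    "S2,0.0,3.25\n"
)
bulk = read_bulk_expression(example)
print(bulk)
```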
Repertoire (immune profile)
TCR/BCR clonotype table with clonotype identifiers, CDR3 sequences, and frequency counts. Format follows scirpy-compatible tabular exports.
Artifact outputs
Pipeline results under data/oncology/artifacts/{run_id}/:
| Artifact | Contents |
|---|---|
| result.json | Full pipeline output with summary, tables, and visualization data |
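Loading a run's output can be sketched as below. The directory layout comes from the path above; `load_result` is a hypothetical helper, and the internal structure of `result.json` is not assumed. The demonstration writes a placeholder file to a temporary directory so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_result(run_id, artifacts_root="data/oncology/artifacts"):
    """Read {artifacts_root}/{run_id}/result.json and return the parsed JSON."""
    result_path = Path(artifacts_root) / run_id / "result.json"
    with result_path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with placeholder content; a real result.json will differ.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "example_run"
    run_dir.mkdir(parents=True)
    (run_dir / "result.json").write_text(
        json.dumps({"placeholder": True}), encoding="utf-8"
    )
    loaded = load_result("example_run", artifacts_root=tmp)

print(loaded)
```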
Single-cell and spatial omics
10x Genomics, Visium, Xenium, and bulk RNA-seq formats are handled by the Computational Biology area. Oncology links compbio datasets to study samples via shared patient/sample keys.
Whole-slide images
H&E and IHC whole-slide images are handled by the Pathology area. Oncology uses pathology outputs for spatial TME context without duplicating WSI ingestion.