Sample data

Test fixtures support manual testing, demos, and automated regression checks. All examples below mirror the oncology test suite under backend-analysis/tests/oncology/.

Mutation landscape fixture

import pandas as pd

# Minimal MAF-style table: 7 variants across 3 samples (S1, S2, S3).
maf = pd.DataFrame({
    "Hugo_Symbol": ["TP53", "KRAS", "TP53", "PIK3CA", "BRCA1", "EGFR", "CDKN2A"],
    "Tumor_Sample_Barcode": ["S1", "S1", "S2", "S2", "S3", "S3", "S3"],
    "Variant_Classification": [
        "Missense_Mutation", "Missense_Mutation", "Nonsense_Mutation",
        "Frame_Shift_Del", "Missense_Mutation", "Silent", "Splice_Site",
    ],
    "Chromosome": ["17", "12", "17", "3", "17", "7", "9"],
    "Start_Position": [7579472, 25398284, 7578406, 178936091, 43071077, 55249071, 21971120],
    "Reference_Allele": ["C", "G", "C", "T", "C", "A", "G"],
    "Tumor_Seq_Allele2": ["T", "A", "A", "TA", "G", "G", "T"],
})
# Gene-level copy-number calls, one per sample.
cna = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3"],
    "gene": ["EGFR", "PTEN", "CDKN2A"],
    "chromosome": ["7", "10", "9"],
    "alteration": ["amplification", "deletion", "deletion"],
})
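# Note: to_csv writes comma-separated text by default, even with a .maf extension.
# Standard MAF files are tab-delimited; pass sep="\t" if the consumer expects that.
maf.to_csv("/tmp/cohort.maf", index=False)
cna.to_csv("/tmp/copy_number.csv", index=False)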

Expected results (from test_mutation_landscape.py):

3 samples, 7 variants
TMB per megabase > 0 for all samples
TP53 ranked as top mutated gene
Oncoprint matrix includes TP53
Dominant mutational signature assigned
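
The count-based expectations above can be checked by hand. The following is a minimal sketch in plain Python (no pandas needed), not the logic in test_mutation_landscape.py. The 30 Mb callable-region size and the exclusion of Silent calls from TMB are assumptions made for illustration only; the pipeline's actual TMB definition is not documented here.

```python
from collections import Counter

# Columns copied from the MAF fixture above.
genes = ["TP53", "KRAS", "TP53", "PIK3CA", "BRCA1", "EGFR", "CDKN2A"]
samples = ["S1", "S1", "S2", "S2", "S3", "S3", "S3"]
classes = [
    "Missense_Mutation", "Missense_Mutation", "Nonsense_Mutation",
    "Frame_Shift_Del", "Missense_Mutation", "Silent", "Splice_Site",
]

# ASSUMPTION: 30 Mb of callable sequence; Silent calls do not count toward TMB.
CALLABLE_MB = 30.0

n_variants = len(genes)
n_samples = len(set(samples))

# Rank genes by how many variant rows mention them.
gene_counts = Counter(genes)
top_gene, top_count = gene_counts.most_common(1)[0]

# Non-silent variants per sample, converted to mutations per megabase.
nonsilent = Counter(s for s, c in zip(samples, classes) if c != "Silent")
tmb = {s: nonsilent[s] / CALLABLE_MB for s in sorted(set(samples))}

print(n_samples, n_variants, top_gene, tmb)
```

Under these assumptions every sample keeps at least two non-silent variants, so TMB stays positive whichever callable size is chosen.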

Cell-cell communication fixture

import pandas as pd

expression = pd.DataFrame({
    "cell_id": ["tumor_1", "tumor_2", "t_1", "t_2", "myeloid_1", "myeloid_2"],
    "TGFB1": [5.0, 4.0, 0.0, 0.0, 2.0, 1.0],
    "TGFBR1": [1.0, 1.0, 4.0, 4.0, 2.0, 2.0],
    "CXCL9": [0.0, 0.0, 3.0, 4.0, 1.0, 1.0],
    "CXCR3": [1.0, 1.0, 5.0, 5.0, 0.0, 0.0],
    "CD274": [6.0, 5.0, 0.0, 0.0, 1.0, 1.0],
    "PDCD1": [0.0, 0.0, 5.0, 4.0, 0.0, 0.0],
})
metadata = pd.DataFrame({
    "cell_id": ["tumor_1", "tumor_2", "t_1", "t_2", "myeloid_1", "myeloid_2"],
    "cell_type": ["Tumor", "Tumor", "T cell", "T cell", "Myeloid", "Myeloid"],
    "condition": ["baseline", "treated", "baseline", "treated", "baseline", "treated"],
})
expression.to_csv("/tmp/expression.csv", index=False)
metadata.to_csv("/tmp/metadata.csv", index=False)

Expected: 3 ligand-receptor (LR) pairs tested, 3 cell types, pathway scores > 0, and condition comparison deltas present.
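
To make the LR-pair expectation concrete, here is a hedged sketch of one common scoring scheme: the mean ligand expression in the sender cell type multiplied by the mean receptor expression in the receiver cell type. The pairing of the six gene columns into TGFB1/TGFBR1, CXCL9/CXCR3, and CD274/PDCD1 is inferred from the fixture columns, and the product-of-means score is an assumption; the pipeline may score pairs differently.

```python
from statistics import mean

# Expression fixture from above, reshaped as cell_id -> (cell_type, values per gene).
genes = ["TGFB1", "TGFBR1", "CXCL9", "CXCR3", "CD274", "PDCD1"]
rows = {
    "tumor_1":   ("Tumor",   [5.0, 1.0, 0.0, 1.0, 6.0, 0.0]),
    "tumor_2":   ("Tumor",   [4.0, 1.0, 0.0, 1.0, 5.0, 0.0]),
    "t_1":       ("T cell",  [0.0, 4.0, 3.0, 5.0, 0.0, 5.0]),
    "t_2":       ("T cell",  [0.0, 4.0, 4.0, 5.0, 0.0, 4.0]),
    "myeloid_1": ("Myeloid", [2.0, 2.0, 1.0, 0.0, 1.0, 0.0]),
    "myeloid_2": ("Myeloid", [1.0, 2.0, 1.0, 0.0, 1.0, 0.0]),
}

# ASSUMPTION: the three pairs are inferred from the fixture's column names.
LR_PAIRS = [("TGFB1", "TGFBR1"), ("CXCL9", "CXCR3"), ("CD274", "PDCD1")]


def mean_expression(cell_type, gene):
    """Average expression of one gene across all cells of one type."""
    idx = genes.index(gene)
    return mean(values[idx] for ctype, values in rows.values() if ctype == cell_type)


cell_types = sorted({ctype for ctype, _ in rows.values()})

# Score every sender -> receiver combination for every pair (product of means).
scores = {
    (sender, receiver, ligand, receptor):
        mean_expression(sender, ligand) * mean_expression(receiver, receptor)
    for ligand, receptor in LR_PAIRS
    for sender in cell_types
    for receiver in cell_types
}

print(scores[("Tumor", "T cell", "CD274", "PDCD1")])
```

With this scoring, the Tumor to T cell CD274/PDCD1 interaction comes out strongest among the tumor-sent signals, which matches how the fixture values were chosen.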

Survival analysis fixture

import pandas as pd

clinical = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "time_to_event_days": [900, 820, 760, 320, 280, 240],
    "event_observed": [0, 0, 1, 1, 1, 1],
    "treatment_arm": ["anti-pd1", "anti-pd1", "anti-pd1", "chemo", "chemo", "chemo"],
})
features = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "tmb": [9.0, 8.0, 7.5, 2.0, 1.0, 1.5],
    "cd8_score": [2.0, 1.8, 1.6, -0.8, -1.0, -0.9],
})
clinical.to_csv("/tmp/clinical.csv", index=False)
features.to_csv("/tmp/features.csv", index=False)

Expected: 6 patients, 2 Kaplan-Meier (KM) groups when stratified by treatment_arm, and log-rank and Cox outputs present.
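
The two KM groups can be reproduced by hand with the product-limit (Kaplan-Meier) estimator. The sketch below is plain Python and assumes the usual convention that event_observed = 1 marks an observed event and 0 marks a censored patient. It illustrates the estimator only and does not stand in for the survival job, which also reports log-rank and Cox outputs.

```python
# Clinical fixture from above: (patient_id, time_to_event_days, event_observed, treatment_arm).
patients = [
    ("P1", 900, 0, "anti-pd1"),
    ("P2", 820, 0, "anti-pd1"),
    ("P3", 760, 1, "anti-pd1"),
    ("P4", 320, 1, "chemo"),
    ("P5", 280, 1, "chemo"),
    ("P6", 240, 1, "chemo"),
]


def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where at least one event occurs."""
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Stratify by treatment arm, then estimate one curve per group.
groups = {}
for _, time, event, arm in patients:
    times, events = groups.setdefault(arm, ([], []))
    times.append(time)
    events.append(event)

curves = {arm: kaplan_meier(times, events) for arm, (times, events) in groups.items()}

for arm, curve in curves.items():
    print(arm, curve)
```

The anti-pd1 curve drops once, at day 760, and then stays flat because P1 and P2 are censored. The chemo curve steps down at days 240, 280, and 320 until it reaches zero.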

Test fixtures location

Automated tests live under:

backend-analysis/tests/oncology/

Tests create temporary fixtures in pytest's tmp_path directory, so no committed binary fixtures are required.
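
As an illustration of the tmp_path pattern, a test might write a fixture into the per-test temporary directory and read it back. The helper name write_clinical_fixture and the test name are hypothetical, and the sketch uses the standard-library csv module to stay dependency-free; the real tests may build the pandas DataFrames shown earlier instead.

```python
import csv
from pathlib import Path

HEADER = ["patient_id", "time_to_event_days", "event_observed", "treatment_arm"]
ROWS = [
    ["P1", 900, 0, "anti-pd1"],
    ["P2", 820, 0, "anti-pd1"],
    ["P3", 760, 1, "anti-pd1"],
    ["P4", 320, 1, "chemo"],
    ["P5", 280, 1, "chemo"],
    ["P6", 240, 1, "chemo"],
]


def write_clinical_fixture(directory: Path) -> Path:
    """Hypothetical helper: write the clinical fixture as CSV and return its path."""
    path = directory / "clinical.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


def test_clinical_fixture_roundtrip(tmp_path):
    # pytest injects tmp_path as a pathlib.Path unique to this test.
    path = write_clinical_fixture(tmp_path)
    with path.open(newline="") as handle:
        records = list(csv.DictReader(handle))
    assert len(records) == 6
    assert {r["treatment_arm"] for r in records} == {"anti-pd1", "chemo"}
```

Because each test gets its own directory, the fixture files never collide between tests and nothing needs to be cleaned up by hand.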

Suggested demo workflow

Create oncology study from /areas/oncology/studies/new
Write MAF fixture to /tmp/cohort.maf
Run mutation landscape job via API or Mutations page
Write clinical fixture to /tmp/clinical.csv
Run survival job with stratify_by: treatment_arm
Launch interpret job with both run IDs (mutation landscape and survival)
Review results on workflow pages and in run history

Full product details: products/oncology/product.md and products/oncology/todo.md.