Key concepts
Oncology in Gradient Biotech is organized around tumor samples, molecular alterations, immune context, signaling networks, clinical endpoints, and reproducible analysis runs. This page explains the oncology concepts behind the workflows and the app concepts used to manage them.
Tumor cohorts and samples
Most Oncology workflows begin with a study cohort: a set of patients, samples, and clinical metadata organized around a tumor type, treatment, response group, trial arm, or translational research question.
Common cohort concepts:
- Patient: person or participant represented in the study.
- Sample: molecular or tissue specimen linked to a patient.
- Timepoint: collection stage, such as baseline, on-treatment, response, relapse, or recurrence.
- Tumor site: anatomical location or lesion source.
- Treatment arm: therapy or experimental group.
- Response status: outcome label such as responder, non-responder, stable disease, progression, or pathological response.
- Group label: category used for comparison.
Good oncology analysis depends on consistent patient and sample identifiers. These keys connect expression, mutation, repertoire, pathology, and clinical endpoint tables.
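The identifier check described above can be sketched as follows. This is a minimal illustration, assuming each table is a list of dictionaries keyed by a hypothetical `sample_id` field; the field name and the example rows are assumptions for illustration, not the app's schema.

```python
def find_unmatched_ids(reference_rows, other_rows, key="sample_id"):
    """Return identifiers that appear in only one of two tables.

    `key` is a hypothetical column name used for illustration.
    """
    reference_ids = {row[key] for row in reference_rows}
    other_ids = {row[key] for row in other_rows}
    return {
        "missing_from_other": sorted(reference_ids - other_ids),
        "missing_from_reference": sorted(other_ids - reference_ids),
    }


# Illustrative rows only; real tables would come from the study.
samples = [
    {"sample_id": "S1", "patient_id": "P1"},
    {"sample_id": "S2", "patient_id": "P2"},
]
mutations = [
    {"sample_id": "S1", "gene": "TP53"},
    {"sample_id": "S3", "gene": "KRAS"},
]

report = find_unmatched_ids(samples, mutations)
```

Running a check like this before joining tables surfaces samples that would otherwise be dropped silently or joined to the wrong patient.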
Clinical endpoints
Clinical endpoints describe outcomes that can be linked to molecular features.
Common endpoint concepts:
- Overall survival: time from a defined start point to death from any cause.
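- Progression-free survival: time from a defined start point to disease progression or death, whichever occurs first; the exact definition depends on the study.
- Censoring: the subject had not experienced the event by the last follow-up, so the event time is known only to exceed the observed time.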
- Event observed: indicator that the endpoint event occurred.
- Response Evaluation Criteria in Solid Tumors (RECIST): radiographic response framework used in many solid tumor studies.
- Pathological response: tissue-based response assessment, often after neoadjuvant therapy.
Endpoints feed survival analysis and responder-versus-non-responder comparisons. Their meaning depends on study design, follow-up time, treatment context, and censoring rules.
Tumor microenvironment
The tumor microenvironment (TME) includes tumor cells, immune cells, stromal cells, vasculature, extracellular matrix, and signaling factors surrounding the tumor.
Oncology uses Computational Biology infrastructure for single-cell clustering, spatial transcriptomics, differential expression, and biomarker workflows, then adds oncology-specific interpretation such as tumor microenvironment composition, immune infiltration, communication networks, mutation context, and clinical outcomes.
Interpret tumor microenvironment outputs in light of tissue source, sampling bias, tumor purity, spatial context, and annotation quality.
Cell-cell communication
Cell-cell communication estimates signaling between cell populations using per-cell expression data, cell metadata, ligand-receptor pairs, pathway aggregation, and condition comparison.
Core concepts:
- Ligand: signaling molecule expressed or secreted by a source cell population.
- Receptor: target molecule expressed by a receiving cell population.
- Ligand-receptor interaction: candidate signaling relationship between source and receiver cell types.
- Interaction score: computed evidence or strength for an interaction.
- Sender and receiver network: directed view of which cell types may send or receive signals.
- Pathway aggregation: grouping interactions into signaling pathways.
- Condition comparison: change in signaling between response groups, treatments, or disease states.
Communication outputs are computational evidence. They are strongest when supported by expression quality, expected biology, spatial proximity, and experimental validation.
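To make the interaction score concrete, here is one simple scoring scheme: the product of mean ligand expression in the sender population and mean receptor expression in the receiver population. This is an illustrative assumption, not the scoring method the communication pipeline uses; the function name and input values are hypothetical.

```python
from statistics import mean


def interaction_score(ligand_expr_sender, receptor_expr_receiver):
    """Product of mean ligand expression (sender cells) and mean
    receptor expression (receiver cells).

    An illustrative scoring scheme, not the pipeline's own method.
    """
    if not ligand_expr_sender or not receptor_expr_receiver:
        return 0.0
    return mean(ligand_expr_sender) * mean(receptor_expr_receiver)


# Hypothetical per-cell expression values for one ligand-receptor pair.
score = interaction_score([2.0, 4.0], [1.0, 3.0])
```

A product-style score is high only when both partners are expressed, which is why expression quality in both populations matters when reading communication outputs.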
Immuno-oncology profiling
Immuno-oncology profiling summarizes immune context in a tumor cohort. It can use bulk expression, immune deconvolution, exhaustion programs, response phenotypes, and optional receptor repertoire data.
Important concepts:
- Immune deconvolution: estimation of immune cell fractions from bulk expression data.
- Tumor Immune Dysfunction and Exclusion (TIDE): framework for immune escape and checkpoint response hypotheses.
- Exhaustion score: gene program score associated with T cell dysfunction or chronic stimulation.
- Checkpoint target: immune regulatory molecule such as programmed cell death protein 1 or programmed death-ligand 1.
- Immune phenotype: summary label describing immune-inflamed, excluded, exhausted, or related states.
- Immuno-oncology (IO) response prediction: research-only score or label intended for exploratory stratification.
These outputs support hypothesis generation about immune infiltration, exclusion, exhaustion, and therapy response. They are not treatment recommendations.
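As a rough sketch of how a gene program score such as an exhaustion score can be computed, the example below averages per-gene z-scores across the genes in a program. The scoring scheme, function name, gene list, and expression values are assumptions for illustration; the immune profile pipeline may use a different method.

```python
from statistics import mean, pstdev


def program_scores(expression, program_genes):
    """Per-sample mean z-score across the genes of a program.

    `expression` maps gene name -> list of values, one per sample.
    Genes missing from `expression` are skipped.
    """
    genes = [gene for gene in program_genes if gene in expression]
    if not genes:
        raise ValueError("none of the program genes are present")
    n_samples = len(expression[genes[0]])
    z_by_gene = []
    for gene in genes:
        values = expression[gene]
        center = mean(values)
        spread = pstdev(values)
        # A gene with no variation contributes zero to every sample.
        z_by_gene.append(
            [(v - center) / spread if spread > 0 else 0.0 for v in values]
        )
    return [mean(z[i] for z in z_by_gene) for i in range(n_samples)]


# Hypothetical expression values for two samples.
expression = {"PDCD1": [1.0, 3.0], "LAG3": [2.0, 6.0]}
scores = program_scores(expression, ["PDCD1", "LAG3", "HAVCR2"])
```

Because the score is relative to the cohort, the same sample can score differently when the cohort composition changes.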
Repertoire analysis
Repertoire analysis summarizes T cell receptor and B cell receptor sequencing data when it is registered with the study.
Key concepts:
- T cell receptor (TCR): receptor that T cells use to recognize antigen presented by other cells; represented in the data by its sequence.
- B cell receptor (BCR): receptor sequence linked to antibody-producing lineages.
- Complementarity-determining region 3 (CDR3): highly variable receptor region often used for clonotype definitions.
- Clonotype: group of cells or sequences inferred to share receptor identity.
- Clonal expansion: enrichment of a clonotype across cells, samples, or timepoints.
- Diversity: distribution of clonotypes in a sample or cohort.
Expanded clonotypes can be biologically important, but they do not identify antigen targets by themselves. Interpret repertoire outputs in light of assay depth, chain pairing, tissue source, and immune state.
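Diversity and clonal expansion can be illustrated with Shannon entropy and the frequency of the largest clonotype. This is a minimal sketch, assuming a clonotype is identified by a label such as a CDR3 sequence; the function names and example labels are hypothetical, and the pipeline's own metrics may differ.

```python
import math
from collections import Counter


def shannon_diversity(clonotypes):
    """Shannon entropy (natural log) of clonotype frequencies."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log(count / total)
        for count in counts.values()
    )


def top_clone_fraction(clonotypes):
    """Fraction of sequences that belong to the most abundant clonotype."""
    counts = Counter(clonotypes)
    return max(counts.values()) / sum(counts.values())


# Hypothetical clonotype labels, one per sequenced cell.
cells = ["clone_A", "clone_A", "clone_B", "clone_B"]
diversity = shannon_diversity(cells)
expansion = top_clone_fraction(cells)
```

Both values are sensitive to sequencing depth, so comparisons across samples are most meaningful at matched depth.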
Mutation landscape
The mutation landscape summarizes somatic alterations across tumor samples.
Common mutation concepts:
- Somatic variant: alteration found in tumor deoxyribonucleic acid (DNA) relative to a reference or matched normal context.
- Mutation Annotation Format (MAF): tabular format commonly used for somatic mutation records.
- Variant Call Format (VCF): file format for genomic variant records.
- Variant classification: mutation type, such as missense, nonsense, frameshift, splice site, or silent.
- Driver alteration: mutation or copy number event believed to contribute to tumor biology.
- Tumor mutational burden (TMB): number of mutations normalized by sequenced genomic territory, commonly reported as mutations per megabase.
- Oncoprint: compact matrix visualization of alterations across samples and genes.
- Mutational signature: pattern of mutation types associated with biological or technical processes.
Mutation results depend on panel size, variant filtering, tumor purity, sequencing assay, and whether germline or matched-normal filtering was performed.
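The normalization behind tumor mutational burden can be sketched as a simple ratio, here expressed as mutations per megabase, a common convention. The function name and the example numbers are hypothetical; which variants count toward the numerator depends on the filtering choices listed above.

```python
def tumor_mutational_burden(mutation_count, covered_bases):
    """Mutations per megabase of sequenced territory."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return mutation_count / (covered_bases / 1_000_000)


# Hypothetical example: 30 mutations over a 1.5 Mb panel.
tmb = tumor_mutational_burden(30, 1_500_000)
```

The same mutation count yields very different values for a small panel and a whole exome, which is why panel size appears in the caveats above.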
Copy number and pathway context
Copy number alterations summarize amplifications and deletions in genomic regions or genes. They can be interpreted alongside point mutations and pathway enrichment.
Examples:
- Amplification: increased copy number of a region or gene.
- Deletion: reduced copy number of a region or gene.
- Co-occurrence: alterations that appear together more often than expected.
- Mutual exclusivity: alterations that rarely appear together.
- Driver pathway enrichment: mapping altered genes to cancer-relevant pathways.
Copy number and pathway summaries help connect individual alterations to larger biological programs, but they should be reviewed against the assay type and cohort size.
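Co-occurrence and mutual exclusivity between two alterations are often summarized from a 2x2 table of sample counts. The sketch below builds that table and computes an odds ratio with a 0.5 continuity correction; it is an illustrative assumption, not the app's method, and the sample identifiers are hypothetical.

```python
def alteration_table(altered_a, altered_b, all_samples):
    """Counts of samples with both, only A, only B, or neither alteration."""
    a, b = set(altered_a), set(altered_b)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(set(all_samples) - a - b)
    return both, only_a, only_b, neither


def odds_ratio(both, only_a, only_b, neither, correction=0.5):
    """Odds ratio with a continuity correction to avoid division by zero.

    Values above 1 suggest co-occurrence; values below 1 suggest
    mutual exclusivity.
    """
    return ((both + correction) * (neither + correction)) / (
        (only_a + correction) * (only_b + correction)
    )


# Hypothetical cohort of six samples.
cohort = ["S1", "S2", "S3", "S4", "S5", "S6"]
table = alteration_table(["S1", "S2", "S3"], ["S4", "S5"], cohort)
ratio = odds_ratio(*table)
```

In a cohort this small the ratio is unstable, which illustrates why these summaries should be reviewed against cohort size.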
Survival analysis
Survival analysis connects clinical endpoint timing to groups or molecular features.
Common survival concepts:
- Kaplan-Meier curve: estimate of survival probability over time.
- Log-rank test: statistical comparison between survival curves.
- Cox proportional hazards regression: model estimating associations between covariates and event hazard.
- Hazard ratio: relative event hazard between groups or feature levels.
- Longitudinal trajectory: feature values tracked across timepoints.
- Stratification: splitting subjects by treatment arm, mutation status, immune phenotype, or another feature.
Survival outputs depend on endpoint definitions, follow-up length, censoring, sample size, and covariate selection. They are research statistics, not clinical predictions.
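To show how censoring enters a Kaplan-Meier estimate, here is a minimal product-limit estimator in plain Python. It is a teaching sketch with hypothetical follow-up data, not the survival pipeline's implementation; an event flag of 1 means the event was observed and 0 means the subject was censored.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each observed event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        observed = 0  # events at this time
        leaving = 0   # events plus censored subjects at this time
        while i < len(data) and data[i][0] == current_time:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((current_time, survival))
        # Censored subjects leave the risk set without changing the estimate.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([5, 8, 8, 12], [1, 0, 1, 0])
```

Censored subjects still count toward the risk set up to their last follow-up, so dropping them instead would bias the curve downward.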
Artificial intelligence interpretation
Artificial intelligence interpretation summarizes completed oncology pipeline outputs with metric citations.
Interpretation contexts include:
| Context | When | Output |
|---|---|---|
| Cell-cell communication | Communication run complete | Signaling axis summary with cited interaction scores and pathway rankings |
| Immune phenotype | Immune profile run complete | Deconvolution, Tumor Immune Dysfunction and Exclusion phenotype, exhaustion, and repertoire narrative |
| Mutation landscape | Mutation run complete | Tumor mutational burden, driver alterations, signature, and oncoprint interpretation |
| Survival | Survival run complete | Kaplan-Meier and Cox findings with cited hazard ratios and P values |
| Multi-modal summary | Multiple source runs | Integrated narrative connecting tumor microenvironment, immune, mutation, and outcome findings |
Interpretations do not compute new metrics, diagnose cancer, or recommend treatment. They explain completed outputs and include a research disclaimer.
Study
A study is the top-level container for one oncology project. It holds samples, clinical endpoints, mutation and repertoire datasets, pipeline runs, artifacts, and interpretation outputs. Create one study per trial, tumor cohort, or translational analysis project.
Optional metadata includes tumor type, cancer stage, treatment arm, and response status.
Sample
A sample links molecular data to cohort metadata. Each sample records:
- Patient and sample identifiers
- Timepoint
- Tumor site
- Treatment
- Response status
- Group label
- Flexible JavaScript Object Notation (JSON) metadata for study-specific fields
Samples align keys across single-cell, bulk expression, mutation, repertoire, pathology, and clinical tables.
Mutation dataset
A mutation dataset is a registered Mutation Annotation Format or Variant Call Format file linked to the study. The mutation landscape pipeline reads the file path and produces tumor mutational burden, variant classification summaries, oncoprint data, signature decomposition, and optional copy number summaries.
Repertoire dataset
A repertoire dataset is a registered T cell receptor or B cell receptor sequencing file. The immune profile pipeline uses repertoire paths for clonotype detection, diversity metrics, and clonal expansion analysis.
Pipeline run
A run is one execution of an oncology analysis pipeline.
Each run stores:
- Unique run identifier, prefixed with onco-run-
- Pipeline type and version
- JavaScript Object Notation parameter record
- Status: queued -> running -> complete or failed
- Output artifacts, such as JSON summaries, oncoprint data, Kaplan-Meier curves, and interpretations
- Timestamps and summary statistics
Runs chain through provenance. Interpretation references completed source runs, and survival analysis can stratify by features from immune or mutation pipelines.
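The run record and its status lifecycle can be sketched as a small data class. The field names, the example identifier suffix, and the rule that failure is reached only from the running state are assumptions for illustration; only the onco-run- prefix and the four status values come from this page.

```python
from dataclasses import dataclass, field

# Assumed transition rules: queued -> running -> complete or failed.
ALLOWED_TRANSITIONS = {
    "queued": {"running"},
    "running": {"complete", "failed"},
    "complete": set(),
    "failed": set(),
}


@dataclass
class PipelineRun:
    """Hypothetical in-memory model of a run record."""

    run_id: str
    pipeline_type: str
    parameters: dict = field(default_factory=dict)
    status: str = "queued"

    def advance(self, new_status):
        """Move to `new_status`, rejecting transitions outside the lifecycle."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status


# Hypothetical run identifier and pipeline type.
run = PipelineRun(run_id="onco-run-example", pipeline_type="survival")
run.advance("running")
run.advance("complete")
```

Treating complete and failed as terminal states keeps provenance simple: a finished run is never mutated, and a repeat analysis becomes a new run.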
Job polling
All analysis runs asynchronously in backend-analysis. The frontend polls GET /oncology/jobs/{run_id} until the job completes. Status badges update on workflow pages.
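The polling behavior can be sketched as a loop that stops on a terminal status. The response shape is not documented here, so this sketch assumes a caller-supplied `fetch_status` function that wraps GET /oncology/jobs/{run_id} and returns the status as a string; the function names, interval, and attempt limit are hypothetical.

```python
import time

TERMINAL_STATUSES = {"complete", "failed"}


def poll_job(run_id, fetch_status, interval_seconds=2.0, max_attempts=30,
             sleep=time.sleep):
    """Call `fetch_status(run_id)` until the job reaches a terminal status.

    `fetch_status` is a hypothetical wrapper around
    GET /oncology/jobs/{run_id} that returns a status string.
    """
    for _ in range(max_attempts):
        status = fetch_status(run_id)
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"{run_id} did not finish after {max_attempts} polls")


# Simulated responses stand in for the real endpoint in this example.
_fake_responses = iter(["queued", "running", "complete"])
final_status = poll_job(
    "onco-run-example",
    fetch_status=lambda run_id: next(_fake_responses),
    sleep=lambda seconds: None,  # skip real waiting in the example
)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to test without a running backend.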
Artifacts
Pipeline outputs are written to data/oncology/artifacts/{run_id}/ as JavaScript Object Notation.
Examples:
- result.json: full pipeline output with summary statistics, tables, and visualization data
- Communication artifacts: interaction tables, pathway scores, and sender/receiver network
- Immune profile artifacts: deconvolution fractions, Tumor Immune Dysfunction and Exclusion phenotype, exhaustion scores, and repertoire diversity
- Mutation landscape artifacts: tumor mutational burden table, oncoprint matrix, signature decomposition, and copy number summary
- Survival artifacts: Kaplan-Meier curve data, log-rank P value, and Cox regression coefficients
- Interpretation artifacts: narrative JSON with metric citations
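Reading an artifact back for downstream analysis can be sketched as below. The directory layout and the result.json file name come from this page; the helper names and the example run identifier are hypothetical, and the contents of result.json are not assumed.

```python
import json
from pathlib import Path

ARTIFACT_ROOT = Path("data/oncology/artifacts")


def artifact_path(run_id, filename="result.json", root=ARTIFACT_ROOT):
    """Build the path to an artifact file for a given run."""
    return Path(root) / run_id / filename


def load_result(run_id, root=ARTIFACT_ROOT):
    """Load a run's result.json as a Python object."""
    with artifact_path(run_id, root=root).open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical run identifier; only the path is built here, nothing is read.
example_path = artifact_path("onco-run-example")
```

Keeping the root as a parameter lets the same helpers point at a copied or archived artifact directory.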
Relationship to other research areas
Single-cell quality control, clustering, spatial transcriptomics, differential expression, and biomarker machine learning are owned by the Computational Biology area. Oncology calls those pipelines for tumor microenvironment profiling and adds the oncology-specific layer: communication networks, immune deconvolution, mutation analysis, and survival statistics.
Whole-slide imaging, tissue segmentation, and image-based spatial quantification are owned by the Pathology area. Oncology can use pathology outputs for pathology-integrated tumor microenvironment context without duplicating image processing.
Provenance and reproducibility
Every analysis should be reproducible. The study's Run history panel lists all pipeline executions with status, timestamps, pipeline type, parameters, and artifact links. When upstream data or settings change, re-run the affected jobs with identical parameters from their workflow pages.
Interpretation scope
The Oncology area supports exploratory and translational oncology research. It is not a regulated clinical diagnostic, clinical decision support tool, or treatment selection system. Treat outputs as structured research evidence for expert review.