Gradient Biotech

Biomarker discovery

The biomarker pipeline supports translational workflows: feature selection from expression data, reduction to a predictive gene panel, and cross-validated evaluation of classifier performance.

Inputs

  • Expression matrix in AnnData (cells or samples as observations)
  • Class labels in obs — default column name condition, configurable per run
  • At least two classes, each with enough samples to be represented in every cross-validation fold
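The label requirements above can be checked before a run. The following is a minimal sketch, not the pipeline's own validation: `check_labels` and the `min_per_class` threshold are hypothetical names and values chosen for illustration, and the labels are assumed to have been read from the configured obs column (for example `adata.obs["condition"]`).

```python
from collections import Counter


def check_labels(labels, min_per_class=3):
    """Return per-class counts, or raise ValueError if the labels are unusable.

    min_per_class is a hypothetical threshold, not a documented pipeline default.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError(f"need at least two classes, found {len(counts)}")
    too_small = {c: n for c, n in counts.items() if n < min_per_class}
    if too_small:
        raise ValueError(f"classes below {min_per_class} samples: {too_small}")
    return dict(counts)


# Stand-in for the values of the class-label column in obs.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
print(check_labels(labels))
```

A sensible threshold depends on the number of cross-validation folds: a class with fewer samples than folds cannot appear in every fold.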

Pipeline stages

Optional WGCNA

Co-expression network analysis runs on bulk-style matrices. On single-cell data WGCNA is skipped automatically because module structure is unreliable at cell-level sparsity.

Feature selection

Method         Description
mRMR           Minimum redundancy maximum relevance; capped input size for performance
Random forest  Importance ranking; used automatically for large gene sets
Combined       Merges rankings from multiple methods
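How the Combined method merges rankings is not specified here. As one possible illustration, the sketch below merges ranked gene lists by mean rank position. The function name `merge_rankings`, the gene names, and the treatment of missing genes are assumptions made for this example, not the pipeline's documented behavior.

```python
def merge_rankings(rankings):
    """Merge ranked gene lists (best first) by mean rank position.

    A gene missing from a list is assigned that list's length + 1 as its rank.
    """
    genes = sorted({g for r in rankings for g in r})
    mean_rank = {}
    for g in genes:
        ranks = [r.index(g) + 1 if g in r else len(r) + 1 for r in rankings]
        mean_rank[g] = sum(ranks) / len(ranks)
    # Sort by mean rank, breaking ties alphabetically for reproducibility.
    return sorted(genes, key=lambda g: (mean_rank[g], g))


# Hypothetical outputs of two selection methods.
mrmr_rank = ["GENE_A", "GENE_B", "GENE_C"]
rf_rank = ["GENE_B", "GENE_A", "GENE_D"]
print(merge_rankings([mrmr_rank, rf_rank]))
```

Mean rank treats every method equally; a weighted variant would be needed if one method's ranking should count for more.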

Classification

Trains a classifier (SVM, k-NN, or random forest) with cross-validation and reports accuracy, F1, and confusion matrix summaries.
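To make the reported metrics concrete, the sketch below computes a confusion matrix, accuracy, and F1 from the held-out predictions of a single fold. It assumes macro-averaged F1 (the unweighted mean of per-class F1), which may differ from the averaging the pipeline reports. All function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def macro_f1(matrix):
    """Unweighted mean of per-class F1; a class with no counts scores 0."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(matrix))) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical held-out labels and predictions for one CV fold.
y_true = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
y_pred = ["tumor", "tumor", "normal", "normal", "normal", "tumor"]
cm = confusion_matrix(y_true, y_pred, classes=["tumor", "normal"])
print(cm, round(accuracy(cm), 3), round(macro_f1(cm), 3))
```

Per-fold values like these are typically summarized across folds as a mean and spread.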

Results

Analyze → Biomarker Results shows:

  • Ranked selected genes with scores
  • Classifier performance metrics across CV folds
  • Links from Runs and Interpret

KnowSeq alignment

The current implementation covers feature selection and ML classification. Not yet implemented:

  • Coverage-style multi-class DEG extraction across all pairwise comparisons
  • Consistency selection across resampling
  • Disease evidence retrieval from external databases

Tips

  • Prefer pseudobulk or sample-level aggregation when biological replicates matter; cell-level labels inflate performance because cells from the same sample can land in both training and test folds.
  • Check class balance in Explore before running; severe imbalance affects CV metrics.
  • For pathway context, run enrichment on the selected gene list in a follow-up DE/enrichment run.
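The pseudobulk tip above can be illustrated with a minimal sketch that sums per-cell counts into one profile per sample. It assumes raw counts and a per-cell sample identifier; the function name `pseudobulk` and the example values are hypothetical, and the platform's own aggregation may differ (for example, by averaging or normalizing).

```python
def pseudobulk(counts, sample_ids):
    """Sum per-cell count vectors into one vector per sample."""
    totals = {}
    for row, sample in zip(counts, sample_ids):
        if sample not in totals:
            totals[sample] = [0] * len(row)
        totals[sample] = [a + b for a, b in zip(totals[sample], row)]
    return totals


# Four cells x three genes, two cells per sample (made-up values).
counts = [
    [1, 0, 3],
    [2, 1, 0],
    [0, 4, 1],
    [5, 0, 0],
]
sample_ids = ["s1", "s1", "s2", "s2"]
print(pseudobulk(counts, sample_ids))
```

After aggregation each sample carries a single class label, so cross-validation splits whole samples rather than individual cells.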