Single-cell

The single-cell pipeline follows a standard Scanpy-style workflow adapted from Seurat-style methods. All steps operate on AnnData objects and chain together through on-disk checkpoints.

Pipeline steps

1. Quality control

Filters cells and optionally genes based on:

Minimum and maximum genes per cell
Minimum counts per cell
Maximum mitochondrial gene percentage

Outputs QC metrics, histograms, and per-sample retention charts visible in Explore → Sample QC.
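The filters above can be sketched with plain NumPy. This is a minimal illustration, not the pipeline's own code: the function name `qc_mask`, the threshold values, and the "MT-" prefix used to spot mitochondrial genes are all assumptions chosen for the example.

```python
import numpy as np

def qc_mask(counts, gene_names, min_genes=200, max_genes=6000,
            min_counts=500, max_pct_mt=20.0, mt_prefix="MT-"):
    """Return a boolean keep-mask over cells plus per-cell QC metrics.

    counts: dense array of shape (n_cells, n_genes) holding raw counts.
    Thresholds and the mitochondrial prefix are illustrative assumptions.
    """
    counts = np.asarray(counts, dtype=float)
    is_mt = np.array([g.upper().startswith(mt_prefix) for g in gene_names])

    n_genes = (counts > 0).sum(axis=1)        # genes detected per cell
    total = counts.sum(axis=1)                # total counts per cell
    mt_total = counts[:, is_mt].sum(axis=1)   # mitochondrial counts per cell
    # Cells with zero counts get 0% instead of a division-by-zero result.
    pct_mt = np.divide(100.0 * mt_total, total,
                       out=np.zeros_like(total), where=total > 0)

    keep = ((n_genes >= min_genes) & (n_genes <= max_genes)
            & (total >= min_counts) & (pct_mt <= max_pct_mt))
    metrics = {"n_genes": n_genes, "total_counts": total, "pct_mt": pct_mt}
    return keep, metrics

# Toy data: 3 cells x 4 genes, with deliberately tiny thresholds.
genes = ["MT-CO1", "ACTB", "GAPDH", "CD3E"]
counts = np.array([
    [1, 5, 4, 0],   # 3 genes, 10 counts, 10% mitochondrial
    [8, 1, 1, 0],   # 3 genes, 10 counts, 80% mitochondrial
    [0, 1, 0, 0],   # 1 gene, 1 count
])
keep, metrics = qc_mask(counts, genes, min_genes=2, max_genes=4,
                        min_counts=5, max_pct_mt=20.0)
```

Only the first toy cell passes every filter: the second exceeds the mitochondrial limit and the third has too few genes and counts.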

2. Normalization

Log-normalizes counts, identifies highly variable genes, and optionally scales the expression values. Writes a checkpoint that the clustering step reads.
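As a rough sketch of what this step does, the three operations can be written in NumPy. The target sum of 10,000 and the variance-ranking rule for picking variable genes are assumptions made for illustration; the pipeline may use different defaults or a more elaborate selection method.

```python
import numpy as np

def log_normalize(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0   # empty cells stay all-zero
    return np.log1p(counts / totals * target_sum)

def top_variable_genes(lognorm, n_top=2000):
    """Column indices of the n_top highest-variance genes.

    A simple stand-in for highly variable gene selection (assumption).
    """
    variances = lognorm.var(axis=0)
    n_top = min(n_top, lognorm.shape[1])
    return np.sort(np.argsort(variances)[::-1][:n_top])

def scale(x):
    """Center each gene to mean 0 and scale to unit variance."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0         # constant genes are centered only
    return (x - mean) / std

# Toy data: 3 cells x 3 genes; the middle gene is never detected.
counts = np.array([[1, 0, 3],
                   [2, 0, 2],
                   [0, 0, 4]])
lognorm = log_normalize(counts)
hvg = top_variable_genes(lognorm, n_top=2)
scaled = scale(lognorm[:, hvg])
```

In the toy matrix the undetected gene has zero variance, so the two remaining genes are selected.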

3. Clustering

Builds a neighbor graph, runs Leiden clustering at a configurable resolution, and computes UMAP coordinates for interactive exploration.

4. Differential expression

Two modes:

Mode	When to use
Cluster markers	Find genes enriched in each cluster vs all others
Contrast	Compare groups defined on the Data page (for example treated vs control)

Uses Wilcoxon rank-sum tests by default. Results include log fold change, adjusted p-values, and percent expressing.
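The result columns can be illustrated with a small sketch built on SciPy's `mannwhitneyu`, which implements the Wilcoxon rank-sum test. Several details here are assumptions rather than documented pipeline behavior: the Benjamini-Hochberg adjustment, the log2 fold change computed on back-transformed means, and the helper names `bh_adjust` and `rank_sum_de`.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def rank_sum_de(expr, in_group, eps=1e-9):
    """Compare cells where in_group is True against all other cells.

    expr: (n_cells, n_genes) array of log1p-normalized expression.
    """
    expr = np.asarray(expr, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    a, b = expr[in_group], expr[~in_group]

    pvals = np.array([
        mannwhitneyu(a[:, j], b[:, j], alternative="two-sided").pvalue
        for j in range(expr.shape[1])
    ])
    # Fold change on back-transformed means; one common convention (assumption).
    lfc = np.log2((np.expm1(a).mean(axis=0) + eps)
                  / (np.expm1(b).mean(axis=0) + eps))
    return {
        "log2_fold_change": lfc,
        "pval_adj": bh_adjust(pvals),
        "pct_in_group": 100.0 * (a > 0).mean(axis=0),
        "pct_rest": 100.0 * (b > 0).mean(axis=0),
    }

# Toy values standing in for log-normalized expression: gene 0 is shifted
# upward in the first 30 cells, genes 1 and 2 are pure noise.
rng = np.random.default_rng(0)
expr = rng.poisson(1.0, size=(60, 3)).astype(float)
in_group = np.arange(60) < 30
expr[in_group, 0] += 5
result = rank_sum_de(expr, in_group)
```

For cluster markers, `in_group` would mark the cells of one cluster; for a contrast, the two groups would be the user-defined groups rather than "one vs all others".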

5. Pathway enrichment

Over-representation analysis (ORA) against Gene Ontology biological process terms. Requires gene symbols compatible with the bundled reference (HGNC for demo data).
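ORA is typically computed as a one-sided hypergeometric test on the overlap between a query gene list and each term's gene set. The sketch below shows that calculation with the standard library only. The term names and gene symbols are placeholders rather than real Gene Ontology content, and multiple-testing correction is left out for brevity.

```python
from math import comb

def ora_pvalue(n_universe, n_term, n_query, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw of n_query genes
    from a universe of n_universe genes containing n_term term members."""
    upper = min(n_term, n_query)
    total = comb(n_universe, n_query)
    tail = sum(comb(n_term, k) * comb(n_universe - n_term, n_query - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

def enrich(query_genes, term_to_genes, universe):
    """Return (term, overlap size, p-value) tuples sorted by p-value."""
    universe = set(universe)
    # Symbols missing from the reference universe are silently dropped,
    # which is why compatible gene symbols matter.
    query = set(query_genes) & universe
    results = []
    for term, members in term_to_genes.items():
        members = set(members) & universe
        overlap = query & members
        p = ora_pvalue(len(universe), len(members), len(query), len(overlap))
        results.append((term, len(overlap), p))
    return sorted(results, key=lambda r: r[2])

# Placeholder universe and terms (hypothetical, not real GO annotations).
universe = [f"g{i}" for i in range(1, 11)]
terms = {"term_A": ["g1", "g2", "g3"], "term_B": ["g8", "g9", "g10"]}
results = enrich(["g1", "g2", "g3", "unknown_symbol"], terms, universe)
```

Here `term_A` overlaps the query completely and gets p = 1/120, while `term_B` has no overlap and gets p = 1.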

Checkpoints

Intermediate .h5ad files live under data/processed/{study_id}/{dataset_id}/. Re-running an upstream step deletes downstream checkpoints and marks dependent results stale.
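The invalidation rule can be pictured with a short sketch. Only the directory layout comes from the description above; the step names, the one-file-per-step `{step}.h5ad` naming, and both helper functions are hypothetical.

```python
from pathlib import Path
import tempfile

# Hypothetical step order; each step is assumed to write one checkpoint.
STEPS = ["qc", "normalize", "cluster", "de", "enrichment"]

def checkpoint_path(root, study_id, dataset_id, step):
    """Build root/{study_id}/{dataset_id}/{step}.h5ad (file name assumed)."""
    return Path(root) / study_id / dataset_id / f"{step}.h5ad"

def invalidate_downstream(root, study_id, dataset_id, rerun_step):
    """Delete checkpoints of every step after rerun_step.

    Returns the names of the steps whose checkpoints were removed.
    """
    idx = STEPS.index(rerun_step)
    removed = []
    for step in STEPS[idx + 1:]:
        path = checkpoint_path(root, study_id, dataset_id, step)
        if path.exists():
            path.unlink()
            removed.append(step)
    return removed

# Demonstration in a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data" / "processed"
    for step in ["qc", "normalize", "cluster"]:
        p = checkpoint_path(root, "study1", "ds1", step)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
    removed = invalidate_downstream(root, "study1", "ds1", "qc")
    remaining = sorted(p.name for p in (root / "study1" / "ds1").iterdir())
```

Re-running "qc" in this toy setup removes the "normalize" and "cluster" checkpoints and leaves only `qc.h5ad` on disk.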

Metadata merge

Before each step, saved metadata columns are merged into obs. Make sure the join keys match the cell barcodes or sample IDs exactly: a mismatched key fails silently instead of raising an error.
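One way to catch a silent mismatch before it propagates is to count how many cells actually found a metadata row. The sketch below assumes the merge behaves like a pandas left join, which is an assumption about the pipeline; the helper `merge_metadata` and the column names are hypothetical.

```python
import pandas as pd

def merge_metadata(obs, meta, key):
    """Left-join meta onto obs by key and count the cells that matched.

    validate="many_to_one" raises if meta contains duplicate keys,
    so the number of cells can never change.
    """
    merged = obs.merge(meta, on=key, how="left",
                       validate="many_to_one", indicator=True)
    matched = int((merged["_merge"] == "both").sum())
    merged = merged.drop(columns="_merge")
    merged.index = obs.index   # merge resets the index; restore barcodes
    return merged, matched

# Toy obs table indexed by barcode, plus sample-level metadata in which
# "s2" differs from "S2" only by case.
obs = pd.DataFrame({"sample_id": ["S1", "S1", "S2"]},
                   index=["AAAC-1", "AAAG-1", "AACT-1"])
meta = pd.DataFrame({"sample_id": ["S1", "s2"],
                     "condition": ["treated", "control"]})

merged, matched = merge_metadata(obs, meta, "sample_id")
n_missing = int(merged["condition"].isna().sum())
```

The join raises no error, but the cell from sample "S2" ends up with a missing `condition` value. Comparing `matched` with the number of cells makes the problem visible.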

Not yet available

Batch integration (Harmony, scVI, BBKNN)
Trajectory / pseudotime
Pseudobulk DE with biological replicates
Cell-cycle scoring and regression
Reference-based annotation (SingleR) in the UI

See Quick start for a hands-on walkthrough.
