Single-cell
The single-cell pipeline follows a standard Scanpy-style workflow that mirrors Seurat-class methods. Every step operates on an AnnData object, and steps hand data to one another through on-disk checkpoints.
Pipeline steps
1. Quality control
Filters cells and optionally genes based on:
- Minimum and maximum genes per cell
- Minimum counts per cell
- Maximum mitochondrial gene percentage
Outputs QC metrics, histograms, and per-sample retention charts visible in Explore → Sample QC.
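The filters above can be sketched for a single cell in plain Python. This is a minimal illustration, not the pipeline's implementation: the `QCThresholds` class, its default values, and the `MT-` prefix used to recognize mitochondrial genes (the human gene-symbol convention) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class QCThresholds:
    # Hypothetical defaults for illustration only; the pipeline's real defaults may differ.
    min_genes: int = 200
    max_genes: int = 6000
    min_counts: int = 500
    max_pct_mito: float = 20.0


def cell_passes_qc(counts: dict[str, int], t: QCThresholds, mito_prefix: str = "MT-") -> bool:
    """Return True if one cell's gene -> count mapping passes every threshold."""
    total = sum(counts.values())
    # A gene counts as detected when it has at least one read/UMI.
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    pct_mito = 100.0 * mito / total if total else 0.0
    return (
        t.min_genes <= n_genes <= t.max_genes
        and total >= t.min_counts
        and pct_mito <= t.max_pct_mito
    )
```

A cell is kept only if it satisfies all four conditions at once, so tightening any one threshold can only reduce retention.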
2. Normalization
Log-normalizes counts, identifies highly variable genes, and optionally scales the expression values. Writes a checkpoint that the clustering step reads.
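As a rough sketch of what log-normalization typically means in Scanpy-style workflows: each cell's counts are rescaled to a common total and then passed through log(1 + x). The `target_sum` of 10,000 is a common convention and an assumption here, not a documented setting of this pipeline.

```python
import math


def log_normalize(cell_counts: list[float], target_sum: float = 1e4) -> list[float]:
    """Scale one cell's counts so they sum to target_sum, then apply log(1 + x)."""
    total = sum(cell_counts)
    if total == 0:
        # An empty cell stays all-zero instead of dividing by zero.
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * target_sum) for c in cell_counts]
```

Rescaling first removes differences in sequencing depth between cells; the log transform then compresses the dynamic range of highly expressed genes.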
3. Clustering
Builds a neighbor graph, runs Leiden clustering at a configurable resolution, and computes UMAP coordinates for interactive exploration.
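In Scanpy terms, these three operations correspond roughly to the calls below. This is a sketch under assumptions: the function name `cluster_and_embed` and the default values are illustrative, and the pipeline may wrap or configure these calls differently. It expects an already normalized AnnData object and requires Scanpy plus a Leiden backend to be installed.

```python
def cluster_and_embed(adata, resolution: float = 1.0, n_neighbors: int = 15):
    """Neighbor graph -> Leiden clusters -> UMAP, using Scanpy's public API."""
    # Imported lazily so the module loads even where Scanpy is not installed.
    import scanpy as sc

    # Build the k-nearest-neighbor graph (stored in adata.obsp).
    sc.pp.neighbors(adata, n_neighbors=n_neighbors)
    # Higher resolution yields more, smaller clusters; labels go to adata.obs["leiden"].
    sc.tl.leiden(adata, resolution=resolution, key_added="leiden")
    # Compute 2-D coordinates, stored in adata.obsm["X_umap"].
    sc.tl.umap(adata)
    return adata
```

Both Leiden and UMAP reuse the same neighbor graph, so changing `n_neighbors` affects the clusters and the embedding together.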
4. Differential expression
Two modes:
| Mode | When to use |
|---|---|
| Cluster markers | Find genes enriched in each cluster vs all others |
| Contrast | Compare groups defined on the Data page (for example treated vs control) |
Uses the Wilcoxon rank-sum test by default. Results include the log fold change, the adjusted p-value, and the percentage of cells expressing each gene.
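The text does not say which multiple-testing correction produces the adjusted p-values. Benjamini-Hochberg is a common choice in Scanpy-style tools, so the sketch below assumes it purely for illustration.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps the result monotone).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Because every gene is tested, raw p-values overstate significance; the adjustment controls the expected fraction of false positives among the genes reported.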
5. Pathway enrichment
Runs over-representation analysis (ORA) against Gene Ontology biological process terms. Gene symbols must be compatible with the bundled reference (HGNC for the demo data).
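ORA is usually framed as a hypergeometric (one-sided Fisher) test: given a query gene list, how surprising is its overlap with a term's gene set? The sketch below shows that calculation for one term. The function and parameter names are illustrative, and the pipeline's exact test and background definition are not stated in this document.

```python
from math import comb


def ora_pvalue(n_background: int, n_term: int, n_query: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(overlap >= n_overlap).

    n_background: genes in the background universe
    n_term:       background genes annotated to the term
    n_query:      genes in the query list (e.g. significant DE genes)
    n_overlap:    query genes that are annotated to the term
    """
    upper = min(n_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Symbols that do not match the reference drop out of the overlap, which shrinks `n_overlap` and weakens enrichment; this is why symbol compatibility matters.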
Checkpoints
Intermediate .h5ad files live under data/processed/{study_id}/{dataset_id}/. Re-running an upstream step deletes downstream checkpoints and marks dependent results stale.
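The invalidation rule can be sketched as follows. The directory layout comes from the text above, but the step identifiers and the one-file-per-step `<step>.h5ad` naming are hypothetical, chosen only to make the example concrete.

```python
from pathlib import Path

# Step order follows the pipeline description; these identifiers are hypothetical.
STEPS = ["qc", "normalization", "clustering", "differential_expression", "pathway_enrichment"]


def checkpoint_path(root: Path, study_id: str, dataset_id: str, step: str) -> Path:
    """Build data/processed/{study_id}/{dataset_id}/<step>.h5ad (file name is assumed)."""
    return root / "data" / "processed" / study_id / dataset_id / f"{step}.h5ad"


def invalidate_downstream(root: Path, study_id: str, dataset_id: str, rerun_step: str) -> list[str]:
    """Delete the checkpoints of every step after rerun_step and return the stale steps."""
    stale = STEPS[STEPS.index(rerun_step) + 1:]
    for step in stale:
        # missing_ok avoids an error when a downstream step was never run.
        checkpoint_path(root, study_id, dataset_id, step).unlink(missing_ok=True)
    return stale
```

For example, re-running normalization would leave the QC checkpoint in place and clear everything from clustering onward.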
Metadata merge
Before each step, saved metadata columns are merged into obs. Make sure the join keys match the cell barcodes or sample IDs exactly, because a mismatched key can fail silently instead of raising an error.
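One way to catch mismatched keys before they cause trouble is to compare the two key sets up front. The helper below is a hypothetical pre-flight check, not part of the pipeline; it takes the cell identifiers from obs and the keys from the metadata table as plain lists.

```python
def check_join_keys(obs_names: list[str], metadata_keys: list[str]) -> dict[str, list[str]]:
    """Report keys that would not line up when metadata is merged into obs."""
    obs_set, meta_set = set(obs_names), set(metadata_keys)
    return {
        # Cells that would receive no metadata values after the merge.
        "cells_without_metadata": sorted(obs_set - meta_set),
        # Metadata rows that match no cell and would be ignored.
        "metadata_without_cells": sorted(meta_set - obs_set),
    }
```

A frequent cause of mismatches is a barcode suffix such as `-1` that is present on one side of the join and missing on the other.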
Not yet available
- Batch integration (Harmony, scVI, BBKNN)
- Trajectory / pseudotime
- Pseudobulk DE with biological replicates
- Cell-cycle scoring and regression
- Reference-based annotation (SingleR) in the UI
See Quick start for a hands-on walkthrough.