Single-cell atlas
Build a single-cell RNA-seq atlas from raw or processed data — QC, clustering, marker identification, differential expression, and pathway enrichment with publication-ready figures.
Research question
What cell populations exist in this dataset, what genes define each cluster, and how does expression differ between experimental conditions?
Who this is for
- Wet-lab biologists running their first single-cell analysis without a local Seurat or Scanpy installation
- Computational biologists who want transparent parameters and checkpoint provenance
- Core facilities delivering standard QC → clustering → marker → enrichment packages to client labs
Data requirements
| Data | Required | Purpose |
|---|---|---|
| .h5ad or convertible 10x/CSV | Yes | AnnData input for all pipeline steps |
| Gene symbols in var | Recommended | GO enrichment compatibility |
| Metadata with condition labels | No (required for contrast DE) | Group comparison on Data page |
| Saved contrasts | No (required for contrast DE) | Treated vs. control comparisons |
Workflow
Data → QC → Normalization → Clustering → Explore UMAP → DE → Enrichment → Figures → Snapshot
Step 1 — Upload and prepare
Create a single-cell study and upload an .h5ad file (or convert 10x/CSV input with Convert on the Data page). Save metadata on Data → Metadata so that its columns are merged into the AnnData obs table before any analysis runs.
Define contrasts on the Data page when condition-level DE is needed.
Step 2 — Find structure
Open Analyze → Find Structure and run in order:
- QC — filter by genes/cells detected, counts, and mitochondrial fraction
- Normalization — log-normalize and select highly variable genes
- Clustering — Leiden clustering and UMAP embedding
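The QC step above filters on per-cell metrics. A minimal sketch of that kind of filter, written in plain Python so the logic is visible: the `CellQC` fields, the `passes_qc` helper, and every threshold value are illustrative assumptions, not the pipeline's actual defaults or API.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical field names)."""
    barcode: str
    n_genes: int       # number of genes detected in the cell
    total_counts: int  # total counts for the cell
    mito_counts: int   # counts assigned to mitochondrial genes


def passes_qc(cell, min_genes=200, min_counts=500, max_mito_frac=0.2):
    """Return True if the cell clears all three illustrative thresholds."""
    if cell.total_counts == 0:
        return False  # empty barcode; mitochondrial fraction is undefined
    mito_frac = cell.mito_counts / cell.total_counts
    return (
        cell.n_genes >= min_genes
        and cell.total_counts >= min_counts
        and mito_frac <= max_mito_frac
    )


cells = [
    CellQC("AAAC", n_genes=1500, total_counts=4000, mito_counts=200),  # 5% mito
    CellQC("AAAG", n_genes=120, total_counts=300, mito_counts=10),     # too few genes
    CellQC("AACT", n_genes=900, total_counts=2000, mito_counts=700),   # 35% mito
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

Appropriate thresholds depend on tissue and chemistry, so the retention histograms under Explore → Sample QC are the place to check them.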
Re-running an upstream step invalidates downstream checkpoints and marks dependent results stale.
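One way to picture this invalidation rule is a linear chain of steps in which re-running one step flags everything after it. The sketch below is an assumption about the behaviour, not the application's implementation; the step names, status strings, and the `rerun` helper are hypothetical.

```python
# Assumed linear dependency order, taken from the workflow line above.
STEPS = ["qc", "normalization", "clustering", "de", "enrichment"]


def rerun(status, step):
    """Return a new status mapping after re-running `step`.

    The re-run step becomes "fresh"; every later step that already had a
    fresh result is marked "stale". Steps that were never run are untouched.
    """
    idx = STEPS.index(step)
    updated = dict(status)
    updated[step] = "fresh"
    for downstream in STEPS[idx + 1:]:
        if updated.get(downstream) == "fresh":
            updated[downstream] = "stale"
    return updated


status = {"qc": "fresh", "normalization": "fresh", "clustering": "fresh", "de": "fresh"}
after = rerun(status, "normalization")
```

In this sketch, `after` keeps QC fresh, while clustering and DE are flagged stale until they are re-run.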
Step 3 — Explore
Review outputs before formal comparison:
- Explore → Sample QC — retention histograms and per-sample metrics
- Explore → Embedding — interactive UMAP colored by cluster or metadata column
Step 4 — Compare groups
Under Analyze → Compare Groups:
| Mode | When to use |
|---|---|
| Cluster markers | Genes enriched in each cluster vs. all others |
| Contrast | Groups defined on the Data page (e.g. treated vs. control) |
Run Pathway enrichment (over-representation analysis, ORA, against GO biological process terms) on DE results.
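ORA is commonly scored with a one-sided hypergeometric test: given a background of N genes, a GO term containing K of them, and a DE list of n genes, it asks how likely an overlap of at least k genes is by chance. The sketch below implements that textbook formula with the standard library. Whether this pipeline uses exactly this test, and which background it uses, is an assumption; the function name and the example numbers are illustrative.

```python
from math import comb


def ora_pvalue(n_background, n_term, n_hits, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background universe (N)
    n_term:       background genes annotated to the term (K)
    n_hits:       genes in the DE list (n)
    n_overlap:    DE genes annotated to the term (k)
    """
    upper = min(n_term, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_hits - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 12 of 300 DE genes fall in a 150-gene term.
p = ora_pvalue(n_background=20000, n_term=150, n_hits=300, n_overlap=12)
```

Because many GO terms are tested at once, the raw p-values from a test like this are normally corrected for multiple testing before interpretation.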
Step 5 — Interpret and publish
- Interpret → Enrichment — GO term tree and pathway context
- Figures — drag UMAP, volcano, and heatmap panels onto the canvas; export PDF
- Runs → Snapshots — freeze parameter set and run IDs for reproducibility
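A snapshot of this kind can be thought of as a canonical record of parameters and run IDs plus a fingerprint that changes whenever either changes. The following is a minimal sketch of that idea under stated assumptions: the `make_snapshot` helper, the parameter names, and the run ID format are hypothetical and do not describe the application's stored format.

```python
import hashlib
import json


def make_snapshot(params, run_ids):
    """Bundle parameters and run IDs with a SHA-256 fingerprint.

    Keys and run IDs are sorted before hashing so the fingerprint does not
    depend on insertion order.
    """
    payload = {"params": params, "run_ids": sorted(run_ids)}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


# Hypothetical parameter names and run IDs, for illustration only.
snap_a = make_snapshot({"leiden_resolution": 1.0, "n_hvg": 2000}, ["run-2", "run-1"])
snap_b = make_snapshot({"n_hvg": 2000, "leiden_resolution": 1.0}, ["run-1", "run-2"])
snap_c = make_snapshot({"n_hvg": 2000, "leiden_resolution": 0.5}, ["run-1", "run-2"])
```

Here `snap_a` and `snap_b` share a fingerprint because they describe the same analysis, while `snap_c` differs because one parameter changed.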
Step 6 — AI interpretation (optional)
Use analytical interpretation on completed DE or enrichment runs to draft cluster annotations and pathway narratives. This step requires Ollama to be configured; see ai.md in the repo root.
Expected outputs
- Filtered AnnData checkpoints under data/processed/{study_id}/{dataset_id}/
- UMAP embedding with Leiden cluster labels
- Ranked DE table with log fold change and adjusted p-values
- GO enrichment results with term hierarchy
- Multi-panel figure PDF and analysis snapshot
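The adjusted p-values in the DE table come from a multiple-testing correction. The source does not say which method is used; the sketch below shows the Benjamini-Hochberg procedure, a common choice for DE tables, purely as an illustration. The function name and input values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
```

Each adjusted value is at least as large as its raw p-value, and genes keep their relative order of significance.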
Typical analyses
| Analysis | Comparison | Question |
|---|---|---|
| Cell type discovery | Cluster markers | What genes define each population? |
| Treatment response | Contrast DE | Which genes change after perturbation? |
| Immune infiltration | Condition on metadata | Are T cell clusters expanded in responders? |
| Core client delivery | Full pipeline + snapshot | Reproducible deliverable for requesting lab |
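For the immune infiltration row, the question of whether a cluster is expanded starts with per-condition cluster fractions. A minimal sketch, assuming each cell carries a condition label from metadata and a cluster label from the clustering step; the `cluster_fractions` helper and the label values are hypothetical, and a real comparison would also need replicate-level statistics.

```python
from collections import Counter, defaultdict


def cluster_fractions(cells):
    """Fraction of cells per cluster within each condition.

    cells: iterable of (condition, cluster) pairs, one pair per cell.
    """
    counts = defaultdict(Counter)
    for condition, cluster in cells:
        counts[condition][cluster] += 1
    return {
        condition: {
            cluster: n / sum(per_cluster.values())
            for cluster, n in per_cluster.items()
        }
        for condition, per_cluster in counts.items()
    }


# Toy labels for illustration only.
cells = (
    [("responder", "T cell")] * 3
    + [("responder", "B cell")]
    + [("non-responder", "T cell")]
    + [("non-responder", "B cell")] * 3
)
fractions = cluster_fractions(cells)
```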