File formats
Primary format: AnnData (.h5ad)
All analysis runs use AnnData as the internal data model. Users can upload .h5ad files directly or convert other formats on the Data page.
Expected structure:
| Component | Purpose |
|---|---|
| X or layers['counts'] | Expression matrix |
| obs | Cell or sample metadata (barcodes as index) |
| var | Gene metadata (gene symbols in index or gene_symbols column) |
| obsm['spatial'] | Required for spatial workflows (x/y coordinates) |
| obsm['X_umap'] | Written by clustering; used by UMAP explorer |
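The layout in the table can be checked before upload. The following is a minimal sketch, not the app's own validator: `check_structure` is a hypothetical helper, and it relies only on the attributes named in the table (`X`, `layers`, `obsm`), so it works on an AnnData object or, as in the demo, on a stand-in.

```python
from types import SimpleNamespace


def check_structure(adata, spatial=False):
    """Return a list of problems; an empty list means the layout looks usable.

    Hypothetical helper for illustration, not part of the app.
    """
    problems = []

    # The expression matrix must be in X or in layers['counts'].
    if adata.X is None and "counts" not in adata.layers:
        problems.append("no expression matrix in X or layers['counts']")

    # Spatial workflows need x/y coordinates in obsm['spatial'].
    if spatial:
        if "spatial" not in adata.obsm:
            problems.append("obsm['spatial'] is missing")
        elif adata.obsm["spatial"].shape[1] < 2:
            problems.append("obsm['spatial'] needs at least two columns (x, y)")

    return problems


# Stand-in object with the same attribute names as AnnData, used for the demo only.
demo = SimpleNamespace(X=None, layers={"counts": [[1, 0], [0, 3]]}, obsm={})
print(check_structure(demo))
print(check_structure(demo, spatial=True))
```

With a real file, the same function can be called on the object returned by `anndata.read_h5ad("sample.h5ad")`, where the file name is a placeholder.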
Upload formats
| Format | Extension | Conversion |
|---|---|---|
| AnnData | .h5ad | Direct ingest |
| Count matrix | .csv, .tsv | Convert → processed .h5ad |
| 10x Genomics HDF5 | .h5 | Convert |
| 10x MTX bundle | .zip | Convert (matrix + barcodes + genes) |
CSV/10x conversion stores raw counts in layers['counts'] and attempts to restore sample / condition from sidecar metadata when present.
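Before uploading a 10x MTX bundle, it can help to confirm that the .zip contains all three parts. The sketch below assumes the conventional 10x file names (`matrix.mtx`, `barcodes.tsv`, and `genes.tsv` or `features.tsv`, optionally gzipped); whether the converter accepts exactly these names is an assumption, and `missing_parts` is a hypothetical helper.

```python
import io
import zipfile

# Conventional 10x file names. That the converter expects these is an assumption.
REQUIRED = {
    "matrix": ("matrix.mtx", "matrix.mtx.gz"),
    "barcodes": ("barcodes.tsv", "barcodes.tsv.gz"),
    "genes": ("genes.tsv", "genes.tsv.gz", "features.tsv", "features.tsv.gz"),
}


def missing_parts(zip_source):
    """Return the bundle parts (matrix, barcodes, genes) not found in the archive."""
    with zipfile.ZipFile(zip_source) as zf:
        # Compare base names so files nested in a subfolder still count.
        names = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return [part for part, options in REQUIRED.items() if not names.intersection(options)]


# Demo: an in-memory bundle that lacks the genes/features file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sample/matrix.mtx", "")
    zf.writestr("sample/barcodes.tsv", "")
buf.seek(0)
print(missing_parts(buf))
```

`missing_parts` also accepts a file path, since `zipfile.ZipFile` takes either a path or a file-like object.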
Gene identifiers
Enrichment and pathway tools expect gene symbols (for example HGNC for human). Ensembl IDs or custom prefixes may yield empty enrichment results. Demo fixtures use real HGNC symbols from bundled GO reference sets.
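A quick pre-flight check can flag datasets whose gene index holds Ensembl IDs rather than symbols. This is a minimal sketch: `ensembl_fraction` is a hypothetical helper, and the pattern covers only the standard Ensembl gene ID shape (`ENS`, an optional species code, `G`, 11 digits, and an optional version suffix), not custom prefixes.

```python
import re

# Matches IDs such as ENSG00000141510 or ENSMUSG00000059552,
# optionally followed by a version suffix such as ".12".
ENSEMBL_GENE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def ensembl_fraction(gene_ids):
    """Return the share of identifiers that look like Ensembl gene IDs."""
    ids = list(gene_ids)
    if not ids:
        return 0.0
    hits = sum(1 for g in ids if ENSEMBL_GENE.match(g))
    return hits / len(ids)


# Demo: two symbols and two Ensembl-style IDs.
print(ensembl_fraction(["TP53", "ENSG00000141510", "BRCA1", "ENSG00000012048.23"]))
```

On an AnnData object, `ensembl_fraction(adata.var_names)` gives the share for the gene index; a value near 1.0 suggests mapping IDs to symbols before running enrichment.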
Spatial bundles
Full Visium output folders (H&E images, scale factors, JSON) are not yet ingested natively. Prepare a spatial .h5ad with coordinates in obsm['spatial'], or use the included Visium test fixture.
Export formats
| Output | Format |
|---|---|
| Figure canvas | PDF (client-side export) |
| DE / enrichment tables | In-app tables; CSV export planned |
| Processed checkpoints | .h5ad on server disk (not user-downloadable in prototype) |
Size guidance
The prototype runs locally and comfortably handles datasets of up to a few thousand cells. In production, large single-cell and spatial datasets will require cloud storage, lazy loading, and tiled imagery; see the internal architecture docs.