CellFlow trained on Norman 2019

Produced as part of the sc-interp single-cell model comparison repo.

Provenance

Source code commit: fdc2ae0
Runner: scripts/run_cellflow.py
Dataset manifest: data/norman/manifest.yaml

Base model

Trained from scratch. CellFlow is a flow-matching-based perturbation prediction framework and does not ship a foundation checkpoint. Perturbation conditions are encoded as ESM2 embeddings of the perturbed gene(s), computed with facebook/esm2_t6_8M_UR50D.

Training

Architecture and training hyperparameters match the cellflow_reproducibility repo's suppl_fig/norman/downstream_analysis/cellflow/ configs verbatim:
- condition_embedding_dim=1024, hidden_dims=(4096,4096,4096), decoder_dims=(4096,4096,4096), decoder_dropout=0.2
- time_encoder_dims=(2048,2048,2048), time_freqs=1024, cond_output_dropout=0.9
- layers_before_pool.target_gene = mlp[1024,1024] dropout 0.5, layers_after_pool = mlp[1024,1024] dropout 0.2
- match_fn = match_linear(epsilon=0.1, scale_cost='mean', tau_a=1.0, tau_b=1.0)
- optimizer = optax.MultiSteps(optax.adam(5e-5), 20)
- probability_path = {'constant_noise': 1.0}
- pooling = 'attention_token'
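The optimizer entry above wraps Adam in optax.MultiSteps with a step count of 20, which accumulates gradients over 20 mini-batches and applies the inner optimizer once per 20 mini-steps (by default Optax averages the accumulated gradients). The sketch below illustrates only that accumulate-then-update pattern in plain Python. It is not the CellFlow training loop: the function names are hypothetical, and plain SGD stands in for Adam to keep the example short.

```python
from typing import Callable, List, Tuple

Update = Callable[[List[float], List[float]], List[float]]


def sgd(lr: float) -> Update:
    """Plain SGD update, used here as a simple stand-in for Adam (assumption)."""
    def update(params: List[float], grad: List[float]) -> List[float]:
        return [p - lr * g for p, g in zip(params, grad)]
    return update


def accumulate_and_step(
    params: List[float],
    minibatch_grads: List[List[float]],
    every_k: int,
    apply_update: Update,
) -> Tuple[List[float], int]:
    """Average gradients over every_k mini-steps, then apply one update."""
    acc = [0.0] * len(params)
    n_updates = 0
    for step, grad in enumerate(minibatch_grads, start=1):
        # Running mean of the gradients seen since the last update.
        acc = [a + g / every_k for a, g in zip(acc, grad)]
        if step % every_k == 0:
            params = apply_update(params, acc)
            acc = [0.0] * len(params)
            n_updates += 1
    return params, n_updates


# 40 mini-steps with every_k=20 yield exactly 2 parameter updates.
grads = [[1.0]] * 40
final_params, n_updates = accumulate_and_step([0.0], grads, 20, sgd(5e-5))
```

With a per-step batch size of 1024, each parameter update in this scheme would see gradients averaged over 20 x 1024 cells, assuming the reported batch size is per mini-step.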
Sample representation: 50-dim PCA (sample_rep='X_pca'), fit on the train split cells and projected onto val and test.
Perturbation encoding: ESM2 embeddings per gene symbol, stored in adata.uns['esm2'] and referenced via perturbation_covariate_reps={'target_gene': 'esm2'}.
Split: GEARS simulation split with seed 42, rather than the biolord splits used in the CellFlow paper. This is a deliberate divergence so that our three-way comparison with scGPT and scLDM uses a single split definition.
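The sample representation described above is fit on train-split cells only and then applied to val and test, which keeps held-out cells from influencing the PCA axes. A minimal NumPy sketch of that fit-then-project pattern follows. It is an illustration under assumptions, not the code in scripts/run_cellflow.py: the function names and the random stand-in matrices are hypothetical.

```python
import numpy as np


def fit_pca(train: np.ndarray, n_components: int):
    """Fit PCA on training cells only; return the training mean and principal axes."""
    mean = train.mean(axis=0)
    # SVD of the centered training matrix; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project cells with the training mean and axes (no refitting on held-out data)."""
    return (x - mean) @ components.T


# Random stand-in expression matrices (cells x genes); real data would come from the AnnData object.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 100))
test = rng.normal(size=(30, 100))

mean, components = fit_pca(train, n_components=50)
train_pca = project(train, mean, components)  # would be stored as X_pca for train cells
test_pca = project(test, mean, components)    # same axes, applied to held-out cells
```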

Budget and stopping


setting	value
iterations	200,000
batch size	1024
valid_freq	400,000 (exceeds the iteration budget, so no mid-training evaluation ran)
wall clock	0.7 hours (H100 PCIe)
sample_rep	X_pca (50 dims)
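The budget figures above can be sanity-checked with a few lines of arithmetic. The sketch assumes that validation runs once every valid_freq iterations and that each iteration draws one batch of 1024 cells. The alternative valid_freq of 10,000 is a hypothetical value chosen for illustration, not a setting used in this run.

```python
iterations = 200_000
batch_size = 1024
valid_freq = 400_000
wall_clock_hours = 0.7

# Validation passes that fit inside the budget (assumes one pass every valid_freq iterations).
n_validations = iterations // valid_freq

# Average training throughput over the whole run.
iters_per_second = iterations / (wall_clock_hours * 3600)

# Total cells drawn, assuming one batch per iteration.
cells_sampled = iterations * batch_size

# Hypothetical alternative: a smaller valid_freq that would produce a learning curve.
hypothetical_valid_freq = 10_000
hypothetical_validations = iterations // hypothetical_valid_freq

print(f"validation passes in this run: {n_validations}")
print(f"throughput: {iters_per_second:.1f} iterations/s")
print(f"cells sampled: {cells_sampled:,}")
print(f"validation passes at valid_freq={hypothetical_valid_freq:,}: {hypothetical_validations}")
```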

Test set metrics (cell-eval)

metric	mean	median	max
pearson_delta	0.5630	0.6814	0.9651
discrimination_score_l1	0.7270	0.8182	1.0000
discrimination_score_l2	0.7452	0.8586	1.0000
discrimination_score_cosine	0.7413	0.8788	1.0000
pearson_edistance	0.6707	0.6707	0.6707
clustering_agreement	0.3252	0.3252	0.3252
overlap_at_N	0.0264	0.0242	0.1008
precision_at_N	0.0936	0.0977	0.2267
mse	0.0032	0.0022	0.0132
mae	0.0156	0.0142	0.0350

The CellFlow paper reports Norman results as R² in gene space and energy distance in 10-dim PCA space (Figure 4N, Methods section 3.5). Our numbers use cell-eval's standard metric set on the GEARS simulation split, so they are not directly comparable to Figure 4N. They do agree with the paper's headline claim (CellFlow > scGPT on Norman): on our matched evaluation, CellFlow outperforms scGPT on pearson_delta, all discrimination_score variants, pearson_edistance, clustering_agreement, mse, and mae. The two models are tied on DE gene overlap and precision (overlap_at_N, precision_at_N), consistent with the broader observation that current perturbation models capture broad transcriptional programs better than specific regulatory effects.
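For readers unfamiliar with pearson_delta in the table above: it is commonly computed as the Pearson correlation between the predicted and the observed change in mean expression relative to control, one score per perturbation. The sketch below shows that idea in plain Python. It is an assumption-laden illustration, not cell-eval's implementation, which may differ in gene selection and aggregation; the function names and toy vectors are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pearson_delta(
    pred_mean: Sequence[float],
    true_mean: Sequence[float],
    control_mean: Sequence[float],
) -> float:
    """Correlate predicted vs. observed expression change relative to control."""
    pred_delta = [p - c for p, c in zip(pred_mean, control_mean)]
    true_delta = [t - c for t, c in zip(true_mean, control_mean)]
    return pearson(pred_delta, true_delta)


# Toy per-gene mean expression for one perturbation (made-up numbers).
control = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, 2.0, 2.0, 5.0]
predicted = [1.4, 2.1, 2.5, 4.6]

score = pearson_delta(predicted, observed, control)
```

Subtracting the control mean first matters: raw expression profiles correlate highly across almost any pair of conditions, so correlating deltas is a stricter test of whether the perturbation effect itself was captured.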

Known limitations

Uses ESM2 esm2_t6_8M_UR50D (8M parameters) instead of the paper's esm2_t36_3B_UR50D (3B parameters). The smaller model speeds up research iteration, but gene embedding quality may be somewhat lower than in the paper.
Uses the GEARS simulation split instead of biolord's 5 random splits, so our test perturbations are a different subset of Norman than the paper's.
Training uses valid_freq > num_iterations, so there is no mid-training validation. Convergence was not verified with a validation curve; future runs should use a smaller valid_freq so the learning curve can be plotted.

Files

CellFlow.pkl: trained CellFlow model, pickled via cf.save(). Load with cellflow.model.CellFlow.load(path).
training_stats.json: iterations, wall clock, wandb run URL.

Usage

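from huggingface_hub import hf_hub_download
from cellflow.model import CellFlow

# Download the pickled model from the Hugging Face Hub (cached locally after the first call).
path = hf_hub_download(
    repo_id="matthewshu/cellflow-norman",
    filename="CellFlow.pkl",
)
# Restore the trained CellFlow object that was saved with cf.save().
cf = CellFlow.load(path)
# For predictions, run sc-interp's scripts/run_cellflow.py with --hf-repo matthewshu/cellflow-norman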

Citation

Dataset: Norman et al. 2019 (Science). Model: Klein, Fleck, Becker et al. 2025 bioRxiv (CellFlow). See the CellFlow repo and the Norman 2019 paper for proper BibTeX entries.
