CellFlow trained on Norman 2019
Produced as part of the sc-interp single-cell model comparison repo.
Provenance
- Source code commit: `fdc2ae0`
- Runner: `scripts/run_cellflow.py`
- Dataset manifest: `data/norman/manifest.yaml`
Base model
Trained from scratch. CellFlow is a flow-matching-based perturbation prediction framework and does not ship a foundation checkpoint. Perturbation conditions are encoded via ESM2 embeddings of the perturbed gene(s) (`facebook/esm2_t6_8M_UR50D`).
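For readers new to flow matching, the training signal can be sketched as follows. This is a minimal NumPy illustration of a linear interpolation path with constant noise, not CellFlow's actual code: the function name `sample_flow_matching_pair` and the random data are hypothetical, and the real model additionally conditions the velocity network on the perturbation embedding.

```python
import numpy as np

def sample_flow_matching_pair(x0, x1, t, sigma, rng):
    """Return (x_t, u_t) for a linear path with constant noise.

    x0: source (e.g. control) samples, shape (n, d)
    x1: target (e.g. perturbed) samples, shape (n, d)
    t:  times in [0, 1], shape (n, 1)
    sigma: constant noise scale added along the path
    """
    noise = rng.standard_normal(x0.shape)
    # Point on the noisy straight line between source and target.
    x_t = (1.0 - t) * x0 + t * x1 + sigma * noise
    # Regression target for the velocity network on this path.
    u_t = x1 - x0
    return x_t, u_t

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 50))   # hypothetical control cells in 50-dim PCA space
x1 = rng.standard_normal((4, 50))   # hypothetical perturbed cells
t = rng.uniform(size=(4, 1))
x_t, u_t = sample_flow_matching_pair(x0, x1, t, sigma=1.0, rng=rng)
```

A network trained to regress `u_t` from `(x_t, t, condition)` can then be integrated from `t=0` to `t=1` to map control cells to predicted perturbed cells.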
Training
- Architecture and training hyperparameters match the `cellflow_reproducibility` repo's `suppl_fig/norman/downstream_analysis/cellflow/configs` verbatim:
  - `condition_embedding_dim=1024`, `hidden_dims=(4096,4096,4096)`, `decoder_dims=(4096,4096,4096)`, `decoder_dropout=0.2`
  - `time_encoder_dims=(2048,2048,2048)`, `time_freqs=1024`, `cond_output_dropout=0.9`
  - `layers_before_pool.target_gene = mlp[1024,1024] dropout 0.5`, `layers_after_pool = mlp[1024,1024] dropout 0.2`
  - `match_fn = match_linear(epsilon=0.1, scale_cost='mean', tau_a=1.0, tau_b=1.0)`
  - `optimizer = optax.MultiSteps(optax.adam(5e-5), 20)`
  - `probability_path = {'constant_noise': 1.0}`
  - `pooling = 'attention_token'`
- Sample representation: 50-dim PCA (`sample_rep='X_pca'`), fit on the train split cells and projected onto val and test.
- Perturbation encoding: ESM2 embeddings per gene symbol, stored in `adata.uns['esm2']` and referenced via `perturbation_covariate_reps={'target_gene': 'esm2'}`.
- Split: GEARS simulation split with seed 42, not biolord (the CellFlow paper uses biolord). This is a deliberate divergence so our three-way comparison with scGPT and scLDM uses a single split definition.
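The "fit on train, project onto val and test" step for the sample representation can be sketched as below. This is a plain NumPy illustration under stated assumptions: `fit_pca` and `project` are hypothetical helpers, the matrices are random stand-ins for expression data, and the actual runner may use a library PCA rather than a hand-rolled SVD.

```python
import numpy as np

def fit_pca(train, n_components):
    """Fit PCA on the training matrix only; return (mean, components)."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T

def project(x, mean, components):
    """Project any split using the statistics learned from the train split."""
    return (x - mean) @ components

rng = np.random.default_rng(42)
train = rng.standard_normal((200, 100))   # stand-in: 200 train cells x 100 genes
test = rng.standard_normal((30, 100))     # stand-in: 30 test cells x 100 genes

mean, comps = fit_pca(train, n_components=50)
train_pca = project(train, mean, comps)
test_pca = project(test, mean, comps)     # centred with the train mean, not its own
```

Reusing the train mean and components for val and test keeps held-out cells from influencing the representation the model is trained on.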
Budget and stopping
| setting | value |
|---|---|
| iterations | 200,000 |
| batch size | 1024 |
| valid_freq | 400,000 (exceeds the iteration budget, so no mid-training evaluation) |
| wall clock | 0.7 hours (H100 PCIe) |
| sample_rep | X_pca (50 dims) |
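The budget figures imply a few derived quantities, worked out below. The arithmetic assumes that each of the 200,000 iterations is one mini-step of `optax.MultiSteps` with 20 accumulation steps; if the training loop counts iterations differently, the update count and effective batch size change accordingly.

```python
iterations = 200_000
batch_size = 1024
accumulation_steps = 20      # second argument of optax.MultiSteps(..., 20)
wall_clock_hours = 0.7

# Total samples drawn over the run (with replacement across iterations).
samples_seen = iterations * batch_size

# Assumption: parameters are updated once every `accumulation_steps` iterations.
optimizer_updates = iterations // accumulation_steps
effective_batch = batch_size * accumulation_steps

# Average throughput over the reported wall clock.
iters_per_second = iterations / (wall_clock_hours * 3600)

print(samples_seen, optimizer_updates, effective_batch, round(iters_per_second, 1))
```

Under that assumption the run performs 10,000 parameter updates at an effective batch size of 20,480, at roughly 79 iterations per second.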
Test set metrics (cell-eval)
| metric | mean | median | max |
|---|---|---|---|
| pearson_delta | 0.5630 | 0.6814 | 0.9651 |
| discrimination_score_l1 | 0.7270 | 0.8182 | 1.0000 |
| discrimination_score_l2 | 0.7452 | 0.8586 | 1.0000 |
| discrimination_score_cosine | 0.7413 | 0.8788 | 1.0000 |
| pearson_edistance | 0.6707 | 0.6707 | 0.6707 |
| clustering_agreement | 0.3252 | 0.3252 | 0.3252 |
| overlap_at_N | 0.0264 | 0.0242 | 0.1008 |
| precision_at_N | 0.0936 | 0.0977 | 0.2267 |
| mse | 0.0032 | 0.0022 | 0.0132 |
| mae | 0.0156 | 0.0142 | 0.0350 |
The CellFlow paper reports Norman results as R² in gene space and energy distance in 10-dim PCA space (Figure 4N, Methods section 3.5). Our numbers use cell-eval's standard metric set on the GEARS simulation split, so they are not directly comparable to Figure 4N. They are nonetheless consistent with the paper's headline claim (CellFlow > scGPT on Norman): on our matched evaluation, CellFlow outperforms scGPT on `pearson_delta`, all `discrimination_score` variants, `pearson_edistance`, `clustering_agreement`, `mse`, and `mae`. The two models are tied on DE gene overlap and precision (`overlap_at_N`, `precision_at_N`), consistent with the broader observation that current perturbation models capture broad transcriptional programs better than specific regulatory effects.
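To make the `pearson_delta` row concrete, here is a simplified sketch of the idea behind it: correlate the predicted and observed mean expression shifts relative to control for one perturbation. This is an illustration only, not cell-eval's implementation; the function signature and the synthetic data are assumptions.

```python
import numpy as np

def pearson_delta(pred_pert, true_pert, control):
    """Pearson correlation between predicted and observed mean shifts from control.

    All inputs are (cells x genes) matrices for a single perturbation.
    """
    ctrl_mean = control.mean(axis=0)
    pred_delta = pred_pert.mean(axis=0) - ctrl_mean
    true_delta = true_pert.mean(axis=0) - ctrl_mean
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])

rng = np.random.default_rng(0)
control = rng.standard_normal((100, 20))   # synthetic control cells
shift = rng.standard_normal(20)            # synthetic per-gene perturbation effect
true_pert = control + shift

perfect = pearson_delta(control + shift, true_pert, control)
opposite = pearson_delta(control - shift, true_pert, control)
```

A prediction that matches the true shift scores close to 1, and one that moves every gene in the wrong direction scores close to -1. Because the metric depends only on the direction and relative size of the shift, it can be high even when `mse` is not small.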
Known limitations
- Uses ESM2 `esm2_t6_8M_UR50D` (8M parameters) instead of the paper's `esm2_t36_3B_UR50D` (3B parameters). This speeds up research iteration; gene embedding quality may be slightly lower than in the paper.
- Uses the GEARS simulation split instead of biolord's 5 random splits. Our test perturbations are a different subset of Norman than the paper's.
- Training uses `valid_freq > num_iterations`, so there is no mid-training validation evaluation. Convergence was not verified via a validation curve; future runs should use a smaller `valid_freq` to plot the learning curve.
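The effect of `valid_freq` on the learning curve can be checked with a small helper. The sketch assumes validation fires at every multiple of `valid_freq` up to and including the final iteration, which may not match the trainer's exact schedule; `validation_steps` and the value 10,000 are hypothetical.

```python
def validation_steps(num_iterations, valid_freq):
    """Iterations at which validation would run under the stated assumption."""
    return list(range(valid_freq, num_iterations + 1, valid_freq))

# Setting used for this run: valid_freq exceeds the budget, so nothing is evaluated.
current = validation_steps(200_000, 400_000)

# A hypothetical alternative that would yield a 20-point learning curve.
proposed = validation_steps(200_000, 10_000)

print(len(current), len(proposed))
```

Each extra validation pass adds wall-clock time, so the choice of `valid_freq` trades curve resolution against training speed.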
Files
- `CellFlow.pkl`: Trained CellFlow model, pickled via `cf.save()`. Load via `cellflow.model.CellFlow.load(path)`.
- `training_stats.json`: iterations, wall clock, wandb run URL.
Usage
```python
from huggingface_hub import hf_hub_download
from cellflow.model import CellFlow

# Download the pickled model from the Hugging Face Hub (cached locally).
path = hf_hub_download(
    repo_id="matthewshu/cellflow-norman",
    filename="CellFlow.pkl",
)
cf = CellFlow.load(path)
# Then use sc-interp's run_cellflow.py --hf-repo matthewshu/cellflow-norman
```
Citation
Dataset: Norman et al. 2019 (Science). Model: Klein, Fleck, Becker et al. 2025 bioRxiv (CellFlow). See the CellFlow repo and the Norman 2019 paper for proper BibTeX entries.