CellFlow trained on Norman 2019

Produced as part of the sc-interp single-cell model comparison repo.

Provenance

Base model

Trained from scratch. CellFlow is a flow-matching-based perturbation prediction framework and does not ship a foundation checkpoint. Perturbation conditions are encoded via ESM2 embeddings of the perturbed gene(s) (facebook/esm2_t6_8M_UR50D).

Training

  • Architecture and training hyperparameters match the cellflow_reproducibility repo's suppl_fig/norman/downstream_analysis/cellflow/ configs verbatim:
    • condition_embedding_dim=1024, hidden_dims=(4096,4096,4096), decoder_dims=(4096,4096,4096), decoder_dropout=0.2
    • time_encoder_dims=(2048,2048,2048), time_freqs=1024, cond_output_dropout=0.9
    • layers_before_pool.target_gene = mlp[1024,1024] dropout 0.5, layers_after_pool = mlp[1024,1024] dropout 0.2
    • match_fn = match_linear(epsilon=0.1, scale_cost='mean', tau_a=1.0, tau_b=1.0)
    • optimizer = optax.MultiSteps(optax.adam(5e-5), 20)
    • probability_path = {'constant_noise': 1.0}
    • pooling = 'attention_token'
  • Sample representation: 50-dim PCA (sample_rep='X_pca'), fit on the train split cells and projected onto val and test.
  • Perturbation encoding: ESM2 embeddings per gene symbol, stored in adata.uns['esm2'] and referenced via perturbation_covariate_reps={'target_gene': 'esm2'}.
  • Split: GEARS simulation split with seed 42, not biolord (the CellFlow paper uses biolord). This is a deliberate divergence so our three-way comparison with scGPT and scLDM uses a single split definition.
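
The "fit on train, project onto val and test" step for the sample representation can be sketched as follows. This is a minimal NumPy illustration on random toy data with 5 components rather than 50; the helper names fit_pca and project are hypothetical, and the actual run may have used a different PCA implementation.

```python
import numpy as np


def fit_pca(train, n_components):
    """Fit PCA on training cells only; return (mean, components)."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes of the centered training matrix.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, components):
    """Project cells with the training mean and axes (no refitting)."""
    return (x - mean) @ components.T


# Toy stand-ins for the train / val / test expression matrices (cells x genes).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 30))
val = rng.normal(size=(40, 30))
test = rng.normal(size=(40, 30))

mean, components = fit_pca(train, n_components=5)
train_pca = project(train, mean, components)
val_pca = project(val, mean, components)    # uses train statistics only
test_pca = project(test, mean, components)  # uses train statistics only
```

Because val and test cells never enter the fit, the held-out splits cannot influence the principal axes the model is trained on.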

Budget and stopping

iterations 200,000
batch size 1024
valid_freq 400,000 (larger than budget = no mid-training eval)
wall clock 0.7 hours (H100 PCIe)
sample_rep X_pca (50 dims)
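
For reference, the budget above combined with optax.MultiSteps(optax.adam(5e-5), 20) implies the following derived quantities. This is a back-of-the-envelope sketch that assumes each of the 200,000 iterations is one MultiSteps mini-step (so parameters are updated once every 20 iterations); that mapping is an assumption, not something verified against the training loop.

```python
# Values taken from this model card.
iterations = 200_000
batch_size = 1024
accumulation_steps = 20   # second argument of optax.MultiSteps
wall_clock_hours = 0.7

# Assumption: one iteration == one gradient mini-step accumulated by MultiSteps.
optimizer_updates = iterations // accumulation_steps
effective_batch_size = batch_size * accumulation_steps
samples_drawn = iterations * batch_size
seconds_per_iteration = wall_clock_hours * 3600 / iterations

print(f"optimizer updates:     {optimizer_updates:,}")
print(f"effective batch size:  {effective_batch_size:,}")
print(f"samples drawn:         {samples_drawn:,}")
print(f"seconds per iteration: {seconds_per_iteration:.4f}")
```

Under that assumption the run performs 10,000 Adam updates, each averaging gradients over 20,480 samples.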

Test set metrics (cell-eval)

metric mean median max
pearson_delta 0.5630 0.6814 0.9651
discrimination_score_l1 0.7270 0.8182 1.0000
discrimination_score_l2 0.7452 0.8586 1.0000
discrimination_score_cosine 0.7413 0.8788 1.0000
pearson_edistance 0.6707 0.6707 0.6707
clustering_agreement 0.3252 0.3252 0.3252
overlap_at_N 0.0264 0.0242 0.1008
precision_at_N 0.0936 0.0977 0.2267
mse 0.0032 0.0022 0.0132
mae 0.0156 0.0142 0.0350

The CellFlow paper reports Norman results as R² in gene space and energy distance in 10-dim PCA space (Figure 4N, Methods section 3.5). Our numbers use cell-eval's standard metric set on the GEARS simulation split, so they are not directly comparable to Figure 4N. They are, however, consistent with the paper's headline claim (CellFlow > scGPT on Norman): on our matched evaluation, CellFlow outperforms scGPT on pearson_delta, all discrimination_score variants, pearson_edistance, clustering_agreement, mse, and mae. The two models are tied on DE gene overlap and precision (overlap_at_N, precision_at_N), consistent with the broader observation that current perturbation models capture broad transcriptional programs better than specific regulatory effects.
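
To make the pearson_delta row concrete, the sketch below computes a Pearson correlation between predicted and observed mean expression shifts relative to control, which is the usual reading of that metric. It is an illustration in plain Python on toy data, not cell-eval's implementation; the function names are hypothetical and cell-eval's preprocessing and aggregation may differ.

```python
from math import sqrt


def mean_profile(cells):
    """Average expression per gene over a list of cells (rows)."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pearson_delta(control, observed, predicted):
    """Correlate observed and predicted shifts away from the control mean."""
    ctrl = mean_profile(control)
    delta_obs = [o - c for o, c in zip(mean_profile(observed), ctrl)]
    delta_pred = [p - c for p, c in zip(mean_profile(predicted), ctrl)]
    return pearson(delta_obs, delta_pred)


# Toy data: 2 cells x 3 genes per condition.
control = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
observed = [[2.0, 2.0, 1.0], [2.0, 2.0, 1.0]]
predicted = [[1.5, 2.0, 2.0], [1.5, 2.0, 2.0]]  # right direction, half the size

score = pearson_delta(control, observed, predicted)
```

In this toy case the predicted shift is exactly half the observed shift in every gene, so the correlation is 1 even though the magnitudes are wrong. This is why magnitude-sensitive metrics such as mse and mae are reported alongside pearson_delta.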

Known limitations

  • Uses ESM2 esm2_t6_8M_UR50D (8M parameters) instead of the paper's esm2_t36_3B_UR50D (3B parameters). The smaller model speeds up research iteration, but gene embedding quality may be slightly lower than in the paper.
  • Uses GEARS simulation split instead of biolord's 5 random splits. Our test perturbations are a different subset of Norman than the paper's.
  • Training uses valid_freq > num_iterations, so there is no mid-training validation. Convergence was not verified via a validation curve; future runs should use a smaller valid_freq to plot the learning curve.

Files

  • CellFlow.pkl — Trained CellFlow model, pickled via cf.save(). Load via cellflow.model.CellFlow.load(path).
  • training_stats.json — iterations, wall clock, wandb run URL.

Usage

from huggingface_hub import hf_hub_download
from cellflow.model import CellFlow

path = hf_hub_download(
    repo_id="matthewshu/cellflow-norman",
    filename="CellFlow.pkl",
)
cf = CellFlow.load(path)
# Then use sc-interp's run_cellflow.py --hf-repo matthewshu/cellflow-norman

Citation

Dataset: Norman et al. 2019 (Science). Model: Klein, Fleck, Becker et al. 2025 bioRxiv (CellFlow). See the CellFlow repo and the Norman 2019 paper for proper BibTeX entries.
