AlloGen Inference Guide
This guide covers how to score binder designs and apply guidance with the bundled Q_θ checkpoint. Training is not part of the public release; only inference and guidance are included.
Env var. Throughout this doc, ${ALLOGEN_ROOT} is the path to the cloned repo. Either cd into it and use relative paths, or export ALLOGEN_ROOT=/path/to/AlloGen.
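For scripts of your own that need to locate repo files, the convention above can be mirrored in a few lines. This is a minimal sketch, not code from the repo: resolve_allogen_root and repo_path are hypothetical helper names, and falling back to the current directory assumes you have already changed into the clone.

```python
import os
from pathlib import Path


def resolve_allogen_root(env=None):
    """Return $ALLOGEN_ROOT if set, otherwise the current working directory.

    Hypothetical helper: the CWD fallback assumes you are inside the clone.
    """
    env = os.environ if env is None else env
    root = env.get("ALLOGEN_ROOT")
    return Path(root).expanduser() if root else Path.cwd()


def repo_path(relative, env=None):
    """Join a repo-relative path (e.g. a checkpoint) onto the resolved root."""
    return resolve_allogen_root(env) / relative
```

For example, repo_path('checkpoints/Q_theta_phase2.pt') gives the checkpoint location whether or not the variable is exported.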
Python. Use the environment from environment.yml / requirements.txt. All scripts insert code/ into sys.path via a _CODE_DIR boot block, so they work from any CWD.
1. Checkpoint
The Phase 2 weights (checkpoints/Q_theta_phase2.pt) are the v4-S2 target-swap split model used in the paper. The Phase 1 weights (Q_theta_phase1.pt) are the intermediate DockQ regression model.
Pull via Git LFS:
git lfs install
git lfs pull
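If the pull was skipped, the checkpoint path holds a small Git LFS pointer file instead of the real weights, and loading it will fail. The check below is a sketch for catching that early: is_lfs_pointer is a hypothetical helper, not part of the repo, and it relies only on the fact that LFS pointer files begin with a fixed version line.

```python
# First line of every Git LFS pointer file, per the LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` is still an un-pulled LFS pointer."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

Calling is_lfs_pointer('checkpoints/Q_theta_phase2.pt') before constructing the scorer tells you whether git lfs pull still needs to run.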
2. Score binders
2a. Python API
import sys
sys.path.insert(0, 'code')
from models.differentiable_features import DifferentiableQTheta
scorer = DifferentiableQTheta(
    checkpoint='checkpoints/Q_theta_phase2.pt',
    device='cuda:0',
)
scorer.load_receptor(
    holo_path='holo.pdb', rec_chain='A',
    apo_path='apo.pdb', apo_chain='A',
)
q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
q_apo = scorer.score('design.pdb', binder_chain='B', state='apo')
print(f'S = {q_holo - q_apo:.3f}')
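To compare several designs, the same two score calls can be wrapped in a loop. The sketch below is illustrative rather than part of the repo: selectivity and rank_designs are hypothetical helpers, and they assume that scorer.score returns a value convertible with float() and that a larger S = q_holo - q_apo is what you want to rank first.

```python
def selectivity(scorer, design_pdb, binder_chain="B"):
    """Compute S = q_holo - q_apo for one design, as in the snippet above."""
    q_holo = scorer.score(design_pdb, binder_chain=binder_chain, state="holo")
    q_apo = scorer.score(design_pdb, binder_chain=binder_chain, state="apo")
    # Assumption: the returned scores can be converted to Python floats.
    return float(q_holo) - float(q_apo)


def rank_designs(scorer, design_pdbs, binder_chain="B"):
    """Return (path, S) pairs sorted from highest to lowest S."""
    scored = [(p, selectivity(scorer, p, binder_chain)) for p in design_pdbs]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Pass the scorer after load_receptor has been called, plus a list of design PDB paths; the first entry of the result is the design with the largest S.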
2b. CLI on the bundled sample
python code/scripts/evaluate.py \
    --target cam \
    --checkpoint checkpoints/Q_theta_phase2.pt \
    --data_dir data/sample/ \
    --outdir /tmp/cam_inference \
    --no_wandb
This scores every binder in data/sample/cam/test.pkl and writes tables/eval_cam_test.json, which reports Spearman ρ, AUC, and the selectivity gap.
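For readers who want to sanity-check the first two metrics on their own score lists, here is a dependency-free sketch of the textbook definitions. It is not the code in evaluate.py, whose implementation and JSON key names are not documented here; spearman_rho and roc_auc are hypothetical names, and the selectivity gap is left out because its exact definition is not stated in this guide.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman rho is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```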
3. Guidance methods (PXDesign)
The shipped guidance code wraps PXDesign as the prior and uses Q_θ as the gradient / classifier signal.
| Script | Method |
|---|---|
| code/scripts/pxdesign_guidance/langevin_pxdesign.py | Post-hoc Langevin refinement |
| code/scripts/pxdesign_guidance/smc_pxdesign.py | Sequential Monte Carlo |
| code/scripts/pxdesign_guidance/tds_pxdesign.py | Twisted Diffusion Sampler |
| code/scripts/pxdesign_guidance/guided_pxdesign.py | Classifier guidance |
| code/scripts/pxdesign_guidance/iterative_refinement.py | Iterative refinement loop |
| code/scripts/pxdesign_guidance/qtheta_pxdesign.py | Q_θ wrapper used by the above |
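As background for the first row, Langevin refinement in general means repeatedly moving a sample along the gradient of a score while injecting Gaussian noise. The toy sketch below shows that generic update on a 1-D quadratic score. It is not the code in langevin_pxdesign.py: toy_score_grad, langevin_refine, and all parameter names are hypothetical, and a real run would take the gradient from Q_θ with respect to binder coordinates instead.

```python
import math
import random


def toy_score_grad(x, target=2.0):
    """Gradient of the toy score S(x) = -(x - target)^2 / 2."""
    return -(x - target)


def langevin_refine(x0, grad_fn, n_steps=200, step=0.1, temperature=0.0, seed=42):
    """Unadjusted Langevin updates: x <- x + step * grad + sqrt(2*step*T) * noise.

    With temperature=0 this reduces to plain gradient ascent on the score.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step * temperature)
    x = x0
    for _ in range(n_steps):
        x = x + step * grad_fn(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x
```

At temperature 0 the sample simply climbs to the score maximum; a positive temperature keeps it fluctuating around that maximum, which trades peak score for diversity.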
Common flags:
- --checkpoint checkpoints/Q_theta_phase2.pt
- --holo_pdb your_holo.pdb / --apo_pdb your_apo.pdb
- --output_dir designs/
- --device cuda:0
- --seed 42
Method-specific arguments (steps, batch sizes, guidance scales) are in each script's argparse block.
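If you write your own driver around these scripts, the common flags can be declared in one shared parser. The sketch below is an assumption about how such a block might look, not a copy of any script's argparse code: the flag names come from the list above, while the defaults, types, and which flags are required are guesses.

```python
import argparse


def build_common_parser():
    """Parser for the common guidance flags (defaults and types are assumed)."""
    parser = argparse.ArgumentParser(description="Common guidance flags (illustrative)")
    parser.add_argument("--checkpoint", default="checkpoints/Q_theta_phase2.pt")
    parser.add_argument("--holo_pdb", required=True, help="holo-state receptor PDB")
    parser.add_argument("--apo_pdb", required=True, help="apo-state receptor PDB")
    parser.add_argument("--output_dir", default="designs/")
    parser.add_argument("--device", default="cuda:0")
    parser.add_argument("--seed", type=int, default=42)
    return parser
```

Method-specific options would be added on top of this parser; check each script's own argparse block for the real names and defaults.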
To plug Q_θ into RFdiffusion, Proteina-ComplexA, or any other backbone prior, see code/scripts/README.md.
4. Bundled sample data
data/sample/cam/test.pkl is the held-out test split for Calmodulin (CaM), small enough to run on a laptop CPU in under a minute. It is the only data shipped in the repo. To score your own targets, use the Python API in §2a, which takes raw PDBs as input.
5. Training reproduction
Training data, training scripts, and per-target processed graphs are NOT shipped in this public release. The paper's main result (Phase 2 on the v4-S2 target-swap split) is provided as a frozen checkpoint at checkpoints/Q_theta_phase2.pt. Retraining requires the full pipeline, which must be requested separately.
6. Citation
@inproceedings{cao2026allogen,
title = {AlloGen: State-Selective Scoring for Allosteric Binder Design},
author = {Cao, Hanqun and others},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2026}
}