
AlloGen Inference Guide

This guide covers how to score binder designs and apply guidance with the bundled Q_θ checkpoint. Training is not part of the public release — only inference and guidance.

Env var. Throughout this doc, ${ALLOGEN_ROOT} is the path to the cloned repo. Either cd into it and use relative paths, or export ALLOGEN_ROOT=/path/to/AlloGen.
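As a minimal sketch of the convention above, the helper below resolves a repo-relative path against ALLOGEN_ROOT and falls back to the current directory when the variable is unset. The function name repo_path is hypothetical and not part of the repo.

```python
import os
from pathlib import Path


def repo_path(relative, env=None):
    """Resolve a repo-relative path against ALLOGEN_ROOT.

    Hypothetical helper: falls back to the current working directory
    when ALLOGEN_ROOT is not set, matching the "cd into the repo" option.
    """
    env = os.environ if env is None else env
    root = Path(env.get("ALLOGEN_ROOT", "."))
    return root / relative


if __name__ == "__main__":
    print(repo_path("checkpoints/Q_theta_phase2.pt"))
```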

Python. Use the env from environment.yml / requirements.txt. All scripts insert code/ into sys.path via a _CODE_DIR boot block, so they work from any CWD.
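The boot block itself is not reproduced in this guide. The sketch below shows one way such a block could work, assuming it walks up from the script's location until it finds a directory named code. The function name find_code_dir and the search strategy are assumptions, not the repo's actual implementation.

```python
import sys
from pathlib import Path


def find_code_dir(script_path, dirname="code"):
    """Return the nearest ancestor directory called `dirname` and put it on sys.path.

    Hypothetical stand-in for the repo's _CODE_DIR boot block.
    """
    for parent in Path(script_path).resolve().parents:
        if parent.name == dirname:
            if str(parent) not in sys.path:
                sys.path.insert(0, str(parent))  # make `import models...` resolvable
            return parent
    raise RuntimeError(f"no '{dirname}' directory above {script_path}")
```

Inside a script this would typically be called as find_code_dir(__file__) before any project imports.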


1. Checkpoint

The Phase 2 weights (checkpoints/Q_theta_phase2.pt) are the model for the v4-S2 target-swap split used in the paper. The Phase 1 weights (Q_theta_phase1.pt) are the intermediate DockQ regression model.

Pull via Git LFS:

git lfs install
git lfs pull
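If the pull did not run, the .pt file on disk is a small Git LFS pointer (a text file that begins with a version line) rather than the real weights. The check below is a sketch for detecting that case before loading. The function name is hypothetical.

```python
# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` is still an un-pulled Git LFS pointer file."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    ckpt = "checkpoints/Q_theta_phase2.pt"
    if is_lfs_pointer(ckpt):
        print(f"{ckpt} is an LFS pointer; run `git lfs pull` first")
```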

2. Score binders

2a. Python API

import sys
sys.path.insert(0, 'code')  # assumes the CWD is the repo root

from models.differentiable_features import DifferentiableQTheta

# Load the Phase 2 checkpoint.
scorer = DifferentiableQTheta(
    checkpoint='checkpoints/Q_theta_phase2.pt',
    device='cuda:0',
)
# Register the receptor in both conformational states.
scorer.load_receptor(
    holo_path='holo.pdb', rec_chain='A',
    apo_path='apo.pdb',   apo_chain='A',
)
# Score the same design against each state.
q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
q_apo  = scorer.score('design.pdb', binder_chain='B', state='apo')
# Selectivity S is the holo score minus the apo score.
print(f'S = {q_holo - q_apo:.3f}')

2b. CLI on the bundled sample

python code/scripts/evaluate.py \
    --target cam \
    --checkpoint checkpoints/Q_theta_phase2.pt \
    --data_dir data/sample/ \
    --outdir /tmp/cam_inference \
    --no_wandb

Scores every binder in data/sample/cam/test.pkl and writes tables/eval_cam_test.json with Spearman ρ, AUC, and selectivity gap.
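For readers who want to sanity-check the reported numbers, the sketch below computes Spearman ρ and ROC AUC from scratch using only the standard library. It is not the code in evaluate.py. The selectivity gap is left out because this guide does not define it.

```python
import math


def average_ranks(values):
    """1-based ranks, with ties receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic; labels are 0 or 1."""
    ranks = average_ranks(scores)
    pos = [r for r, lab in zip(ranks, labels) if lab == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    u = sum(pos) - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```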


3. Guidance methods (PXDesign)

The shipped guidance code wraps PXDesign as the prior and uses Q_θ as the gradient / classifier signal.

| Script | Method |
|---|---|
| code/scripts/pxdesign_guidance/langevin_pxdesign.py | Post-hoc Langevin refinement |
| code/scripts/pxdesign_guidance/smc_pxdesign.py | Sequential Monte Carlo |
| code/scripts/pxdesign_guidance/tds_pxdesign.py | Twisted Diffusion Sampler |
| code/scripts/pxdesign_guidance/guided_pxdesign.py | Classifier guidance |
| code/scripts/pxdesign_guidance/iterative_refinement.py | Iterative refinement loop |
| code/scripts/pxdesign_guidance/qtheta_pxdesign.py | Q_θ wrapper used by the above |
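To illustrate what using a score as a gradient signal means, here is a toy sketch of the generic unadjusted Langevin update, x ← x + ε∇log p(x) + √(2ε)·ξ, on a two-dimensional Gaussian. It is not the code in langevin_pxdesign.py. The target, step size, and function names are all assumptions chosen for the example, and in the real setting the gradient would come from Q_θ rather than a closed-form expression.

```python
import math
import random


def langevin_refine(x0, grad_log_target, step_size=0.01, n_steps=500,
                    noise_scale=1.0, seed=42):
    """Unadjusted Langevin dynamics on a list of floats.

    noise_scale=0 turns the update into plain gradient ascent.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        grad = grad_log_target(x)
        x = [
            xi + step_size * gi
            + noise_scale * math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, grad)
        ]
    return x


def toy_grad(x, center=(1.0, -2.0)):
    """Gradient of log p(x) = -0.5 * ||x - center||^2 (toy target)."""
    return [c - xi for xi, c in zip(x, center)]


if __name__ == "__main__":
    print(langevin_refine([0.0, 0.0], toy_grad))
```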

Common flags:

  • --checkpoint checkpoints/Q_theta_phase2.pt
  • --holo_pdb your_holo.pdb / --apo_pdb your_apo.pdb
  • --output_dir designs/
  • --device cuda:0
  • --seed 42

Method-specific arguments (steps, batch sizes, guidance scales) are in each script's argparse block.
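As a sketch of how the common flags combine, the helper below builds an argument list for any of the guidance scripts. It uses only the flags listed above. The function name build_guidance_command is hypothetical, and method-specific options are passed through extra_args untouched, so check each script's argparse block for the real names.

```python
import shlex


def build_guidance_command(script, holo_pdb, apo_pdb,
                           checkpoint="checkpoints/Q_theta_phase2.pt",
                           output_dir="designs/", device="cuda:0", seed=42,
                           extra_args=()):
    """Assemble an argv list from the common flags documented in this guide."""
    cmd = [
        "python", script,
        "--checkpoint", checkpoint,
        "--holo_pdb", holo_pdb,
        "--apo_pdb", apo_pdb,
        "--output_dir", output_dir,
        "--device", device,
        "--seed", str(seed),
    ]
    cmd.extend(extra_args)  # method-specific flags, passed through as given
    return cmd


if __name__ == "__main__":
    cmd = build_guidance_command(
        "code/scripts/pxdesign_guidance/smc_pxdesign.py",
        holo_pdb="your_holo.pdb",
        apo_pdb="your_apo.pdb",
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```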

To plug Q_θ into RFdiffusion, Proteina-ComplexA, or any other backbone prior, see code/scripts/README.md.


4. Bundled sample data

data/sample/cam/test.pkl is the held-out test split for Calmodulin (CaM), small enough to run on a laptop CPU in under a minute. It is the only data shipped in the repo. To score your own targets, use the Python API in §2a, which takes raw PDB files as input.
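The layout of test.pkl is not documented here, so the sketch below only reports the top-level type, length, and (for dicts) a few keys. The function name describe_pickle is hypothetical. Unpickling may require code/ on sys.path if the file stores repo-defined classes, and pickle files should only be loaded from sources you trust.

```python
import pickle


def describe_pickle(path, max_keys=10):
    """Load a pickle and summarize its top-level structure without assuming a schema."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        summary["keys"] = sorted(str(k) for k in obj)[:max_keys]
    return summary


if __name__ == "__main__":
    print(describe_pickle("data/sample/cam/test.pkl"))
```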


5. Training reproduction

Training data, training scripts, and per-target processed graphs are NOT shipped in this public release. The paper's main result (Phase 2 on the v4-S2 target-swap split) is provided as a frozen checkpoint at checkpoints/Q_theta_phase2.pt. Retraining requires the full pipeline, which must be requested separately.


6. Citation

@inproceedings{cao2026allogen,
  title     = {AlloGen: State-Selective Scoring for Allosteric Binder Design},
  author    = {Cao, Hanqun and others},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2026}
}