AlloGen

AlloGen method overview

State-selectivity scoring + guided generation for allosteric binder design.

🧪 One-click demo for biology users: score CaM binders and run Q_θ-guided PXDesign sampling in 5 minutes. The notebook lives at notebooks/AlloGen_CaM_demo.ipynb.

AlloGen trains a scorer Q_θ(X, Y) ∈ (0,1) that ranks how well a binder Y discriminates a target's holo (active) state X¹ from its apo (inactive) state X⁰. The selectivity score is:

S(Y) = Q_θ(X¹, Y) − Q_θ(X⁰, Y)

Q_θ serves as both a re-ranker (best-of-K) and a gradient signal for guided generation on top of frozen priors (RFdiffusion, PXDesign, Proteina-ComplexA) via Langevin, SMC, TDS, or classifier guidance.
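As a rough illustration of the best-of-K re-ranking use, the sketch below sorts candidates by S(Y) rather than by holo score alone. The `ScoredDesign` type, the `best_of_k` helper, and the numeric scores are hypothetical and made up for the example; they are not part of the AlloGen code base.

```python
from typing import NamedTuple


class ScoredDesign(NamedTuple):
    """Hypothetical container for one binder's two Q_theta scores."""
    name: str
    q_holo: float  # Q_theta(X^1, Y), score against the holo state
    q_apo: float   # Q_theta(X^0, Y), score against the apo state

    @property
    def selectivity(self) -> float:
        # S(Y) = Q_theta(X^1, Y) - Q_theta(X^0, Y)
        return self.q_holo - self.q_apo


def best_of_k(designs, k=1):
    """Return the k designs with the highest selectivity S(Y)."""
    return sorted(designs, key=lambda d: d.selectivity, reverse=True)[:k]


# Illustrative scores only, not real model output.
candidates = [
    ScoredDesign("design_a", 0.90, 0.85),  # binds both states: low S
    ScoredDesign("design_b", 0.70, 0.20),  # prefers holo: high S
    ScoredDesign("design_c", 0.40, 0.60),  # prefers apo: negative S
]

top = best_of_k(candidates, k=2)
print([d.name for d in top])
```

Note that design_a has the highest holo score but ranks below design_b, because it scores almost as well against the apo state.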

This repository accompanies the paper AlloGen: State-Selective Scoring for Allosteric Binder Design (NeurIPS 2026).

Installation

conda env create -f environment.yml
conda activate allogen

Or pip-only:

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Python 3.10 + PyTorch 2.x are required. A CUDA GPU is recommended for guidance, but CPU works for scoring single designs.

Inference quickstart

# Score the bundled CaM inference sample against the v4-S2 (target-swap) checkpoint
python code/scripts/evaluate.py \
    --target cam \
    --checkpoint checkpoints/Q_theta_phase2.pt \
    --data_dir data/sample/ \
    --outdir /tmp/cam_inference \
    --no_wandb

See inference.md for the scoring API + guidance command lines.

Repo layout

code/
  data/           dataset / graph construction, PDB I/O, target YAMLs
  models/         Q_θ scorer (graph transformer) + differentiable wrapper
  trainers/       two-phase training loop (DockQ regression + selectivity)
  utils/          PDB I/O, backbone frames, SAM optimizer
  scripts/        evaluate, rescore, PXDesign guidance (see scripts/README.md)
checkpoints/      Q_θ paper weights (v4-S2 target-swap split, via Git LFS)
data/sample/      tiny CaM inference sample (test split only)

Checkpoints

Paper weights for the v4-S2 target-swap split are bundled via Git LFS:

git lfs install
git lfs pull

| File | Use |
| --- | --- |
| `checkpoints/Q_theta_phase1.pt` | Phase 1 (DockQ regression) intermediate checkpoint |
| `checkpoints/Q_theta_phase2.pt` | Phase 2 (selectivity): main paper result |
| `checkpoints/Q_theta_train_curve.csv` | Training curve metadata |
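If `git lfs pull` was skipped, the `.pt` files on disk are small text pointer stubs rather than weights, and loading them will fail. The sketch below is one way to check for that; it relies only on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_pointer_stubs` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file is an LFS pointer stub rather than real content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_pointer_stubs(directory="checkpoints"):
    """List the .pt files in `directory` that are still LFS pointer stubs."""
    return [p for p in sorted(Path(directory).glob("*.pt")) if is_lfs_pointer(p)]


for stub in find_pointer_stubs():
    print(f"{stub} is an LFS pointer; run `git lfs pull` to fetch the weights")
```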

Scoring a single design

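import sys
sys.path.insert(0, 'code')  # make the repo's code/ directory importable

from models.differentiable_features import DifferentiableQTheta

# Load the Phase 2 (selectivity) weights.
# CPU is sufficient for scoring single designs; change `device` if no GPU is available.
scorer = DifferentiableQTheta(
    checkpoint='checkpoints/Q_theta_phase2.pt',
    device='cuda:0',
)
# Register the holo and apo structures of the target receptor.
scorer.load_receptor(
    holo_path='your_holo.pdb', rec_chain='A',
    apo_path='your_apo.pdb',   apo_chain='A',
)
# Score the same binder against each state; the difference is S(Y).
q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
q_apo  = scorer.score('design.pdb', binder_chain='B', state='apo')
print(f'S = {q_holo - q_apo:.3f}')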

Guidance methods

The shipped guidance code wraps PXDesign as the prior and uses Q_θ as the gradient / classifier signal. All four method variants (Langevin, SMC, TDS, classifier guidance) live in code/scripts/pxdesign_guidance/.

See inference.md §3 for command lines.

To deploy Q_θ with RFdiffusion, Proteina-ComplexA, or any other backbone prior, see code/scripts/README.md. DifferentiableQTheta exposes the gradient ∇_x S(x), and the PXDesign code is a worked template to mirror.
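To make the role of ∇_x S(x) concrete, here is a toy sketch of Langevin-style guidance in two dimensions. Everything in it is an assumption made for illustration: `toy_q` is a hand-written sigmoid standing in for Q_θ, the frozen prior is replaced by a standard normal (whose score is −x), and `guidance_scale`, `step_size`, and `temperature` are hypothetical parameter names. It is not the shipped PXDesign guidance code.

```python
import math
import random


def toy_q(x, center):
    """Toy stand-in for Q_theta: sigmoid(-||x - center||^2), a value in (0, 1)."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return 1.0 / (1.0 + math.exp(d2))


def toy_q_grad(x, center):
    """Analytic gradient of toy_q with respect to x."""
    q = toy_q(x, center)
    return [q * (1.0 - q) * (-2.0) * (xi - ci) for xi, ci in zip(x, center)]


def selectivity(x, holo, apo):
    """S(x) = Q(holo, x) - Q(apo, x)."""
    return toy_q(x, holo) - toy_q(x, apo)


def selectivity_grad(x, holo, apo):
    """Gradient of S: the difference of the two per-state gradients."""
    return [h - a for h, a in zip(toy_q_grad(x, holo), toy_q_grad(x, apo))]


def guided_langevin(x0, holo, apo, steps=200, step_size=0.05,
                    guidance_scale=5.0, temperature=1.0, seed=0):
    """Langevin updates with drift = prior score + guidance_scale * grad S."""
    rng = random.Random(seed)
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    for _ in range(steps):
        grad_s = selectivity_grad(x, holo, apo)
        # Score of a standard normal prior is -x (stand-in for a frozen generative prior).
        drift = [-xi + guidance_scale * gi for xi, gi in zip(x, grad_s)]
        x = [xi + step_size * di + noise_scale * rng.gauss(0.0, 1.0)
             for xi, di in zip(x, drift)]
    return x


HOLO, APO = [1.0, 0.0], [-1.0, 0.0]   # toy "state centers"
start = [0.0, 0.5]                    # equidistant from both, so S(start) = 0
end = guided_langevin(start, HOLO, APO, temperature=0.0)  # noise-free for clarity
print(f"S(start) = {selectivity(start, HOLO, APO):.3f}")
print(f"S(end)   = {selectivity(end, HOLO, APO):.3f}")
```

With `temperature=0.0` the update reduces to gradient ascent on the prior log-density plus the scaled selectivity, so the sample drifts toward the holo center. Setting `temperature=1.0` restores the stochastic sampler.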

Citation

@inproceedings{cao2026allogen,
  title     = {AlloGen: State-Selective Scoring for Allosteric Binder Design},
  author    = {Cao, Hanqun and others},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2026}
}

(BibTeX key will be finalized at camera-ready.)

License

MIT — see LICENSE.
