# AlloGen Inference Guide

This guide covers how to score binder designs and apply guidance with the bundled Q_θ checkpoint. Training is not part of the public release; only inference and guidance are included.

> **Env var.** Throughout this doc, `${ALLOGEN_ROOT}` is the path to the cloned repo. Either `cd` into it and use relative paths, or `export ALLOGEN_ROOT=/path/to/AlloGen`.

> **Python.** Use the env from `environment.yml` / `requirements.txt`. All scripts insert `code/` into `sys.path` via a `_CODE_DIR` boot block, so they work from any CWD.

---

## 1. Checkpoint

The Phase 2 weights `checkpoints/Q_theta_phase2.pt` are the **v4-S2 target-swap split** model used in the paper. Phase 1 (`Q_theta_phase1.pt`) is the DockQ regression intermediate.

Pull via Git LFS:

```bash
git lfs install
git lfs pull
```

---

## 2. Score binders

### 2a. Python API

```python
import sys
sys.path.insert(0, 'code')
from models.differentiable_features import DifferentiableQTheta

scorer = DifferentiableQTheta(
    checkpoint='checkpoints/Q_theta_phase2.pt',
    device='cuda:0',
)
scorer.load_receptor(
    holo_path='holo.pdb', rec_chain='A',
    apo_path='apo.pdb', apo_chain='A',
)
q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
q_apo = scorer.score('design.pdb', binder_chain='B', state='apo')
print(f'S = {q_holo - q_apo:.3f}')
```

### 2b. CLI on the bundled sample

```bash
python code/scripts/evaluate.py \
    --target cam \
    --checkpoint checkpoints/Q_theta_phase2.pt \
    --data_dir data/sample/ \
    --outdir /tmp/cam_inference \
    --no_wandb
```

Scores every binder in `data/sample/cam/test.pkl` and writes `tables/eval_cam_test.json` with Spearman ρ, AUC, and selectivity gap.

---

## 3. Guidance methods (PXDesign)

The shipped guidance code wraps **PXDesign** as the prior and uses Q_θ as the gradient / classifier signal.
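To illustrate what "using a score as a gradient signal" can look like, here is a minimal sketch of unadjusted Langevin refinement on a toy problem. It is not the code in `langevin_pxdesign.py`: `toy_score`, `toy_score_grad`, `langevin_refine`, and all hyperparameter values are hypothetical stand-ins, and the quadratic score replaces Q_θ purely for demonstration.

```python
import numpy as np

def toy_score(x, target):
    """Hypothetical differentiable score (higher is better). NOT Q_theta."""
    return -0.5 * float(np.sum((x - target) ** 2))

def toy_score_grad(x, target):
    """Analytic gradient of toy_score with respect to x."""
    return target - x

def langevin_refine(x0, target, n_steps=200, step=0.05, scale=10.0, seed=42):
    """Unadjusted Langevin dynamics targeting exp(scale * toy_score).

    All argument names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Drift along the scaled score gradient, plus Gaussian noise.
        x = x + step * scale * toy_score_grad(x, target) + np.sqrt(2.0 * step) * noise
    return x

# Toy "coordinates": 8 points in 3D, starting far from the optimum.
target = np.full((8, 3), 3.0)
x0 = np.zeros((8, 3))
refined = langevin_refine(x0, target)
print(f"score before: {toy_score(x0, target):.2f}")
print(f"score after:  {toy_score(refined, target):.2f}")
```

The refined sample should score much higher than the starting point while still carrying some noise, which is the intended trade-off of Langevin-style refinement. The shipped scripts may use different update rules, schedules, and scales; each script's `argparse` block is the authority on that.
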

| Script | Method |
|---|---|
| `code/scripts/pxdesign_guidance/langevin_pxdesign.py` | Post-hoc Langevin refinement |
| `code/scripts/pxdesign_guidance/smc_pxdesign.py` | Sequential Monte Carlo |
| `code/scripts/pxdesign_guidance/tds_pxdesign.py` | Twisted Diffusion Sampler |
| `code/scripts/pxdesign_guidance/guided_pxdesign.py` | Classifier guidance |
| `code/scripts/pxdesign_guidance/iterative_refinement.py` | Iterative refinement loop |
| `code/scripts/pxdesign_guidance/qtheta_pxdesign.py` | Q_θ wrapper used by the above |

Common flags:

- `--checkpoint checkpoints/Q_theta_phase2.pt`
- `--holo_pdb your_holo.pdb` / `--apo_pdb your_apo.pdb`
- `--output_dir designs/`
- `--device cuda:0`
- `--seed 42`

Method-specific arguments (steps, batch sizes, guidance scales) are in each script's `argparse` block.

To plug Q_θ into RFdiffusion, Proteina-ComplexA, or any other backbone prior, see `code/scripts/README.md`.

---

## 4. Bundled sample data

`data/sample/cam/test.pkl`: held-out test split for Calmodulin (CaM), small enough to run on a laptop CPU in under a minute. **The only data shipped in the repo.**

Score your own targets via the Python API in §2a (raw PDBs as input).

---

## 5. Training reproduction

Training data, training scripts, and per-target processed graphs are NOT shipped in this public release. The paper's main result (Phase 2 on the **v4-S2 target-swap** split) is provided as a frozen checkpoint at `checkpoints/Q_theta_phase2.pt`. Retraining requires the full pipeline (separate request).

---

## 6. Citation

```bibtex
@inproceedings{cao2026allogen,
  title     = {AlloGen: State-Selective Scoring for Allosteric Binder Design},
  author    = {Cao, Hanqun and others},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2026}
}
```