
code/scripts/ — entry points

This public release ships only the inference and sampling code for Q_θ.

| File / dir | Purpose |
|---|---|
| evaluate.py | Score binders in a pre-built *.pkl test set with a Q_θ checkpoint; reports Spearman ρ, AUC, selectivity gap. |
| rescore.py | Re-score raw PDB designs (binder + holo + apo) with Q_θ. |
| pxdesign_guidance/ | PXDesign-prior guidance with Q_θ (Langevin / SMC / TDS / classifier). |

Training, baseline scoring (ProteinMPNN / ESM-IF / Rosetta / DFIRE / energy panel), guidance for RFdiffusion / Proteina-ComplexA, and paper-figure aggregation are not shipped; the inference path above is the only supported surface for the public release.


Deploying Q_θ with other base models

Q_θ provides two interfaces:

  1. Re-ranker (best-of-K). Given K candidate binders from any prior, score each with S(Y) = Q_θ(X¹, Y) − Q_θ(X⁰, Y) and keep the top-scoring candidate(s). No gradient signal is needed; the prior is unmodified.
  2. Gradient signal for guidance. Compute ∇_Y S(Y) via DifferentiableQTheta (in code/models/differentiable_features.py) and inject into the prior's sampler (Langevin step, SMC weight, TDS twist, classifier guidance score).
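
Interface (1) can be sketched in a few lines. This is a minimal illustration, not the code in evaluate.py or rescore.py: `q_x1` and `q_x0` are hypothetical callables standing in for Q_θ(X¹, ·) and Q_θ(X⁰, ·), and the candidate names and scores are made up for the demo.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Binder = TypeVar("Binder")
Scorer = Callable[[Binder], float]  # hypothetical stand-in for Q_theta(X, .)


def selectivity_score(q_x1: Scorer, q_x0: Scorer, binder: Binder) -> float:
    """S(Y) = Q_theta(X1, Y) - Q_theta(X0, Y)."""
    return q_x1(binder) - q_x0(binder)


def rerank_best_of_k(
    candidates: Sequence[Binder],
    q_x1: Scorer,
    q_x0: Scorer,
    top_n: int = 1,
) -> List[Tuple[float, Binder]]:
    """Score every candidate and return the top_n as (score, binder) pairs."""
    scored = [(selectivity_score(q_x1, q_x0, y), y) for y in candidates]
    # Sort by score only, highest first; the prior that produced the
    # candidates is never touched.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Toy scores (assumed values, for illustration only).
_toy_x1 = {"design_a": 0.9, "design_b": 0.8, "design_c": 0.4}
_toy_x0 = {"design_a": 0.7, "design_b": 0.1, "design_c": 0.3}
q_x1 = lambda y: _toy_x1[y]
q_x0 = lambda y: _toy_x0[y]

best = rerank_best_of_k(list(_toy_x1), q_x1, q_x0, top_n=1)
print(best[0][1])
```

In the toy data, design_b has the largest gap between the two scores, so it ranks first even though design_a has the higher raw Q_θ(X¹, ·) value.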

The pxdesign_guidance/ subdir is a worked example of interface (2) wrapping PXDesign. To plug Q_θ into another prior, mirror that pattern:

RFdiffusion

  1. Clone RFdiffusion: https://github.com/RosettaCommons/RFdiffusion.
  2. Follow its install + checkpoint download.
  3. In RFdiffusion's diffusion loop, after each denoising step, materialize the predicted backbone, build the holo/apo graph inputs expected by DifferentiableQTheta, and either:
    • Apply a Langevin nudge: x ← x + η · ∇_x S(x).
    • Add a classifier-guidance term to the denoiser's x_{t−1} mean: μ' = μ + s · σ² · ∇_x log p(y|x), where log p(y|x) ≈ S(x) (that is, the Q_θ selectivity score S is treated as the log-likelihood of "is a good binder").
  4. Reference template: pxdesign_guidance/guided_pxdesign.py.
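
The two update rules in step 3 can be written out as follows. This is a sketch under stated assumptions, not the code in guided_pxdesign.py: coordinates are flattened to a plain list of floats, and `toy_grad_s` is a hypothetical analytic gradient of a toy score S(x) = −½‖x − target‖² that stands in for the gradient DifferentiableQTheta would supply.

```python
from typing import List

Vector = List[float]


def toy_grad_s(x: Vector, target: Vector) -> Vector:
    """Gradient of the toy score S(x) = -0.5 * ||x - target||^2 (assumption)."""
    return [t - xi for xi, t in zip(x, target)]


def langevin_nudge(x: Vector, grad_s: Vector, eta: float) -> Vector:
    """x <- x + eta * grad_x S(x). Deterministic drift only; no noise is added here."""
    return [xi + eta * gi for xi, gi in zip(x, grad_s)]


def guided_mean(mu: Vector, grad_log_p: Vector, sigma2: float, s: float) -> Vector:
    """mu' = mu + s * sigma^2 * grad_x log p(y|x), with log p(y|x) ~ S(x)."""
    return [mi + s * sigma2 * gi for mi, gi in zip(mu, grad_log_p)]


target = [1.0, 2.0, 3.0]

# Repeated nudges move x up the score landscape toward the toy optimum.
x = [0.0, 0.0, 0.0]
for _ in range(50):
    x = langevin_nudge(x, toy_grad_s(x, target), eta=0.1)

# A single classifier-guidance shift of the denoiser mean.
mu = [0.0, 0.0, 0.0]
mu_guided = guided_mean(mu, toy_grad_s(mu, target), sigma2=0.5, s=2.0)
print(x, mu_guided)
```

The guidance scale s and step size η trade off fidelity to the prior against the score; the values above are arbitrary and chosen only to make the toy example converge.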

Proteina-ComplexA

  1. Clone Proteina: https://github.com/proteinabio/proteina-complexa (or the released artifact).
  2. Use its ComplexA mode that emits binder coords conditioned on a receptor.
  3. Same plug pattern as RFdiffusion — wrap the sampler with DifferentiableQTheta for guidance, or run unguided and re-rank with evaluate.py / rescore.py.

Any backbone prior

The only contract Q_θ enforces:

  • Receptor input is a PDB with holo and apo coordinates.
  • Binder input is a PDB (or coords) with a chain ID distinct from the receptor's.
  • For guidance, expose differentiable Cα + backbone coordinates so ∇_x S(x) flows.
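
The chain-ID requirement above is easy to check before scoring. The sketch below is a hypothetical pre-flight helper, not part of the shipped scripts; it relies only on the fixed-column PDB layout, where the chain identifier of ATOM/HETATM records sits in column 22. The `_atom_line` helper exists only to build demo records and truncates each line after the chain field.

```python
from typing import Set


def chain_ids(pdb_text: str) -> Set[str]:
    """Collect chain identifiers (PDB column 22) from ATOM/HETATM records."""
    ids = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            ids.add(line[21])  # column 22 is index 21 when counting from 0
    return ids


def binder_chain_is_distinct(receptor_pdb: str, binder_pdb: str) -> bool:
    """True if no chain ID is shared between receptor and binder."""
    return chain_ids(receptor_pdb).isdisjoint(chain_ids(binder_pdb))


def _atom_line(serial: int, name: str, res: str, chain: str) -> str:
    # Demo-only builder: record name (cols 1-6), serial (7-11), atom name
    # (13-16), residue name (18-20), chain ID (22). Later fields are omitted.
    return f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}"


receptor = "\n".join([_atom_line(1, " N", "MET", "A"), _atom_line(2, " CA", "MET", "A")])
binder_ok = _atom_line(1, " CA", "GLY", "B")
binder_clash = _atom_line(1, " CA", "GLY", "A")

print(binder_chain_is_distinct(receptor, binder_ok))
print(binder_chain_is_distinct(receptor, binder_clash))
```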

See code/models/differentiable_features.py:DifferentiableQTheta for the exact interface: load_receptor(holo_path, apo_path, …), score(design_path, binder_chain, state), and differentiable_score(coords, …).


Why other guidance scripts aren't shipped

The RFdiffusion / Proteina guidance variants in our internal tree depend on those projects' unreleased CIF formats and patched samplers, and we don't want to ship modified third-party code. The PXDesign variants we do ship use only PXDesign's public API and are self-contained.

For citation / reproduction context, see §4 of the paper (guidance methods).