TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation
TD3B is a sequence-based generative framework that designs peptide binders with specified agonist or antagonist behavior. It combines a Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model (MDLM).
Installation
conda env create -f env.yml
conda activate td3b
pip install -e .
Data and Checkpoints
Download the pre-trained checkpoints and data from Google Drive (link TBA).
Place the files as follows:
TD3B/
├── checkpoints/
│   ├── pretrained.ckpt         # Pre-trained MDLM weights
│   ├── td3b.ckpt               # Fine-tuned TD3B model
│   └── direction_oracle.pt     # Direction Oracle weights
├── data/
│   ├── train.csv               # Training set (target-binder pairs)
│   └── test.csv                # Test set
├── scoring/functions/classifiers/
│   ├── binding-affinity.pt
│   ├── hemolysis-xgboost.json
│   ├── nonfouling-xgboost.json
│   ├── permeability-xgboost.json
│   └── solubility-xgboost.json
└── tokenizer/
    ├── new_vocab.txt
    └── new_splits.txt
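Before running anything, it can help to confirm that every file landed where the layout above expects it. A minimal sketch, assuming it is run from the TD3B/ root; `missing_files` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path

# Relative paths expected under the repository root, taken from the layout above.
EXPECTED_FILES = [
    "checkpoints/pretrained.ckpt",
    "checkpoints/td3b.ckpt",
    "checkpoints/direction_oracle.pt",
    "data/train.csv",
    "data/test.csv",
    "scoring/functions/classifiers/binding-affinity.pt",
    "scoring/functions/classifiers/hemolysis-xgboost.json",
    "scoring/functions/classifiers/nonfouling-xgboost.json",
    "scoring/functions/classifiers/permeability-xgboost.json",
    "scoring/functions/classifiers/solubility-xgboost.json",
    "tokenizer/new_vocab.txt",
    "tokenizer/new_splits.txt",
]


def missing_files(root="."):
    """Return the expected relative paths that are not regular files under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are in place.")
```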
Code Structure
TD3B/
├── inference.py                # Generate binders (main inference entry point)
├── finetune_multi_target.py    # Multi-target TD3B training
├── finetune_utils.py           # Training utilities
├── launch_multi_target.sh      # Training launcher script
├── diffusion.py                # MDLM backbone (TR2-D2)
├── roformer.py                 # RoFormer wrapper
├── noise_schedule.py           # Noise schedules
├── peptide_mcts.py             # MCTS tree search
├── td3b/
│   ├── direction_oracle.py     # Direction Oracle (f_φ)
│   ├── td3b_scoring.py         # Gated reward R = g_ψ · σ(d*·(f_φ − 0.5)/τ)
│   ├── td3b_losses.py          # L_WDCE + λ·L_ctr + β·L_KL
│   ├── td3b_mcts.py            # TD3B-extended MCTS
│   ├── td3b_finetune.py        # Training loop
│   └── data_utils.py           # Data loading utilities
├── scoring/                    # Affinity predictor (g_ψ) and property classifiers
├── baselines/                  # CG, SMC, TDS, PepTune, Unguided baselines
├── tokenizer/                  # SMILES tokenizer (vocab + splits)
├── configs/                    # Model and training configs
└── utils/                      # Misc utilities
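The gated reward computed in td3b/td3b_scoring.py multiplies the affinity score by a sigmoid gate on the Direction Oracle output. A minimal scalar sketch, assuming the oracle returns a probability in [0, 1], the target direction d* is encoded as +1 or -1 (which sign means agonist is not stated here), and σ is the logistic sigmoid with temperature τ; the function and argument names are illustrative, not the repository's API:

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_reward(affinity: float, oracle_prob: float, direction: int, tau: float = 0.1) -> float:
    """Gated reward R = g * sigmoid(d* * (f - 0.5) / tau).

    affinity    -- g, predicted binding-affinity score (assumed non-negative)
    oracle_prob -- f, Direction Oracle output, assumed to lie in [0, 1]
    direction   -- d*, assumed encoded as +1 or -1
    tau         -- sigmoid temperature (SIGMOID_TEMPERATURE in launch_multi_target.sh)
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    gate = sigmoid(direction * (oracle_prob - 0.5) / tau)
    return affinity * gate


# An undecided oracle (f = 0.5) passes exactly half of the affinity through.
print(gated_reward(8.0, 0.5, 1))
```

Because σ(x) + σ(-x) = 1, a candidate's affinity is split between the two directions, and a small τ such as 0.1 makes the gate nearly a hard switch: with f = 0.9 the gate is about 0.98 for d* = +1 and about 0.02 for d* = -1.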
Inference
Generate agonist/antagonist binders for target proteins:
python inference.py \
--ckpt_path checkpoints/td3b.ckpt \
--val_csv data/test.csv \
--save_path results/ \
--seed 42 \
--num_pool 32 \
--val_samples_per_target 8 \
--resample_alpha 0.1
This generates 32 candidates (--num_pool) per (target, direction) pair, scores them with the Direction Oracle and the affinity predictor, applies the weighted resampling of Algorithm 2, and saves only valid peptide samples.
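Algorithm 2 itself is specified in the paper and is not reproduced here. As a generic illustration of reward-weighted resampling only, the sketch below draws k survivors from a candidate pool with probability proportional to exp(R / alpha); treating --resample_alpha as this softmax temperature is an assumption, and `resample` is a hypothetical helper, not a function in inference.py:

```python
import math
import random


def resample(candidates, rewards, k, alpha=0.1, seed=42):
    """Draw k candidates with replacement, weighting each by exp(reward / alpha).

    Generic stand-in for a weighted resampling step, not the paper's Algorithm 2.
    Smaller alpha concentrates the draw on high-reward candidates.
    """
    if not candidates or len(candidates) != len(rewards):
        raise ValueError("need a non-empty pool with one reward per candidate")
    top = max(rewards)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp((r - top) / alpha) for r in rewards]
    rng = random.Random(seed)
    return rng.choices(candidates, weights=weights, k=k)
```

For example, resample(pool, rewards, k=8) would keep 8 sequences from a 32-candidate pool. Drawing with replacement lets a strong candidate appear more than once, and lowering alpha sharpens the distribution toward the best-scoring sequences.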
Output: results/td3b_results_seed42.csv with columns: target, sequence, direction, affinity, gated_reward, direction_oracle, direction_accuracy.
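To inspect the output file, a short standard-library sketch that averages gated_reward and direction_accuracy per (target, direction) group. It assumes both columns hold numeric values (for example, direction_accuracy as 0 or 1 per sample), which is not stated above; `summarize_rows` and `summarize` are hypothetical helpers:

```python
import csv
from collections import defaultdict


def summarize_rows(lines):
    """Average gated_reward and direction_accuracy per (target, direction).

    `lines` is any iterable of CSV text lines with a header row, e.g. an open file.
    Assumes both columns parse as floats.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [sum of rewards, sum of accuracy, count]
    for row in csv.DictReader(lines):
        acc = totals[(row["target"], row["direction"])]
        acc[0] += float(row["gated_reward"])
        acc[1] += float(row["direction_accuracy"])
        acc[2] += 1
    return {
        key: {"mean_gated_reward": r / n, "mean_direction_accuracy": a / n, "n": n}
        for key, (r, a, n) in totals.items()
    }


def summarize(csv_path):
    """Read a results CSV from disk and summarize it."""
    with open(csv_path, newline="") as fh:
        return summarize_rows(fh)
```

For the run above, summarize("results/td3b_results_seed42.csv") returns one entry per (target, direction) pair.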
Training
Multi-target TD3B
- Edit launch_multi_target.sh to set the paths to the checkpoints, data, and oracle:
BASE_PATH="/path/to/TD3B"
PRETRAINED_CHECKPOINT="${BASE_PATH}/checkpoints/pretrained.ckpt"
TRAIN_CSV="${BASE_PATH}/data/train.csv"
ORACLE_CKPT="${BASE_PATH}/checkpoints/direction_oracle.pt"
- Launch training:
bash launch_multi_target.sh
Key hyperparameters (in launch_multi_target.sh):
- CONTRASTIVE_WEIGHT=0.1 → λ for L_ctr
- KL_BETA=0.1 → β for L_KL
- SIGMOID_TEMPERATURE=0.1 → τ for the gated reward
- NUM_ITER=20 → MCTS iterations per round
- NUM_CHILDREN=16 → children per MCTS expansion
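The comment on td3b/td3b_losses.py gives the training objective as L_WDCE + λ·L_ctr + β·L_KL, with λ and β set by CONTRASTIVE_WEIGHT and KL_BETA. A minimal sketch of how those weights combine the three terms, with the launch-script defaults collected in a hypothetical dataclass; the field and function names are illustrative, and the individual loss terms are taken as precomputed scalars:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TD3BHyperparams:
    """Defaults mirrored from launch_multi_target.sh (names are illustrative)."""
    contrastive_weight: float = 0.1   # lambda, weight on L_ctr
    kl_beta: float = 0.1              # beta, weight on L_KL
    sigmoid_temperature: float = 0.1  # tau in the gated reward
    num_iter: int = 20                # MCTS iterations per round
    num_children: int = 16            # children per MCTS expansion


def total_loss(l_wdce: float, l_ctr: float, l_kl: float, hp: TD3BHyperparams) -> float:
    """Combine the terms as L = L_WDCE + lambda * L_ctr + beta * L_KL."""
    return l_wdce + hp.contrastive_weight * l_ctr + hp.kl_beta * l_kl
```

With the defaults, setting contrastive_weight or kl_beta to 0 drops the corresponding regularizer and leaves the L_WDCE term alone.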
Baselines
Run baseline methods (CG, SMC, TDS, PepTune, Unguided):
cd baselines/
bash run.sh --baseline cg --device cuda:0
bash run.sh --baseline smc --device cuda:0
bash run.sh --baseline tds --device cuda:0
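To run the three baselines shown above back to back, a small Python wrapper around the same run.sh calls can help. This is a sketch, assuming run.sh accepts exactly the --baseline and --device flags used above; `baseline_command` and `run_all` are hypothetical helpers, and the flag values for PepTune and Unguided are not given here, so check run.sh before adding them:

```python
import subprocess

# Baseline names taken from the commands above.
BASELINES = ("cg", "smc", "tds")


def baseline_command(name: str, device: str = "cuda:0") -> list:
    """Build the run.sh invocation for one baseline."""
    return ["bash", "run.sh", "--baseline", name, "--device", device]


def run_all(names=BASELINES, device="cuda:0", cwd="baselines"):
    """Run each baseline in turn from the baselines/ directory, stopping on failure."""
    for name in names:
        subprocess.run(baseline_command(name, device), cwd=cwd, check=True)
```

Calling run_all() from the TD3B/ root runs cg, smc, and tds in turn; check=True raises an error at the first baseline that exits with a non-zero status instead of silently continuing.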
Citation
@inproceedings{
cao2026td3b,
title={TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation},
author={Anonymous},
booktitle={Forty-third International Conference on Machine Learning},
year={2026},
url={https://openreview.net/forum?id=RNuC8Nj6rD}
}
