# ID-LoRA-CelebVHQ
This repository contains the ID-LoRA checkpoint trained on the CelebV-HQ dataset, as introduced in the paper *ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA*.
Project Page | GitHub | Paper
ID-LoRA (Identity-Driven In-Context LoRA) jointly generates a subject's appearance and voice in a single model, letting a text prompt, a reference image, and a short audio clip govern both modalities together. Built on top of LTX-2, it is the first method to personalize visual appearance and voice within a single generative pass.
## Details
| Property | Value |
|---|---|
| Base model | LTX-2 19B |
| Training dataset | CelebV-HQ |
| LoRA rank | 128 |
| Training steps | 6,000 |
| Strategy | audio_ref_only_ic with negative temporal positions |
## Usage
To use this checkpoint, please follow the installation instructions in the official GitHub repository.
### Two-Stage Inference (Recommended)
The two-stage pipeline first generates at the base resolution, then spatially upsamples 2x with a distilled LoRA for sharper output.
```shell
python scripts/inference_two_stage.py \
    --lora-path lora_weights.safetensors \
    --reference-audio reference_speaker.wav \
    --first-frame first_frame.png \
    --prompt "[VISUAL]: A person speaks in a sunlit park... [SPEECH]: Hello world... [SOUNDS]: ..." \
    --output-dir outputs/
```
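The prompt combines tagged segments for the visual scene, the spoken line, and ambient sounds, as shown in the command above. A minimal sketch of assembling such a prompt (the `build_prompt` helper is hypothetical; only the `[VISUAL]`/`[SPEECH]`/`[SOUNDS]` tag format comes from the example command):

```python
def build_prompt(visual: str, speech: str, sounds: str = "") -> str:
    """Compose the tagged multi-modal prompt string used by the inference script."""
    parts = [f"[VISUAL]: {visual}", f"[SPEECH]: {speech}"]
    if sounds:
        # The sounds segment is optional in this sketch.
        parts.append(f"[SOUNDS]: {sounds}")
    return " ".join(parts)

prompt = build_prompt("A person speaks in a sunlit park", "Hello world")
print(prompt)
# → [VISUAL]: A person speaks in a sunlit park [SPEECH]: Hello world
```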
## Files
- `lora_weights.safetensors` -- LoRA adapter weights (~1.1 GB)
- `training_config.yaml` -- Training configuration used to produce this checkpoint
## Citation
```bibtex
@misc{dahan2026idloraidentitydrivenaudiovideopersonalization,
  title         = {ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA},
  author        = {Aviad Dahan and Moran Yanuka and Noa Kraicer and Lior Wolf and Raja Giryes},
  year          = {2026},
  eprint        = {2603.10256},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2603.10256}
}
```