UniVideo: Unified Understanding, Generation, and Editing for Videos

Cong Wei^*,1,2 Quande Liu^†,2 Zixuan Ye² Qiulin Wang² Xintao Wang²

Pengfei Wan² Kun Gai² Wenhu Chen^†,1

¹University of Waterloo ²Kling Team, Kuaishou Technology
^*Work done during an internship at Kling Team, Kuaishou Technology ^†Corresponding author

🔔 News

[2026-01-07]: Released code and model.
[2025-10-09]: Released arXiv preprint and project page.

How to use

Please refer to the 🔗 GitHub repository for usage instructions.

Acknowledgement

HunyuanVideo: the base video generation model used in this work. Thanks to the authors for their excellent contribution.
Qwen2.5-VL: the base vision-language model (VLM) used in this work. Thanks to the authors for their excellent contribution.
MetaQueries: we adopt their query implementation. Thanks to the authors for their excellent contribution.

🌟 Citation

If you find UniVideo useful for your research and applications, please cite using this BibTeX:

@article{wei2025univideo,
  title={UniVideo: Unified Understanding, Generation, and Editing for Videos},
  author={Wei, Cong and Liu, Quande and Ye, Zixuan and Wang, Qiulin and Wang, Xintao and Wan, Pengfei and Gai, Kun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2510.08377},
  year={2025}
}


Base model: Qwen/Qwen2.5-VL-7B-Instruct
