UniVideo: Unified Understanding, Generation, and Editing for Videos

Cong Wei*,1,2   Quande Liu†,2   Zixuan Ye2   Qiulin Wang2   Xintao Wang2

Pengfei Wan2   Kun Gai2   Wenhu Chen†,1

1University of Waterloo    2Kling Team, Kuaishou Technology
*Work done during an internship at Kling Team, Kuaishou Technology
†Corresponding author

     

🔔 News

How to use

Acknowledgement

  • HunyuanVideo: the base video generation model used in this work. Thanks to the authors for their excellent contribution.
  • Qwen2.5-VL: the base vision-language model (VLM) used in this work. Thanks to the authors for their excellent contribution.
  • MetaQueries: we adopt their query implementation. Thanks to the authors for their excellent contribution.

🌟 Citation

If you find UniVideo useful for your research and applications, please cite using this BibTeX:

@article{wei2025univideo,
  title={Univideo: Unified understanding, generation, and editing for videos},
  author={Wei, Cong and Liu, Quande and Ye, Zixuan and Wang, Qiulin and Wang, Xintao and Wan, Pengfei and Gai, Kun and Chen, Wenhu},
  journal={arXiv preprint arXiv:2510.08377},
  year={2025}
}