Worldscape-MoE Model Weights
This repository will host the model weights introduced in the paper Worldscape-MoE: A Unified Mixture-of-Experts Architecture for Scalable Multi-Control Video Generation World Modeling.
Worldscape-MoE is a unified world-model training framework for scalable multi-control video generation. It incorporates a Mixture-of-Experts (MoE) design into Diffusion Transformers to jointly learn from heterogeneous supervisory controls, including camera poses, robotic arms, and hand joints, within a single extensible world model.
By combining shared experts for cross-control world knowledge with modality-specific experts for control specialization, Worldscape-MoE enables positive transfer across diverse control modalities and supports strong embodied world modeling, out-of-distribution generalization, and loco-manipulation generation.
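To make the shared-plus-specialized expert idea concrete, here is a minimal NumPy sketch of one feed-forward MoE block. It is not the authors' implementation. It assumes the following, all of which are hypothetical: the class name SharedPlusModalityMoE, the modality labels "camera_pose", "robot_arm", and "hand_joints", a fixed number of experts per group, a softmax router that operates only within the active modality's expert group, and the use of the control modality label to select that group. Shared experts process every token regardless of modality. Only the experts belonging to the current control modality contribute on top of them.

```python
import numpy as np


class SharedPlusModalityMoE:
    """Hypothetical FFN block: always-on shared experts + per-modality experts.

    Sketch only; expert counts, the softmax router, and modality-based
    selection are illustrative assumptions, not the paper's design.
    """

    def __init__(self, dim, hidden, modalities, n_shared=1, n_specific=2, seed=0):
        rng = np.random.default_rng(seed)

        def make_expert():
            # Two-layer MLP weights (dim -> hidden -> dim).
            return (rng.standard_normal((dim, hidden)) * 0.1,
                    rng.standard_normal((hidden, dim)) * 0.1)

        # Shared experts: used for every control modality.
        self.shared = [make_expert() for _ in range(n_shared)]
        # Modality-specific experts: one group per control modality.
        self.specific = {m: [make_expert() for _ in range(n_specific)] for m in modalities}
        # Router weights: score tokens over the experts of a single modality.
        self.gates = {m: rng.standard_normal((dim, n_specific)) * 0.1 for m in modalities}

    @staticmethod
    def _ffn(x, expert):
        w1, w2 = expert
        return np.maximum(x @ w1, 0.0) @ w2  # ReLU MLP

    def __call__(self, x, modality):
        # x: (tokens, dim). Shared experts are applied to all tokens.
        out = sum(self._ffn(x, e) for e in self.shared)
        # Softmax over the active modality's experts only.
        logits = x @ self.gates[modality]
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=-1, keepdims=True)
        for i, e in enumerate(self.specific[modality]):
            out = out + probs[:, i:i + 1] * self._ffn(x, e)
        return x + out  # residual connection


layer = SharedPlusModalityMoE(dim=16, hidden=32,
                              modalities=["camera_pose", "robot_arm", "hand_joints"])
tokens = np.random.default_rng(1).standard_normal((4, 16))
y = layer(tokens, "robot_arm")
print(y.shape)  # (4, 16)
```

In this sketch, the shared experts receive gradients from every modality during training, which is one way they could accumulate cross-control knowledge. Each modality's expert group is updated only by that modality's data. Adding a new control type would then amount to adding a new entry to `specific` and `gates`.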
For more details about the project, please refer to the project page:
https://embodiedcity.github.io/Worldscape-MoE/
Demo video:
https://www.youtube.com/watch?v=8H3hJ3XDJFk
The paper (on arXiv) and the model weights will be released soon.
Citation
If you find this work useful in your research, please consider citing it:
@misc{fang2026worldscapemoe,
title = {Worldscape-MoE: A Unified Mixture-of-Experts Architecture for Scalable Multi-Control Video Generation World Modeling},
author = {Jianjie Fang and Yongyan Xu and Ziyou Wang and Yuchao Huang and Zhaolu Wang and Rongze Tang and Mingyuan Jia and Baining Zhao and Weichen Zhang and Xin Zhang and Haisheng Su and Yu Shang and Chen Gao and Wei Wu and Xinlei Chen and Yong Li},
year = {2026},
note = {Project page: https://embodiedcity.github.io/Worldscape-MoE/}
}