Worldscape-MoE Model Weights

This repository contains the model weights introduced in the paper "Worldscape-MoE: A Unified Mixture-of-Experts Architecture for Scalable Multi-Control Video Generation World Modeling".

Worldscape-MoE is a unified world-model training framework for scalable multi-control video generation. It incorporates a Mixture-of-Experts (MoE) design into Diffusion Transformers to jointly learn from heterogeneous supervisory controls, including camera poses, robotic arms, and hand joints, within a single extensible world model.

By combining shared experts for cross-control world knowledge with modality-specific experts for control specialization, Worldscape-MoE enables positive transfer across diverse control modalities and supports strong embodied world modeling, out-of-distribution generalization, and loco-manipulation generation.
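
To make the shared-plus-specialized idea concrete, here is a minimal NumPy sketch of a feed-forward block in which every token passes through one shared expert and through one expert selected by its control modality. It is an illustration only, not the released implementation: the class name `SharedPlusModalityMoE`, the modality keys, the hard routing by modality label, the ReLU activation, and the residual sum are all assumptions made for this example. The actual model may use learned gating, several experts per modality, or a different combination rule.

```python
import numpy as np


class SharedPlusModalityMoE:
    """Toy feed-forward block: one shared expert plus one expert per control modality.

    Hypothetical sketch; names and routing rule are assumptions, not the paper's code.
    """

    def __init__(self, dim, hidden, modalities, seed=0):
        rng = np.random.default_rng(seed)
        # The shared expert is used by every control modality.
        self.shared = self._init_expert(rng, dim, hidden)
        # One specialized expert per control modality.
        self.experts = {m: self._init_expert(rng, dim, hidden) for m in modalities}

    @staticmethod
    def _init_expert(rng, dim, hidden):
        # Two-layer MLP weights: (dim -> hidden) and (hidden -> dim).
        w_in = rng.standard_normal((dim, hidden)) * 0.02
        w_out = rng.standard_normal((hidden, dim)) * 0.02
        return w_in, w_out

    @staticmethod
    def _ffn(x, weights):
        w_in, w_out = weights
        # ReLU is used here for brevity (assumption).
        return np.maximum(x @ w_in, 0.0) @ w_out

    def __call__(self, tokens, modality):
        if modality not in self.experts:
            raise KeyError(f"unknown control modality: {modality!r}")
        shared_out = self._ffn(tokens, self.shared)
        specific_out = self._ffn(tokens, self.experts[modality])
        # Residual connection plus the sum of both expert outputs.
        return tokens + shared_out + specific_out


# Usage: the same tokens routed under two different control modalities.
layer = SharedPlusModalityMoE(
    dim=8, hidden=16, modalities=["camera_pose", "robot_arm", "hand_joints"]
)
tokens = np.ones((4, 8))  # 4 tokens with 8 features each
out_camera = layer(tokens, "camera_pose")
out_arm = layer(tokens, "robot_arm")
```

In this sketch the shared expert receives updates from every modality, which is one way cross-control knowledge could be pooled, while each modality-specific expert only ever sees its own control type.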

For more details about the project, please refer to the project page: https://embodiedcity.github.io/Worldscape-MoE/

Demo video: https://www.youtube.com/watch?v=8H3hJ3XDJFk

The paper, arXiv link, and model weights will be released soon.

Citation

If this work has contributed to your research, please consider citing it:

@misc{fang2026worldscapemoe,
  title        = {Worldscape-MoE: A Unified Mixture-of-Experts Architecture for Scalable Multi-Control Video Generation World Modeling},
  author       = {Jianjie Fang and Yongyan Xu and Ziyou Wang and Yuchao Huang and Zhaolu Wang and Rongze Tang and Mingyuan Jia and Baining Zhao and Weichen Zhang and Xin Zhang and Haisheng Su and Yu Shang and Chen Gao and Wei Wu and Xinlei Chen and Yong Li},
  year         = {2026},
  note         = {Project page: https://embodiedcity.github.io/Worldscape-MoE/}
}
