nielsr (HF Staff) committed
Commit e5cd6f9 · verified · 1 parent: 4ee434c

Improve model card metadata and content
Hi! I'm Niels from the Hugging Face community team.

I've opened this PR to improve the model card for Matrix-Game 3.0. Specifically, I have:
- Added the `library_name: diffusers` tag based on the presence of `model_index.json` and Diffusers versioning in the config files.
- Updated the `pipeline_tag` to `text-to-video` for better discoverability.
- Linked the model to its research paper on the Hugging Face Hub.
- Added a "Quick Start" section with installation and inference instructions sourced from the official GitHub repository.

These changes will help users find and use your model more effectively!

Files changed (1)
1. README.md (+52, -57)
README.md CHANGED
@@ -1,96 +1,91 @@
  ---
- license: apache-2.0
- language:
- - en
  base_model:
  - Wan-AI/Wan2.2-TI2V-5B
- pipeline_tag: image-text-to-video
  ---

  # Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

  <div style="display: flex; justify-content: center; gap: 10px;">
  <a href="https://github.com/SkyworkAI/Matrix-Game">
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
  </a>
- <a href="https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf">
- <img src="https://img.shields.io/badge/Technical Report-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="report">
  </a>
  <a href="https://matrix-game-v3.github.io/">
  <img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
  </a>
-
-
  </div>

  ## 📝 Overview
- **Matrix-Game-3.0** is an open-sourced, memory-augmented interactive world model designed for 720p real-time long-form video generation.
-
- ## Framework Overview
- Our framework unifies three stages into an end-to-end pipeline:
- - Data Engine — an industrial-scale infinite data engine integrating Unreal Engine synthetic scenes, large-scale automated AAA game collection, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplets at scale;
- - Model Training — a memory-augmented Diffusion Transformer (DiT) with an error buffer that learns action-conditioned generation with memory-enhanced long-horizon consistency;
- - Inference Deployment — few-step sampling, INT8 quantization, and model distillation achieving 720p@40FPS real-time generation with a 5B model.

  ![Model Overview](./framework.png)

  ## ✨ Key Features
- - 🚀 **Feature 1**: **Upgraded Data Engine**: Combines Unreal Engine-based synthetic data, large-scale automated AAA game data, and real-world video augmentation to generate high-quality Video–Pose–Action–Prompt data.
- - 🖱️ **Feature 2**: **Long-horizon Memory & Consistency**: Uses prediction residuals and frame re-injection for self-correction, while camera-aware memory ensures long-term spatiotemporal consistency.
- - 🎬 **Feature 3**: **Real-Time Interactivity & Open Access**: It employs a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder distillation to support 40fps real-time generation at 720p resolution with a 5B model, while maintaining stable memory consistency over minute-long sequences.
- - 👍 **Feature 4**: **Scale Up 28B-MoE Model**: Scaling up to a 2×14B model further improves generation quality, dynamics, and generalization.
-
- ## 🔥 Latest Updates
-
- * [2026-03] 🎉 Initial release of Matrix-Game-3.0 Model

  ## 🚀 Quick Start

  ### Installation
- Create a conda environment and install dependencies:
- ```
  conda create -n matrix-game-3.0 python=3.12 -y
  conda activate matrix-game-3.0
- # install FlashAttention
- # Our project also depends on [FlashAttention](https://github.com/Dao-AILab/flash-attention)
  git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
  cd Matrix-Game-3.0
  pip install -r requirements.txt
  ```

- ### Model Download
- ```
- pip install "huggingface_hub[cli]"
- huggingface-cli download Matrix-Game-3.0 --local-dir Matrix-Game-3.0
- ```
  ### Inference
- Before running inference, you need to prepare:
- - Input image
- - Text prompt

- After downloading pretrained models, you can use the following command to generate an interactive video with random actions:
- ``` sh
- torchrun --nproc_per_node=$NUM_GPUS generate.py --size 704*1280 --dit_fsdp --t5_fsdp --ckpt_dir Matrix-Game-3.0 --fa_version 3 --use_int8 --num_iterations 12 --num_inference_steps 3 --image demo_images/000/image.png --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." --save_name test --seed 42 --compile_vae --lightvae_pruning_rate 0.5 --vae_type mg_lightvae --output_dir ./output
- # "num_iterations" refers to the number of iterations you want to generate. The total number of frames generated is given by: 57 + (num_iterations - 1) * 40
  ```
- Tips:
- If you want to use the base model, you can use "--use_base_model --num_inference_steps 50". Otherwise, if you want to generate interactive videos with your own input actions, you can use "--interactive".
- With multiple GPUs, you can pass `--use_async_vae --async_vae_warmup_iters 1` to speed up inference.

  ## ⭐ Acknowledgements
- - [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
- - [Self-Forcing](https://github.com/guandeh17/Self-Forcing) for their excellent work
- - [GameFactory](https://github.com/KwaiVGI/GameFactory) for their idea of an action control module
- - [LightX2V](https://github.com/ModelTC/lightx2v) for their excellent quantization framework
- - [Wan2.2](https://github.com/Wan-Video/Wan2.2) for their strong base model
- - [lingbot-world](https://github.com/Robbyant/lingbot-world) for their context parallel framework

  ## 📖 Citation
- If you find this work useful for your research, please kindly cite our paper:

- ```
- @misc{2026matrix,
- title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
- author={{Skywork AI Matrix-Game Team}},
- year={2026},
- howpublished={Technical report},
- url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
- }
  ```
 
  ---
  base_model:
  - Wan-AI/Wan2.2-TI2V-5B
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: text-to-video
+ library_name: diffusers
  ---
+
  # Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
+
+ Matrix-Game 3.0 is an open-source, memory-augmented interactive world model designed for 720p real-time long-form video generation. It achieves up to 40 FPS real-time generation at 720p resolution with a 5B model while maintaining stable memory consistency over minute-long sequences.
+
  <div style="display: flex; justify-content: center; gap: 10px;">
  <a href="https://github.com/SkyworkAI/Matrix-Game">
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
  </a>
+ <a href="https://huggingface.co/papers/2604.08995">
+ <img src="https://img.shields.io/badge/Paper-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="Paper">
  </a>
  <a href="https://matrix-game-v3.github.io/">
  <img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
  </a>
  </div>

  ## 📝 Overview
+ The Matrix-Game 3.0 framework unifies three stages into an end-to-end pipeline:
+ - **Data Engine**: An upgraded industrial-scale data engine integrating Unreal Engine synthetic data and AAA game collection to produce high-quality Video-Pose-Action-Prompt quadruplets.
+ - **Model Training**: A memory-augmented Diffusion Transformer (DiT) that learns self-correction by modeling prediction residuals and employs camera-aware memory for long-horizon consistency.
+ - **Inference Deployment**: Multi-segment autoregressive distillation (DMD), model quantization, and VAE decoder pruning to achieve efficient real-time inference.

  ![Model Overview](./framework.png)

  ## ✨ Key Features
+ - 🚀 **Real-Time Performance**: Supports 720p @ 40fps generation with the 5B model.
+ - 🖱️ **Long-horizon Consistency**: Stable memory consistency over sequences lasting minutes.
+ - 🎬 **Scalability**: Scaling to a 28B-MoE model (2×14B) further improves quality and generalization.

  ## 🚀 Quick Start
+
  ### Installation
+ ```bash
  conda create -n matrix-game-3.0 python=3.12 -y
  conda activate matrix-game-3.0
+ # install FlashAttention and other dependencies
  git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
  cd Matrix-Game-3.0
  pip install -r requirements.txt
  ```

  ### Inference
+ After downloading the pretrained weights, you can generate an interactive video with the following command:
+
+ ```bash
+ torchrun --nproc_per_node=$NUM_GPUS generate.py \
+ --size 704*1280 \
+ --dit_fsdp \
+ --t5_fsdp \
+ --ckpt_dir Matrix-Game-3.0 \
+ --fa_version 3 \
+ --use_int8 \
+ --num_iterations 12 \
+ --num_inference_steps 3 \
+ --image demo_images/000/image.png \
+ --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." \
+ --save_name test \
+ --seed 42 \
+ --compile_vae \
+ --lightvae_pruning_rate 0.5 \
+ --vae_type mg_lightvae \
+ --output_dir ./output
  ```

  ## ⭐ Acknowledgements
+ - [Diffusers](https://github.com/huggingface/diffusers) for the diffusion model framework.
+ - [Wan2.2](https://github.com/Wan-Video/Wan2.2) for the strong base model.
+ - [Self-Forcing](https://github.com/guandeh17/Self-Forcing), [GameFactory](https://github.com/KwaiVGI/GameFactory), [LightX2V](https://github.com/ModelTC/lightx2v), and [lingbot-world](https://github.com/Robbyant/lingbot-world) for their contributions and frameworks.

  ## 📖 Citation
+ If you find this work useful for your research, please cite:

+ ```bibtex
+ @misc{2026matrix,
+ title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
+ author={{Skywork AI Matrix-Game Team}},
+ year={2026},
+ howpublished={Technical report},
+ url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
+ }
  ```
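
Note for reviewers: the original README's inference tip states that the total number of generated frames is `57 + (num_iterations - 1) * 40`. A minimal sketch of that arithmetic, to sanity-check the `--num_iterations` value used in the example command (the helper name `total_frames` is my own, not part of the repository):

```python
def total_frames(num_iterations: int) -> int:
    """Frame count per the README's tip: 57 frames for the first
    iteration, then 40 additional frames per further iteration."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be >= 1")
    return 57 + (num_iterations - 1) * 40

# The example command uses --num_iterations 12:
print(total_frames(12))  # 497
```

So the default command produces 497 frames, which at 40 FPS corresponds to roughly 12 seconds of video.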