Improve model card metadata and content
Hi! I'm Niels from the Hugging Face community team.
I've opened this PR to improve the model card for Matrix-Game 3.0. Specifically, I have:
- Added the `library_name: diffusers` tag based on the presence of `model_index.json` and Diffusers versioning in the config files.
- Updated the `pipeline_tag` to `text-to-video` for better discoverability.
- Linked the model to its research paper on the Hugging Face Hub.
- Added a "Quick Start" section with installation and inference instructions sourced from the official GitHub repository.
These changes will help users find and use your model more effectively!
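For anyone who wants to sanity-check the metadata after merging, here is a minimal shell sketch. It assumes a local copy of `README.md`; the helper names `front_matter` and `check_card` are illustrative only and are not part of any Hugging Face tooling.

```bash
# Hypothetical helpers for illustration; not part of the repository.

# Print the YAML front matter: the lines between the first pair of '---'.
front_matter() {
  awk 'NR==1 && $0=="---" {inside=1; next}
       inside && $0=="---" {exit}
       inside {print}' "$1"
}

# Succeed only if every expected "key: value" line appears in the front matter.
check_card() {
  local readme="$1"; shift
  local entry
  for entry in "$@"; do
    front_matter "$readme" | grep -qxF -- "$entry" \
      || { echo "missing: $entry" >&2; return 1; }
  done
  echo "front matter ok"
}

# Example call (run from the directory that holds README.md):
# check_card README.md 'pipeline_tag: text-to-video' 'library_name: diffusers'
```

The check is a plain line match, so it only covers top-level `key: value` entries exactly as they are written in the card.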
README.md
CHANGED
|
@@ -1,96 +1,91 @@
|
|
| 1 |
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
language:
|
| 4 |
-
- en
|
| 5 |
base_model:
|
| 6 |
- Wan-AI/Wan2.2-TI2V-5B
|
| 7 |
-
|
| 8 |
---
|
| 9 |
# Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
|
| 10 |
<div style="display: flex; justify-content: center; gap: 10px;">
|
| 11 |
<a href="https://github.com/SkyworkAI/Matrix-Game">
|
| 12 |
<img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
|
| 13 |
</a>
|
| 14 |
-
<a href="https://
|
| 15 |
-
<img src="https://img.shields.io/badge/
|
| 16 |
</a>
|
| 17 |
<a href="https://matrix-game-v3.github.io/">
|
| 18 |
<img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
|
| 19 |
</a>
|
| 20 |
-
|
| 21 |
-
|
| 22 |
</div>
|
| 23 |
|
| 24 |
## Overview
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
- Data Engine: an industrial-scale infinite data engine integrating Unreal Engine synthetic scenes, large-scale automated AAA game collection, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplets at scale;
|
| 30 |
-
- Model Training: a memory-augmented Diffusion Transformer (DiT) with an error buffer that learns action-conditioned generation with memory-enhanced long-horizon consistency;
|
| 31 |
-
- Inference Deployment: few-step sampling, INT8 quantization, and model distillation achieving 720p@40FPS real-time generation with a 5B model.
|
| 32 |
|
| 33 |

|
| 34 |
|
| 35 |
## ✨ Key Features
|
| 36 |
-
- **
|
| 37 |
-
- 🖱️ **
|
| 38 |
-
- **
|
| 39 |
-
- **Feature 3**: **Scale Up 28B-MoE Model**: Scaling up to a 2×14B model further improves generation quality, dynamics, and generalization.
|
| 40 |
-
|
| 41 |
-
## 🔥 Latest Updates
|
| 42 |
-
|
| 43 |
-
* [2026-03] Initial release of Matrix-Game-3.0 Model
|
| 44 |
|
| 45 |
## Quick Start
|
| 46 |
### Installation
|
| 47 |
-
|
| 48 |
-
```
|
| 49 |
conda create -n matrix-game-3.0 python=3.12 -y
|
| 50 |
conda activate matrix-game-3.0
|
| 51 |
-
# install FlashAttention
|
| 52 |
-
# Our project also depends on [FlashAttention](https://github.com/Dao-AILab/flash-attention)
|
| 53 |
git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
|
| 54 |
cd Matrix-Game-3.0
|
| 55 |
pip install -r requirements.txt
|
| 56 |
```
|
| 57 |
|
| 58 |
-
### Model Download
|
| 59 |
-
```
|
| 60 |
-
pip install "huggingface_hub[cli]"
|
| 61 |
-
huggingface-cli download Matrix-Game-3.0 --local-dir Matrix-Game-3.0
|
| 62 |
-
```
|
| 63 |
### Inference
|
| 64 |
-
|
| 65 |
-
- Input image
|
| 66 |
-
- Text prompt
|
| 67 |
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
```
|
| 73 |
-
Tips:
|
| 74 |
-
To use the base model, pass `--use_base_model --num_inference_steps 50`. To generate interactive videos with your own input actions, pass `--interactive`.
|
| 75 |
-
With multiple GPUs, you can pass `--use_async_vae --async_vae_warmup_iters 1` to speed up inference.
|
| 76 |
|
| 77 |
## Acknowledgements
|
| 78 |
-
- [Diffusers](https://github.com/huggingface/diffusers) for
|
| 79 |
-
- [
|
| 80 |
-
- [GameFactory](https://github.com/KwaiVGI/GameFactory)
|
| 81 |
-
- [LightX2V](https://github.com/ModelTC/lightx2v) for their excellent quantization framework
|
| 82 |
-
- [Wan2.2](https://github.com/Wan-Video/Wan2.2) for their strong base model
|
| 83 |
-
- [lingbot-world](https://github.com/Robbyant/lingbot-world) for their context parallel framework
|
| 84 |
|
| 85 |
## Citation
|
| 86 |
-
If you find this work useful for your research, please
|
| 87 |
|
| 88 |
-
```
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
```
|
| 1 |
---
|
| 2 |
base_model:
|
| 3 |
- Wan-AI/Wan2.2-TI2V-5B
|
| 4 |
+
language:
|
| 5 |
+
- en
|
| 6 |
+
license: apache-2.0
|
| 7 |
+
pipeline_tag: text-to-video
|
| 8 |
+
library_name: diffusers
|
| 9 |
---
|
| 10 |
+
|
| 11 |
# Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
|
| 12 |
+
|
| 13 |
+
Matrix-Game 3.0 is an open-source, memory-augmented interactive world model designed for 720p real-time long-form video generation. It achieves up to 40 FPS real-time generation at 720p resolution with a 5B model while maintaining stable memory consistency over minute-long sequences.
|
| 14 |
+
|
| 15 |
<div style="display: flex; justify-content: center; gap: 10px;">
|
| 16 |
<a href="https://github.com/SkyworkAI/Matrix-Game">
|
| 17 |
<img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
|
| 18 |
</a>
|
| 19 |
+
<a href="https://huggingface.co/papers/2604.08995">
|
| 20 |
+
<img src="https://img.shields.io/badge/Paper-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="Paper">
|
| 21 |
</a>
|
| 22 |
<a href="https://matrix-game-v3.github.io/">
|
| 23 |
<img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
|
| 24 |
</a>
|
| 25 |
</div>
|
| 26 |
|
| 27 |
## Overview
|
| 28 |
+
The Matrix-Game 3.0 framework unifies three stages into an end-to-end pipeline:
|
| 29 |
+
- **Data Engine**: An upgraded industrial-scale data engine integrating Unreal Engine synthetic data and AAA game collection to produce high-quality Video-Pose-Action-Prompt quadruplets.
|
| 30 |
+
- **Model Training**: A memory-augmented Diffusion Transformer (DiT) that learns self-correction by modeling prediction residuals and employs camera-aware memory for long-horizon consistency.
|
| 31 |
+
- **Inference Deployment**: Multi-segment autoregressive distillation (DMD), model quantization, and VAE decoder pruning to achieve efficient real-time inference.
|
| 32 |
|
| 33 |

|
| 34 |
|
| 35 |
## ✨ Key Features
|
| 36 |
+
- **Real-Time Performance**: Supports 720p generation at 40 FPS with the 5B model.
|
| 37 |
+
- 🖱️ **Long-horizon Consistency**: Stable memory consistency over sequences lasting minutes.
|
| 38 |
+
- **Scalability**: Scaling to a 28B-MoE model (2×14B) further improves quality and generalization.
|
| 39 |
|
| 40 |
## Quick Start
|
| 41 |
+
|
| 42 |
### Installation
|
| 43 |
+
```bash
|
| 44 |
conda create -n matrix-game-3.0 python=3.12 -y
|
| 45 |
conda activate matrix-game-3.0
|
| 46 |
+
# install FlashAttention and other dependencies
|
| 47 |
git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
|
| 48 |
cd Matrix-Game-3.0
|
| 49 |
pip install -r requirements.txt
|
| 50 |
```
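The inference step expects the pretrained weights to be on disk already. Below is a minimal sketch for fetching them, assuming the `huggingface_hub[cli]` package is installed. `download_weights` is a hypothetical helper written for this example, and the Hub repository id is left as a placeholder that you need to fill in yourself.

```bash
# Hypothetical helper for illustration; not part of the repository.
# Usage: download_weights <hub-repo-id> <local-dir>
download_weights() {
  local repo_id="$1" local_dir="$2"

  # Skip the download when the target directory already holds files.
  if [ -d "$local_dir" ] && [ -n "$(ls -A "$local_dir" 2>/dev/null)" ]; then
    echo "skip: $local_dir already populated"
    return 0
  fi

  # Fetch the full repository snapshot into the target directory.
  huggingface-cli download "$repo_id" --local-dir "$local_dir"
}

# Example call; replace the placeholder with the model's actual Hub id:
# download_weights "<org>/<repo>" Matrix-Game-3.0
```

The local directory name matches the `--ckpt_dir Matrix-Game-3.0` argument used in the inference command.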
|
| 51 |
|
| 52 |
### Inference
|
| 53 |
+
After downloading the pretrained weights, you can generate an interactive video with the following command:
|
| 54 |
|
| 55 |
+
```bash
|
| 56 |
+
torchrun --nproc_per_node=$NUM_GPUS generate.py \
|
| 57 |
+
--size 704*1280 \
|
| 58 |
+
--dit_fsdp \
|
| 59 |
+
--t5_fsdp \
|
| 60 |
+
--ckpt_dir Matrix-Game-3.0 \
|
| 61 |
+
--fa_version 3 \
|
| 62 |
+
--use_int8 \
|
| 63 |
+
--num_iterations 12 \
|
| 64 |
+
--num_inference_steps 3 \
|
| 65 |
+
--image demo_images/000/image.png \
|
| 66 |
+
--prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." \
|
| 67 |
+
--save_name test \
|
| 68 |
+
--seed 42 \
|
| 69 |
+
--compile_vae \
|
| 70 |
+
--lightvae_pruning_rate 0.5 \
|
| 71 |
+
--vae_type mg_lightvae \
|
| 72 |
+
--output_dir ./output
|
| 73 |
```
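The command above reads `$NUM_GPUS` from the environment and expects the checkpoint directory and input image to exist. The following is a small pre-flight sketch, not part of the official scripts: `count_gpus` and `preflight` are hypothetical helpers, and the fallback to a single process when `nvidia-smi` is unavailable is an assumption made for this example.

```bash
# Hypothetical helpers for illustration; not part of the repository.

# Count GPUs listed by nvidia-smi; fall back to 1 if none can be detected.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    local n
    n="$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ')"
    if [ "$n" -ge 1 ]; then
      echo "$n"
      return 0
    fi
  fi
  echo 1
}

# Respect a value the user has already exported, otherwise detect it.
NUM_GPUS="${NUM_GPUS:-$(count_gpus)}"
export NUM_GPUS

# Verify that the inputs referenced by the inference command exist.
preflight() {
  local ckpt_dir="$1" image="$2"
  [ -d "$ckpt_dir" ] || { echo "missing checkpoint dir: $ckpt_dir" >&2; return 1; }
  [ -f "$image" ] || { echo "missing input image: $image" >&2; return 1; }
  echo "ok: $NUM_GPUS GPU(s), checkpoints in $ckpt_dir"
}

# Example call before launching torchrun:
# preflight Matrix-Game-3.0 demo_images/000/image.png
```

Separately, writing the size argument as `--size '704*1280'` keeps the shell from treating `*` as a glob pattern if a matching path happens to exist.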
|
| 74 |
|
| 75 |
## Acknowledgements
|
| 76 |
+
- [Diffusers](https://github.com/huggingface/diffusers) for the diffusion model framework.
|
| 77 |
+
- [Wan2.2](https://github.com/Wan-Video/Wan2.2) for the strong base model.
|
| 78 |
+
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing), [GameFactory](https://github.com/KwaiVGI/GameFactory), [LightX2V](https://github.com/ModelTC/lightx2v), and [lingbot-world](https://github.com/Robbyant/lingbot-world) for their contributions and frameworks.
|
| 79 |
|
| 80 |
## Citation
|
| 81 |
+
If you find this work useful for your research, please cite:
|
| 82 |
|
| 83 |
+
```bibtex
|
| 84 |
+
@misc{2026matrix,
|
| 85 |
+
title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
|
| 86 |
+
author={{Skywork AI Matrix-Game Team}},
|
| 87 |
+
year={2026},
|
| 88 |
+
howpublished={Technical report},
|
| 89 |
+
url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
|
| 90 |
+
}
|
| 91 |
```
|