nielsr (HF Staff) committed
Commit e5cd6f9 · verified · 1 parent: 4ee434c

Improve model card metadata and content
Hi! I'm Niels from the Hugging Face community team.

I've opened this PR to improve the model card for Matrix-Game 3.0. Specifically, I have:
- Added the `library_name: diffusers` tag based on the presence of `model_index.json` and Diffusers versioning in the config files.
- Updated the `pipeline_tag` to `text-to-video` for better discoverability.
- Linked the model to its research paper on the Hugging Face Hub.
- Added a "Quick Start" section with installation and inference instructions sourced from the official GitHub repository.

These changes will help users find and use your model more effectively!

Files changed (1)
1. README.md (+52, -57)
README.md CHANGED
@@ -1,96 +1,91 @@
  ---
- license: apache-2.0
- language:
- - en
  base_model:
  - Wan-AI/Wan2.2-TI2V-5B
- pipeline_tag: image-text-to-video
  ---

  # Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

  <div style="display: flex; justify-content: center; gap: 10px;">
  <a href="https://github.com/SkyworkAI/Matrix-Game">
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
  </a>
- <a href="https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf">
- <img src="https://img.shields.io/badge/Technical Report-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="report">
  </a>
  <a href="https://matrix-game-v3.github.io/">
  <img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
  </a>
-
-
  </div>

  ## 📝 Overview
- **Matrix-Game-3.0** is an open-sourced, memory-augmented interactive world model designed for 720p real-time long-form video generation.
-
- ## Framework Overview
- Our framework unifies three stages into an end-to-end pipeline:
- - Data Engine — an industrial-scale infinite data engine integrating Unreal Engine synthetic scenes, large-scale automated AAA game collection, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplets at scale;
- - Model Training — a memory-augmented Diffusion Transformer (DiT) with an error buffer that learns action-conditioned generation with memory-enhanced long-horizon consistency;
- - Inference Deployment — few-step sampling, INT8 quantization, and model distillation achieving 720p@40FPS real-time generation with a 5B model.

  ![Model Overview](./framework.png)

  ## ✨ Key Features
- - 🚀 **Feature 1**: **Upgraded Data Engine**: Combines Unreal Engine-based synthetic data, large-scale automated AAA game data, and real-world video augmentation to generate high-quality Video–Pose–Action–Prompt data.
- - 🖱️ **Feature 2**: **Long-horizon Memory & Consistency**: Uses prediction residuals and frame re-injection for self-correction, while camera-aware memory ensures long-term spatiotemporal consistency.
- - 🎬 **Feature 3**: **Real-Time Interactivity & Open Access**: It employs a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder distillation to support 40fps real-time generation at 720p resolution with a 5B model, while maintaining stable memory consistency over minute-long sequences.
- - 👍 **Feature 4**: **Scale Up 28B-MoE Model**: Scaling up to a 2×14B model further improves generation quality, dynamics, and generalization.
-
- ## 🔥 Latest Updates
-
- * [2026-03] 🎉 Initial release of Matrix-Game-3.0 Model

  ## 🚀 Quick Start

  ### Installation
- Create a conda environment and install dependencies:
- ```
  conda create -n matrix-game-3.0 python=3.12 -y
  conda activate matrix-game-3.0
- # install FlashAttention
- # Our project also depends on [FlashAttention](https://github.com/Dao-AILab/flash-attention)
  git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
  cd Matrix-Game-3.0
  pip install -r requirements.txt
  ```

- ### Model Download
- ```
- pip install "huggingface_hub[cli]"
- huggingface-cli download Matrix-Game-3.0 --local-dir Matrix-Game-3.0
- ```
  ### Inference
- Before running inference, you need to prepare:
- - Input image
- - Text prompt

- After downloading pretrained models, you can use the following command to generate an interactive video with random actions:
- ``` sh
- torchrun --nproc_per_node=$NUM_GPUS generate.py --size 704*1280 --dit_fsdp --t5_fsdp --ckpt_dir Matrix-Game-3.0 --fa_version 3 --use_int8 --num_iterations 12 --num_inference_steps 3 --image demo_images/000/image.png --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." --save_name test --seed 42 --compile_vae --lightvae_pruning_rate 0.5 --vae_type mg_lightvae --output_dir ./output
- # "num_iterations" refers to the number of iterations you want to generate. The total number of frames generated is given by: 57 + (num_iterations - 1) * 40
  ```
- Tips:
- If you want to use the base model, you can use "--use_base_model --num_inference_steps 50". Otherwise, if you want to generate interactive videos with your own input actions, you can use "--interactive".
- With multiple GPUs, you can pass `--use_async_vae --async_vae_warmup_iters 1` to speed up inference.

  ## ⭐ Acknowledgements
- - [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
- - [Self-Forcing](https://github.com/guandeh17/Self-Forcing) for their excellent work
- - [GameFactory](https://github.com/KwaiVGI/GameFactory) for their idea of an action control module
- - [LightX2V](https://github.com/ModelTC/lightx2v) for their excellent quantization framework
- - [Wan2.2](https://github.com/Wan-Video/Wan2.2) for their strong base model
- - [lingbot-world](https://github.com/Robbyant/lingbot-world) for their context parallel framework

  ## 📖 Citation
- If you find this work useful for your research, please kindly cite our paper:

- ```
- @misc{2026matrix,
- title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
- author={{Skywork AI Matrix-Game Team}},
- year={2026},
- howpublished={Technical report},
- url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
- }
  ```
 
  ---
  base_model:
  - Wan-AI/Wan2.2-TI2V-5B
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: text-to-video
+ library_name: diffusers
  ---
+
  # Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
+
+ Matrix-Game 3.0 is an open-source, memory-augmented interactive world model designed for 720p real-time long-form video generation. It achieves up to 40 FPS real-time generation at 720p resolution with a 5B model while maintaining stable memory consistency over minute-long sequences.
+
  <div style="display: flex; justify-content: center; gap: 10px;">
  <a href="https://github.com/SkyworkAI/Matrix-Game">
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
  </a>
+ <a href="https://huggingface.co/papers/2604.08995">
+ <img src="https://img.shields.io/badge/Paper-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="Paper">
  </a>
  <a href="https://matrix-game-v3.github.io/">
  <img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
  </a>
  </div>

  ## 📝 Overview
+ The Matrix-Game 3.0 framework unifies three stages into an end-to-end pipeline:
+ - **Data Engine**: An upgraded industrial-scale data engine integrating Unreal Engine synthetic data and AAA game collection to produce high-quality Video-Pose-Action-Prompt quadruplets.
+ - **Model Training**: A memory-augmented Diffusion Transformer (DiT) that learns self-correction by modeling prediction residuals and employs camera-aware memory for long-horizon consistency.
+ - **Inference Deployment**: Multi-segment autoregressive distillation (DMD), model quantization, and VAE decoder pruning to achieve efficient real-time inference.

  ![Model Overview](./framework.png)

  ## ✨ Key Features
+ - 🚀 **Real-Time Performance**: Supports 720p @ 40fps generation with the 5B model.
+ - 🖱️ **Long-horizon Consistency**: Stable memory consistency over sequences lasting minutes.
+ - 🎬 **Scalability**: Scaling to a 28B-MoE model (2×14B) further improves quality and generalization.

  ## 🚀 Quick Start
+
  ### Installation
+ ```bash
  conda create -n matrix-game-3.0 python=3.12 -y
  conda activate matrix-game-3.0
+ # install FlashAttention and other dependencies
  git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
  cd Matrix-Game-3.0
  pip install -r requirements.txt
  ```

  ### Inference
+ After downloading the pretrained weights, you can generate an interactive video with the following command:
+
+ ```bash
+ torchrun --nproc_per_node=$NUM_GPUS generate.py \
+ --size 704*1280 \
+ --dit_fsdp \
+ --t5_fsdp \
+ --ckpt_dir Matrix-Game-3.0 \
+ --fa_version 3 \
+ --use_int8 \
+ --num_iterations 12 \
+ --num_inference_steps 3 \
+ --image demo_images/000/image.png \
+ --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." \
+ --save_name test \
+ --seed 42 \
+ --compile_vae \
+ --lightvae_pruning_rate 0.5 \
+ --vae_type mg_lightvae \
+ --output_dir ./output
  ```

  ## ⭐ Acknowledgements
+ - [Diffusers](https://github.com/huggingface/diffusers) for the diffusion model framework.
+ - [Wan2.2](https://github.com/Wan-Video/Wan2.2) for the strong base model.
+ - [Self-Forcing](https://github.com/guandeh17/Self-Forcing), [GameFactory](https://github.com/KwaiVGI/GameFactory), [LightX2V](https://github.com/ModelTC/lightx2v), and [lingbot-world](https://github.com/Robbyant/lingbot-world) for their contributions and frameworks.

  ## 📖 Citation
+ If you find this work useful for your research, please cite:

+ ```bibtex
+ @misc{2026matrix,
+ title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
+ author={{Skywork AI Matrix-Game Team}},
+ year={2026},
+ howpublished={Technical report},
+ url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
+ }
  ```
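
Note for reviewers: the original README's inference tip states that the total number of generated frames is `57 + (num_iterations - 1) * 40`. A minimal sketch of that arithmetic, to sanity-check the `--num_iterations` value used in the example command (the helper name `total_frames` is my own, not part of the repository):

```python
def total_frames(num_iterations: int) -> int:
    """Frame count per the README's tip: 57 frames for the first
    iteration, then 40 additional frames per further iteration."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be >= 1")
    return 57 + (num_iterations - 1) * 40

# The example command uses --num_iterations 12:
print(total_frames(12))  # 497
```

So the default command produces 497 frames, which at 40 FPS corresponds to roughly 12 seconds of video.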