VidTwin: Video VAE with Decoupled Structure and Dynamics Paper • 2412.17726 • Published Dec 23, 2024 • 9
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11, 2025 • 42
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models Paper • 2503.11513 • Published Mar 14, 2025
Reinforcement Learning with Inverse Rewards for World Model Post-training Paper • 2509.23958 • Published Sep 28, 2025
Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft Paper • 2510.03198 • Published Oct 3, 2025
microsoft/paza-whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated 4 days ago • 56 • 1
microsoft/paza-Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated 4 days ago • 173 • 1