ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?
ViT-Up: Faithful Feature Upsampling for Vision Transformers