Vanqi's Collections

From Vision to Motion

updated about 22 hours ago

  • HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

    Paper • 2603.17024 • Published 22 days ago • 107

  • WorldAgents: Can Foundation Image Models be Agents for 3D World Models?

    Paper • 2603.19708 • Published 20 days ago • 13

  • MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

    Paper • 2603.25319 • Published 14 days ago • 32

  • ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions

    Paper • 2603.25791 • Published 13 days ago • 5

  • Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

    Paper • 2604.03016 • Published 6 days ago • 28

  • The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

    Paper • 2604.02029 • Published 7 days ago • 132

  • OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

    Paper • 2604.04707 • Published 3 days ago • 164