Ying

Heting

AI & ML interests

None yet

Recent Activity

upvoted an article about 12 hours ago

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

upvoted an article 1 day ago

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

upvoted an article 2 days ago

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

Organizations

None yet

upvoted an article about 12 hours ago

Article

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

omlab • about 12 hours ago • 6

upvoted an article 1 day ago

Article

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

omlab • 1 day ago • 9

upvoted an article 2 days ago

Article

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

omlab • 2 days ago • 9

updated a model 11 days ago

omlab/OmTrackVLA-0.6B

Other • 0.6B • Updated 11 days ago • 64 • 4

upvoted a paper 27 days ago

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published May 27 • 25

upvoted a collection 5 months ago

Qwen3-TTS

Collection

7 items • Updated Jan 22 • 368

liked a Space 12 months ago

3d-Model-Playground

👀

Control 3D models using gestures and voice

upvoted a paper about 1 year ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10, 2025 • 36

authored a paper over 1 year ago

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Paper • 2406.16620 • Published Jun 24, 2024 • 3
