ASID-Caption

We build ASID-Caption, a data-and-model suite for fine-grained audiovisual video understanding.

Our goal is to move beyond “one video → one generic caption”. We provide attribute-structured supervision and quality-verified annotations so that models can produce more complete, more controllable, and more temporally consistent descriptions covering both visual content and audio cues.

What we release

  • ASID-1M: a large-scale collection of attribute-structured audiovisual instructions with both single-attribute and all-attributes training formats.
  • ASID-Verify: a scalable curation pipeline that generates, ensembles, verifies, and refines annotations to improve semantic and temporal consistency.
  • ASID-Captioner: Qwen2.5-Omni-based audiovisual captioning models fine-tuned on ASID-1M.
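
To make the two ASID-1M training formats concrete, here is a minimal sketch of how one attribute-structured annotation could be expanded into single-attribute and all-attributes instruction samples. The attribute names (`scene`, `actions`, `speech`, `sound_events`), the prompt wording, and the field names `video`, `instruction`, and `response` are all hypothetical illustrations; the released dataset may use a different schema.

```python
from typing import Dict, List

# NOTE: attribute names and sample fields below are assumptions for
# illustration only, not the actual ASID-1M schema.
annotation: Dict[str, str] = {
    "scene": "A small kitchen with a window behind the counter.",
    "actions": "A person chops vegetables, then stirs a pot.",
    "speech": "The person says the soup needs more salt.",
    "sound_events": "Knife taps on a board; a pot simmers.",
}


def single_attribute_samples(video_id: str, attrs: Dict[str, str]) -> List[dict]:
    """Build one instruction per attribute, each targeting a single aspect."""
    return [
        {
            "video": video_id,
            "instruction": f"Describe the {name.replace('_', ' ')} in this video.",
            "response": text,
        }
        for name, text in attrs.items()
    ]


def all_attributes_sample(video_id: str, attrs: Dict[str, str]) -> dict:
    """Build one instruction whose response covers every attribute in order."""
    names = ", ".join(name.replace("_", " ") for name in attrs)
    response = "\n".join(f"{name}: {text}" for name, text in attrs.items())
    return {
        "video": video_id,
        "instruction": f"Describe this video, covering: {names}.",
        "response": response,
    }


singles = single_attribute_samples("video_0001", annotation)
combined = all_attributes_sample("video_0001", annotation)
```

In this sketch the single-attribute format gives the model a narrow, controllable target per sample, while the all-attributes format asks for one structured description that spans both visual and audio attributes.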

Research interests

  • Video understanding & video captioning
  • Audio-visual learning
  • Multimodal LLMs / instruction tuning
  • Data curation, verification, and quality control
