AudioVisual-Caption/ASID-Captioner-3B
Image-Text-to-Text • 5B • Updated • 9
Video Understanding, Audio-Visual Learning, Multimodal LLMs, Video Captioning, Instruction Tuning, Dataset Curation