AVBench Models
This repository hosts the evaluator models used in AVBench, a benchmark for text-to-audio-video generation quality and cross-modal consistency.
AVBench in brief
AVBench evaluates generated content on two splits:
- Normal split: common, easier samples.
- Hard split: challenging samples with stronger cross-modal requirements.
It covers cross-modal alignment (Audio-Text, Video-Text, and Audio-Video) as well as generation-quality dimensions.
Dataset link:
Model zoo used by AVBench
| Model | Use in AVBench | Trained / merged from |
|---|---|---|
| Qwen2-Audio-7B-AudioTextMatching-Merged | Audio-Text consistency scoring (AT) | Qwen/Qwen2-Audio-7B-Instruct |
| Qwen2.5-Omni-7B-VideoTextMatching-Merged | Video-Text consistency scoring (VT) | Qwen/Qwen2.5-Omni-7B |
| Qwen2.5-Omni-7B-AudioVideoMatching-Merged | Audio-Video consistency scoring (AV) | Qwen/Qwen2.5-Omni-7B |
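
When scripting an evaluation run, it can help to keep the dimension-to-evaluator mapping from the table in one place. The snippet below is a minimal sketch of such a lookup. The names `Evaluator`, `EVALUATORS`, and `evaluator_for` are hypothetical and not part of AVBench; only the model names and base models come from the table above. How each checkpoint is stored in this repository and how it is loaded is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluator:
    """One row of the model zoo table (hypothetical helper type)."""
    dimension: str   # human-readable consistency dimension
    checkpoint: str  # evaluator model name as listed in the table
    base_model: str  # model it was trained / merged from


# Keys are the short codes used in the table: AT, VT, AV.
EVALUATORS = {
    "AT": Evaluator(
        "Audio-Text",
        "Qwen2-Audio-7B-AudioTextMatching-Merged",
        "Qwen/Qwen2-Audio-7B-Instruct",
    ),
    "VT": Evaluator(
        "Video-Text",
        "Qwen2.5-Omni-7B-VideoTextMatching-Merged",
        "Qwen/Qwen2.5-Omni-7B",
    ),
    "AV": Evaluator(
        "Audio-Video",
        "Qwen2.5-Omni-7B-AudioVideoMatching-Merged",
        "Qwen/Qwen2.5-Omni-7B",
    ),
}


def evaluator_for(code: str) -> Evaluator:
    """Return the evaluator for a dimension code, ignoring case and padding."""
    key = code.strip().upper()
    if key not in EVALUATORS:
        raise KeyError(f"unknown dimension {code!r}; expected one of {sorted(EVALUATORS)}")
    return EVALUATORS[key]


if __name__ == "__main__":
    for code in sorted(EVALUATORS):
        ev = evaluator_for(code)
        print(f"{code}: {ev.checkpoint} (from {ev.base_model})")
```

Keeping the mapping in a single frozen structure means a typo in a dimension code fails immediately with a `KeyError` rather than silently selecting the wrong evaluator.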
Notes
These models are released for AVBench evaluation and analysis.