@weiwchu thank you for your comments
The two data vendors mentioned earlier are offering datasets directly to ASR service providers for training. With that in mind, it feels like we should be extra prudent about mixing private data into our evaluations.
Completely agree, which is why we don't include these new datasets in the default average WER computation. We hope users keep a nuanced view not only of the data sources but also of the types of content, which is why we added splits for scripted/conversational speech and American/non-American accents.
It would be a huge step forward to open a channel where the research community can recommend these datasets to the leaderboard directly.
This is possible on our GitHub repo! This checklist describes how a new model or dataset can be contributed.