---
license: cc-by-nc-4.0
language:
- ar
---

## Checkpoints

### Pre-Trained Models

| Model | Pre-train Dataset | Checkpoint | Tokenizer |
| --- | --- | --- | --- |
| ArTST v3 base | Multilingual | [Hugging Face](https://huggingface.co/MBZUAI/ArTSTv3/blob/main/pretrain_checkpoint.pt) | [Hugging Face](https://huggingface.co/MBZUAI/ArTSTv3/blob/main/tokenizer_artstv3.model) |

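The checkpoint and tokenizer can also be fetched programmatically. A minimal stdlib-only sketch that builds Hugging Face's standard `resolve/<revision>` direct-download URLs from the repo id and filenames in the table above (the table's links use `blob`, which points at the file's web page rather than the raw file):

```python
# Sketch: build direct-download URLs for the ArTST v3 files listed above.
# Repo id and filenames are taken from the table's Hugging Face links.

REPO_ID = "MBZUAI/ArTSTv3"
FILES = {
    "checkpoint": "pretrain_checkpoint.pt",
    "tokenizer": "tokenizer_artstv3.model",
}

def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Return the direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

for name, fname in FILES.items():
    print(f"{name}: {hf_resolve_url(REPO_ID, fname)}")
```

In practice, `hf_hub_download(repo_id="MBZUAI/ArTSTv3", filename="pretrain_checkpoint.pt")` from the `huggingface_hub` package downloads the same file and handles caching.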
## Acknowledgements

ArTST is built on the [SpeechT5](https://arxiv.org/abs/2110.07205) architecture. If you use any of the ArTST models, please cite:

```bibtex
@inproceedings{toyin2023artst,
  title={ArTST: Arabic Text and Speech Transformer},
  author={Toyin, Hawau and Djanibekov, Amirbek and Kulkarni, Ajinkya and Aldarmaki, Hanan},
  booktitle={Proceedings of ArabicNLP 2023},
  pages={41--51},
  year={2023}
}

@misc{djanibekov2024dialectalcoveragegeneralizationarabic,
  title={Dialectal Coverage And Generalization in Arabic Speech Recognition},
  author={Amirbek Djanibekov and Hawau Olamide Toyin and Raghad Alshalan and Abdullah Alitr and Hanan Aldarmaki},
  year={2024},
  eprint={2411.05872},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.05872}
}
```