Chart:
  ahmed-masry/ChartQA • Updated Jun 22, 2024 • 32.7k • 1.65k • 31
  oroikon/chart_captioning • Updated Oct 8, 2023 • 8.82k • 138 • 12
  heegyu/chart2text_statista • Updated Oct 12, 2023 • 34.8k • 3.48k • 9
  nourheshamshaheen/typed_final_chart_to_table • Updated Nov 12, 2023 • 2.81k • 39 • 6
Document Understanding Models:
  Mizukiluke/ureader-instruction-1.0 • Updated Oct 13, 2023 • 24.5k • 66 • 15
Captioning:
  docling-project/USPTO-30K • Updated Aug 24, 2023 • 30k • 748 • 9
  MMInstruction/ArxivCap • Updated Oct 3, 2024 • 573k • 183k • 58
  mPLUG/M-Paper • Updated Jan 13, 2024 • 729 • 13
DocQA:
  jp1924/DocStruct4M • Updated Feb 5, 2025 • 3.05M • 7 • 4
  howard-hou/OCR-VQA • Updated Apr 24, 2023 • 208k • 3.49k • 60
  vikhyatk/docmatix-single • Updated Jul 19, 2024 • 565k • 54 • 6
  MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 328 • 38
Page to MD:
  A dataset of image-text pairs sourced from research papers on arXiv, where each image is derived from a PDF page and paired with its corresponding OCR
  v1v1d/Arxiv_MD_v2_2k • Updated Jun 24, 2024 • 3.04k • 6
  v1v1d/Arxiv_MD_v2 • Updated Jun 24, 2024 • 14.2k • 319
  v1v1d/Arxiv_MD_v1_1k • Updated Jun 23, 2024 • 1.14k • 42
  v1v1d/Arxiv_MD_v1 • Updated Jun 18, 2024 • 9.96k • 134
OCR:
  wendlerc/RenderedText • Updated Oct 23, 2025 • 12M • 26.8k • 58
  Salesforce/blip3-ocr-200m • Updated Feb 3, 2025 • 96M • 709 • 42
  openpecha/OCR-Google_Books • Updated Oct 20, 2025 • 751k • 484
  openpecha/OCR-Norbuketaka • Updated Oct 14, 2025 • 2.24M • 36
Table Extraction:
  docling-project/PubTables-1M_OTSL • Updated Aug 31, 2023 • 1.88M • 2.89k • 7
  docling-project/PubTabNet_OTSL • Updated Aug 31, 2023 • 395k • 3.76k • 5
  docling-project/FinTabNet_OTSL • Updated Aug 31, 2023 • 109k • 826 • 7
Layout Detection:
  docling-project/DocLayNet-v1.1 • Updated Sep 1, 2023 • 63.5k • 2.55k • 27
  docling-project/DocLayNet • Updated Jan 25, 2023 • 809 • 137
  vikp/doclaynet_processed • Updated Nov 30, 2023 • 80.9k • 1.19k • 6
  psyche/publaynet • Updated Jul 30, 2024 • 347k • 84
VQA:
  wyu1/Leopard-Instruct • Updated Nov 8, 2024 • 1.03M • 84.7k • 65
  neulab/PangeaInstruct • Updated Feb 2, 2025 • 479 • 86
  MMInstruction/ArxivQA • Updated Mar 5, 2024 • 100k • 328 • 38
  vidore/arxivqa_train • Updated Jun 20, 2025 • 95k • 333
LaTeX Extract:
  A dataset collection of image-text pairs, where each image contains mathematical formulas, and each corresponding text provides the relevant LaTeX
  v1v1d/Latexify_v1_clean • Updated Jul 29, 2024 • 11k • 46 • 1
  v1v1d/Latexify_v1 • Updated Jul 29, 2024 • 234k • 7 • 1
  OleehyO/latex-formulas • Updated Aug 13, 2025 • 1.56M • 512 • 99
  unsloth/LaTeX_OCR • Updated Nov 21, 2024 • 76.3k • 4.68k • 77