FireRedTeam/FireRed-OCR

@@ -1,11 +1,11 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen3-VL-2B-Instruct
-pipeline_tag: image-to-text
 ---
 <p align="center">
   <img src="./assets/logo.png" width="600"/>
 </p>
@@ -100,6 +100,7 @@ cd FireRed-OCR
 **2. Inference**
 ```python
 from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
 from conv_for_infer import generate_conv
@@ -112,7 +113,7 @@ model = Qwen3VLForConditionalGeneration.from_pretrained(
 # We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
 # model = Qwen3VLForConditionalGeneration.from_pretrained(
-#     "FireRedTeam/FireRed-OCR,
 #     dtype=torch.bfloat16,
 #     attn_implementation="flash_attention_2",
 #     device_map="auto",
@@ -217,7 +218,7 @@ We evaluate FireRed-OCR on **OmniDocBench v1.5** and **FireRedBench**.
     </tr>
   </thead>
   <tbody>
-    <tr><td>GPT-5.2🔒</td><td>68.09</td><td>0.238</td><td>66.33</td><td>61.74</td><td>68.00</td><td>0.38</td></tr>
     <tr><td>Gemini-3.0 Pro🔒</td><td>79.68</td><td>0.169</td><td>80.11</td><td>75.82</td><td>82.73</td><td>0.353</td></tr>
     <tr>
       <td colspan="7" align="center"><strong>Pipeline</strong></td>
@@ -234,94 +235,6 @@ We evaluate FireRed-OCR on **OmniDocBench v1.5** and **FireRedBench**.
   </tbody>
 </table>
-### Additional Benchmarks
-<table>
-  <thead>
-    <tr>
-      <th>Model</th>
-      <th>OmniDocBench v1.5</th>
-      <th>FireRedBench</th>
-      <th>OCRBench(TextRec)</th>
-      <th>TEDS_TEST</th>
-      <th>PubTabNet</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>GPT-5.2🔒</td>
-      <td>85.50</td>
-      <td>68.09</td>
-      <td>93.0</td>
-      <td>67.6</td>
-      <td>84.4</td>
-    </tr>
-    <tr>
-      <td>Gemini-3.0 Pro🔒</td>
-      <td>90.33</td>
-      <td>79.68</td>
-      <td>91.9</td>
-      <td>81.8</td>
-      <td>91.4</td>
-    </tr>
-    <tr>
-      <td colspan="6" align="center"><strong>Pipeline</strong></td>
-    </tr>
-    <tr>
-      <td>MinerU2.5</td>
-      <td>90.67</td>
-      <td>-</td>
-      <td>-</td>
-      <td>85.4</td>
-      <td>88.4</td>
-    </tr>
-    <tr>
-      <td>PaddleOCR-VL-1.5</td>
-      <td>94.50</td>
-      <td>76.47</td>
-      <td>53.5 / 87.0</td>
-      <td>83.3</td>
-      <td>84.6</td>
-    </tr>
-    <tr>
-      <td>GLM-OCR</td>
-      <td>94.60</td>
-      <td>74.33</td>
-      <td>61.0 / 95.0</td>
-      <td>86.0</td>
-      <td>85.2</td>
-    </tr>
-    <tr>
-      <td colspan="6" align="center"><strong>End-to-end</strong></td>
-    </tr>
-    <tr>
-      <td>dots.ocr</td>
-      <td>88.41</td>
-      <td>72.93</td>
-      <td>92.1</td>
-      <td>62.4</td>
-      <td>71.0</td>
-    </tr>
-    <tr>
-      <td>DeepSeek-OCR 2</td>
-      <td>91.09</td>
-      <td>61.61</td>
-      <td>48.5</td>
-      <td>-</td>
-      <td>-</td>
-    </tr>
-    <tr>
-      <td><strong>FireRed-OCR-2B</strong></td>
-      <td><strong>92.94</strong></td>
-      <td><strong>74.62</strong></td>
-      <td><strong>93.5</strong></td>
-      <td><strong>80.6</strong></td>
-      <td><strong>77.0</strong></td>
-    </tr>
-  </tbody>
-</table>
-> For PaddleOCR-VL-1.5 and GLM-OCR on OCRBench, scores are reported as API / pure VLM.
 ## 📜 License Agreement
 The code and the weights of FireRed-OCR are licensed under Apache 2.0.
@@ -333,11 +246,12 @@ We kindly encourage citation of our work if you find it useful.
 ```bibtex
 @article{fireredocr,
   title={FireRed-OCR Technical Report},
-  author={Super Intelligence Team， Xiaohongshu Inc.},
-  year={202X},
   archivePrefix={arXiv},
   primaryClass={cs.CV},
-  url={https://github.com/FireRedTeam/FireRed-OCR}
 }
 ```

 ---
 base_model:
 - Qwen/Qwen3-VL-2B-Instruct
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 <p align="center">
   <img src="./assets/logo.png" width="600"/>
 </p>
 **2. Inference**
 ```python
+import torch
 from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
 from conv_for_infer import generate_conv
 # We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
 # model = Qwen3VLForConditionalGeneration.from_pretrained(
+#     "FireRedTeam/FireRed-OCR",
 #     dtype=torch.bfloat16,
 #     attn_implementation="flash_attention_2",
 #     device_map="auto",
     </tr>
   </thead>
   <tbody>
+    <tr><td>GPT-5.2🔒</td><td>68.09</td><td>0.238</td><td>66.33</td><td>61.74</td><td>68.00</td><td>0.380</td></tr>
     <tr><td>Gemini-3.0 Pro🔒</td><td>79.68</td><td>0.169</td><td>80.11</td><td>75.82</td><td>82.73</td><td>0.353</td></tr>
     <tr>
       <td colspan="7" align="center"><strong>Pipeline</strong></td>
   </tbody>
 </table>
 ## 📜 License Agreement
 The code and the weights of FireRed-OCR are licensed under Apache 2.0.
 ```bibtex
 @article{fireredocr,
   title={FireRed-OCR Technical Report},
+  author={Super Intelligence Team, Xiaohongshu Inc.},
+  year={2026},
+  eprint={2603.01840},
   archivePrefix={arXiv},
   primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2603.01840}
 }
 ```
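
The header change in this diff (adding `license`, switching `pipeline_tag` to `image-text-to-text`, and adding `library_name`) can be sanity-checked with a small script. The following is a minimal sketch and not part of the repository: the `parse_front_matter` helper is hypothetical, it uses only the Python standard library, and it handles just the flat `key: value` and `- item` forms that appear in this README header, not general YAML.

```python
def parse_front_matter(text):
    """Parse a simple '---' delimited header into a dict (hypothetical helper).

    Supports only 'key: value' pairs and 'key:' followed by '- item' lines.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no header block present
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the header
            break
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent 'key:' line.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means list items follow on the next lines.
            meta[current_key] = value if value else []
    return meta


# Header as it reads after the change, copied from the '+' side of the diff.
README_HEADER = """---
base_model:
- Qwen/Qwen3-VL-2B-Instruct
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---
"""

metadata = parse_front_matter(README_HEADER)
for key, value in metadata.items():
    print(f"{key}: {value}")
```

For anything beyond this flat layout, a real YAML parser would be the safer choice; the hand-rolled version here only avoids adding a dependency to the example.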