Add library_name and update pipeline_tag

#4
by nielsr (HF Staff), opened

Files changed (1): README.md (+10 -96)
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen3-VL-2B-Instruct
-pipeline_tag: image-to-text
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
 <p align="center">
 <img src="./assets/logo.png" width="600"/>
 </p>
@@ -100,6 +100,7 @@ cd FireRed-OCR
 
 **2. Inference**
 ```python
+import torch
 from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
 from conv_for_infer import generate_conv
 
@@ -112,7 +113,7 @@ model = Qwen3VLForConditionalGeneration.from_pretrained(
 
 # We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
 # model = Qwen3VLForConditionalGeneration.from_pretrained(
-#     "FireRedTeam/FireRed-OCR,
+#     "FireRedTeam/FireRed-OCR",
 #     dtype=torch.bfloat16,
 #     attn_implementation="flash_attention_2",
 #     device_map="auto",
@@ -217,7 +218,7 @@ We evaluate FireRed-OCR on **OmniDocBench v1.5** and **FireRedBench**.
 </tr>
 </thead>
 <tbody>
-<tr><td>GPT-5.2🔒</td><td>68.09</td><td>0.238</td><td>66.33</td><td>61.74</td><td>68.00</td><td>0.38</td></tr>
+<tr><td>GPT-5.2🔒</td><td>68.09</td><td>0.238</td><td>66.33</td><td>61.74</td><td>68.00</td><td>0.380</td></tr>
 <tr><td>Gemini-3.0 Pro🔒</td><td>79.68</td><td>0.169</td><td>80.11</td><td>75.82</td><td>82.73</td><td>0.353</td></tr>
 <tr>
 <td colspan="7" align="center"><strong>Pipeline</strong></td>
@@ -234,94 +235,6 @@ We evaluate FireRed-OCR on **OmniDocBench v1.5** and **FireRedBench**.
 </tbody>
 </table>
 
-### Additional Benchmarks
-
-<table>
-<thead>
-<tr>
-<th>Model</th>
-<th>OmniDocBench v1.5</th>
-<th>FireRedBench</th>
-<th>OCRBench(TextRec)</th>
-<th>TEDS_TEST</th>
-<th>PubTabNet</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td>GPT-5.2🔒</td>
-<td>85.50</td>
-<td>68.09</td>
-<td>93.0</td>
-<td>67.6</td>
-<td>84.4</td>
-</tr>
-<tr>
-<td>Gemini-3.0 Pro🔒</td>
-<td>90.33</td>
-<td>79.68</td>
-<td>91.9</td>
-<td>81.8</td>
-<td>91.4</td>
-</tr>
-<tr>
-<td colspan="6" align="center"><strong>Pipeline</strong></td>
-</tr>
-<tr>
-<td>MinerU2.5</td>
-<td>90.67</td>
-<td>-</td>
-<td>-</td>
-<td>85.4</td>
-<td>88.4</td>
-</tr>
-<tr>
-<td>PaddleOCR-VL-1.5</td>
-<td>94.50</td>
-<td>76.47</td>
-<td>53.5 / 87.0</td>
-<td>83.3</td>
-<td>84.6</td>
-</tr>
-<tr>
-<td>GLM-OCR</td>
-<td>94.60</td>
-<td>74.33</td>
-<td>61.0 / 95.0</td>
-<td>86.0</td>
-<td>85.2</td>
-</tr>
-<tr>
-<td colspan="6" align="center"><strong>End-to-end</strong></td>
-</tr>
-<tr>
-<td>dots.ocr</td>
-<td>88.41</td>
-<td>72.93</td>
-<td>92.1</td>
-<td>62.4</td>
-<td>71.0</td>
-</tr>
-<tr>
-<td>DeepSeek-OCR 2</td>
-<td>91.09</td>
-<td>61.61</td>
-<td>48.5</td>
-<td>-</td>
-<td>-</td>
-</tr>
-<tr>
-<td><strong>FireRed-OCR-2B</strong></td>
-<td><strong>92.94</strong></td>
-<td><strong>74.62</strong></td>
-<td><strong>93.5</strong></td>
-<td><strong>80.6</strong></td>
-<td><strong>77.0</strong></td>
-</tr>
-</tbody>
-</table>
-> For PaddleOCR-VL-1.5 and GLM-OCR on OCRBench, scores are reported as API / pure VLM.
-
 ## 📜 License Agreement
 
 The code and the weights of FireRed-OCR are licensed under Apache 2.0.
@@ -333,11 +246,12 @@ We kindly encourage citation of our work if you find it useful.
 ```bibtex
 @article{fireredocr,
 title={FireRed-OCR Technical Report},
-author={Super Intelligence Team Xiaohongshu Inc.},
-year={202X},
+author={Super Intelligence Team, Xiaohongshu Inc.},
+year={2026},
+eprint={2603.01840},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
-url={https://github.com/FireRedTeam/FireRed-OCR}
+url={https://arxiv.org/abs/2603.01840}
 }
 ```
 
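The front-matter change is the substance of this PR, and it can be sanity-checked locally. Below is a minimal sketch using only a naive standard-library `key: value` parser (not a real YAML library and not Hugging Face's own metadata validator), applied to the new header that this diff produces:

```python
# Illustrative check of the PR's new README front matter.
# The parser here is deliberately naive: it only handles top-level
# `key: value` pairs and skips YAML list items, which is enough for
# this particular header.
README_HEAD = """---
base_model:
- Qwen/Qwen3-VL-2B-Instruct
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---
"""

def parse_front_matter(text):
    """Extract top-level `key: value` pairs between the --- fences."""
    body = text.split("---")[1]
    meta = {}
    for line in body.strip().splitlines():
        if ":" in line and not line.startswith("-"):
            key, _, value = line.partition(":")
            if value.strip():
                meta[key.strip()] = value.strip()
    return meta

meta = parse_front_matter(README_HEAD)
print(meta["pipeline_tag"])   # image-text-to-text
print(meta["library_name"])   # transformers
```

Note that `base_model` is a list-valued key, so this simple parser intentionally ignores it; only the scalar keys the PR touches are checked.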
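The README's commented-out alternative recommends `flash_attention_2`, which only works with a CUDA GPU and the `flash-attn` package installed. A hedged sketch of how a caller might build the `from_pretrained` keyword arguments conditionally; the helper name and the availability check are illustrative additions, not part of the README, and passing `dtype` as the string `"bfloat16"` assumes a recent `transformers` release that accepts string dtypes:

```python
import importlib.util

def make_load_kwargs():
    """Build from_pretrained kwargs, enabling flash_attention_2 only
    when the flash-attn package is importable (illustrative check;
    a real guard would also verify a CUDA device is present)."""
    kwargs = {"dtype": "bfloat16", "device_map": "auto"}
    if importlib.util.find_spec("flash_attn") is not None:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs

# Hypothetical usage, mirroring the README's commented example:
# model = Qwen3VLForConditionalGeneration.from_pretrained(
#     "FireRedTeam/FireRed-OCR", **make_load_kwargs())
```

This keeps the snippet runnable on machines without flash-attn, falling back to the default attention implementation instead of raising at load time.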
257