viraman committed
Commit 38ff849 · verified · 1 parent: e0c91db

Update README.md

Files changed (1):
1. README.md (+38 −38)
README.md CHANGED
@@ -75,6 +75,44 @@ This model is ready for commercial use.

  Governing Terms: Use of this model is governed by the [NVIDIA Nemotron Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/).

+ #### Evaluation Results
+
+ #### Benchmark Results (Reasoning Off)
+
+ We evaluated our model in **Reasoning-Off** mode across these benchmarks.
+
+ | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
+ | :---- | :---: |
+ | BFCL v3 | 61.1 |
+ | IFBench-Prompt | 43.2 |
+ | IFBench-Instruction | 44.2 |
+ | Orak | 22.9 |
+ | IFEval-Prompt | 82.8 |
+ | IFEval-Instruction | 88 |
+ | HaluEval | 62.2 |
+ | RULER (128k) | 91.1 |
+ | Tau2-Airline | 28.0 |
+ | Tau2-Retail | 34.8 |
+ | Tau2-Telecom | 24.9 |
+ | EQ-Bench3 | 63.2 |
+
+ We also evaluated our model in **Reasoning-On** mode across these benchmarks.
+
+ | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
+ | :---- | :---: |
+ | AIME25 | 78.5 |
+ | MATH500 | 95.4 |
+ | GPQA | 53.2 |
+ | LCB | 51.8 |
+ | BFCL v3 | 61.1 |
+ | IFEval-Prompt | 87.9 |
+ | IFEval-Instruction | 92 |
+ | Tau2-Airline | 33.3 |
+ | Tau2-Retail | 39.8 |
+ | Tau2-Telecom | 33 |
+
+ All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills/tree/main/docs) and [Orak](https://github.com/krafton-ai/Orak). For Orak we evaluated on three games (Super Mario, Darkest Dungeon, and Stardew Valley).
+
  ### Deployment Geography: Global

  ### Use Case
@@ -523,44 +561,6 @@ The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API.
  * Data Collection Method by dataset: Hybrid: Human, Synthetic
  * Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

- #### Evaluation Results:
-
- #### Benchmark Results (Reasoning On)
-
- We evaluated our model in **Reasoning-On** mode across these benchmarks.
-
- | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
- | :---- | :---: |
- | AIME25 | 78.5 |
- | MATH500 | 95.4 |
- | GPQA | 53.2 |
- | LCB | 51.8 |
- | BFCL v3 | 61.1 |
- | IFEVAL-Prompt | 87.9 |
- | IFEVAL-Instruction | 92 |
- | Tau2-Airline | 33.3 |
- | Tau2-Retail | 39.8 |
- | Tau2-Telecom | 33 |
-
- We also evaluated our model in **Reasoning-off** mode across these benchmarks
-
- | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
- | :---- | ----- |
- | BFCL v3 | 61.1 |
- | IFBench-Prompt | 43.2 |
- | IFBench-Instruction | 44.2 |
- | Orak | 22.9 |
- | IFEval-Prompt | 82.8 |
- | IFEval-Instruction | 88 |
- | HaluEval | 62.2 |
- | RULER (128k) | 91.1 |
- | Tau2-Airline | 28.0 |
- | Tau2-Retail | 34.8 |
- | Tau2-Telecom | 24.9 |
- | EQ-Bench3 | 63.2 |
-
- All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills/tree/main/docs) & [Orak](https://github.com/krafton-ai/Orak). For Orak we evaluated on three games (Super Mario, Darkest Dungeon & StarDew Valley)
-
  ## Inference

  - Engines: HF, vLLM, llama-cpp, TRT-LLM, SGLang