Update README.md

README.md (CHANGED)

```diff
@@ -75,6 +75,44 @@ This model is ready for commercial use.
 
 Governing Terms: Use of this model is governed by the [NVIDIA Nemotron Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/).
 
+#### Evaluation Results
+
+#### Benchmark Results
+
+We evaluated our model in **Reasoning-Off** mode across these benchmarks.
+
+| Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
+| :---- | :---: |
+| BFCL v3 | 61.1 |
+| IFBench-Prompt | 43.2 |
+| IFBench-Instruction | 44.2 |
+| Orak | 22.9 |
+| IFEval-Prompt | 82.8 |
+| IFEval-Instruction | 88.0 |
+| HaluEval | 62.2 |
+| RULER (128k) | 91.1 |
+| Tau2-Airline | 28.0 |
+| Tau2-Retail | 34.8 |
+| Tau2-Telecom | 24.9 |
+| EQ-Bench3 | 63.2 |
+
+We also evaluated our model in **Reasoning-On** mode across these benchmarks.
+
+| Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
+| :---- | :---: |
+| AIME25 | 78.5 |
+| MATH500 | 95.4 |
+| GPQA | 53.2 |
+| LCB | 51.8 |
+| BFCL v3 | 61.1 |
+| IFEval-Prompt | 87.9 |
+| IFEval-Instruction | 92.0 |
+| Tau2-Airline | 33.3 |
+| Tau2-Retail | 39.8 |
+| Tau2-Telecom | 33.0 |
+
+All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills/tree/main/docs) & [Orak](https://github.com/krafton-ai/Orak). For Orak we evaluated on three games (Super Mario, Darkest Dungeon & Stardew Valley).
+
 ### Deployment Geography: Global
 
 ### Use Case

@@ -523,44 +561,6 @@ The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API.
 
 * Data Collection Method by dataset: Hybrid: Human, Synthetic
 * Labeling Method by dataset: Hybrid: Automated, Human, Synthetic
 
-#### Evaluation Results:
-
-#### Benchmark Results (Reasoning On)
-
-We evaluated our model in **Reasoning-On** mode across these benchmarks.
-
-| Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
-| :---- | :---: |
-| AIME25 | 78.5 |
-| MATH500 | 95.4 |
-| GPQA | 53.2 |
-| LCB | 51.8 |
-| BFCL v3 | 61.1 |
-| IFEVAL-Prompt | 87.9 |
-| IFEVAL-Instruction | 92 |
-| Tau2-Airline | 33.3 |
-| Tau2-Retail | 39.8 |
-| Tau2-Telecom | 33 |
-
-We also evaluated our model in **Reasoning-off** mode across these benchmarks
-
-| Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
-| :---- | ----- |
-| BFCL v3 | 61.1 |
-| IFBench-Prompt | 43.2 |
-| IFBench-Instruction | 44.2 |
-| Orak | 22.9 |
-| IFEval-Prompt | 82.8 |
-| IFEval-Instruction | 88 |
-| HaluEval | 62.2 |
-| RULER (128k) | 91.1 |
-| Tau2-Airline | 28.0 |
-| Tau2-Retail | 34.8 |
-| Tau2-Telecom | 24.9 |
-| EQ-Bench3 | 63.2 |
-
-All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills/tree/main/docs) & [Orak](https://github.com/krafton-ai/Orak). For Orak we evaluated on three games (Super Mario, Darkest Dungeon & StarDew Valley)
-
 ## Inference
 
 - Engines: HF, vLLM, llama-cpp, TRT-LLM, SGLang
```
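Six benchmarks appear in both tables, so the effect of the reasoning toggle can be read off directly. A minimal sketch in plain Python, with the scores copied from the tables above; the dict and function names are illustrative, not part of the model card or any NVIDIA tooling:

```python
# Scores for NVIDIA-Nemotron-3-Nano-4B-BF16, copied from the model card tables.
# Only the benchmarks reported in both modes are compared.
reasoning_on = {
    "BFCL v3": 61.1,
    "IFEval-Prompt": 87.9,
    "IFEval-Instruction": 92.0,
    "Tau2-Airline": 33.3,
    "Tau2-Retail": 39.8,
    "Tau2-Telecom": 33.0,
}
reasoning_off = {
    "BFCL v3": 61.1,
    "IFEval-Prompt": 82.8,
    "IFEval-Instruction": 88.0,
    "Tau2-Airline": 28.0,
    "Tau2-Retail": 34.8,
    "Tau2-Telecom": 24.9,
}

def reasoning_deltas(on: dict, off: dict) -> dict:
    """Per-benchmark score gain from enabling reasoning, rounded to one decimal."""
    return {name: round(on[name] - off[name], 1) for name in on if name in off}

deltas = reasoning_deltas(reasoning_on, reasoning_off)
for name, gain in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:+.1f}")
```

On these shared benchmarks the gains cluster around four to eight points (largest on Tau2-Telecom), with BFCL v3 unchanged between modes.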