viraman committed
Commit 38ff849 · verified · 1 parent: e0c91db

Update README.md

Files changed (1):
1. README.md (+38 −38)
README.md CHANGED
@@ -75,6 +75,44 @@ This model is ready for commercial use.

  Governing Terms: Use of this model is governed by the [NVIDIA Nemotron Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/).

+ #### Evaluation Results
+
+ #### Benchmark Results (Reasoning Off)
+
+ We evaluated our model in **Reasoning-Off** mode across these benchmarks.
+
+ | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
+ | :---- | :---: |
+ | BFCL v3 | 61.1 |
+ | IFBench-Prompt | 43.2 |
+ | IFBench-Instruction | 44.2 |
+ | Orak | 22.9 |
+ | IFEval-Prompt | 82.8 |
+ | IFEval-Instruction | 88 |
+ | HaluEval | 62.2 |
+ | RULER (128k) | 91.1 |
+ | Tau2-Airline | 28.0 |
+ | Tau2-Retail | 34.8 |
+ | Tau2-Telecom | 24.9 |
+ | EQ-Bench3 | 63.2 |
+
+ We also evaluated our model in **Reasoning-On** mode across these benchmarks.
+
+ | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
+ | :---- | :---: |
+ | AIME25 | 78.5 |
+ | MATH500 | 95.4 |
+ | GPQA | 53.2 |
+ | LCB | 51.8 |
+ | BFCL v3 | 61.1 |
+ | IFEval-Prompt | 87.9 |
+ | IFEval-Instruction | 92 |
+ | Tau2-Airline | 33.3 |
+ | Tau2-Retail | 39.8 |
+ | Tau2-Telecom | 33 |
+
+ All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills/tree/main/docs) and [Orak](https://github.com/krafton-ai/Orak). For Orak we evaluated on three games (Super Mario, Darkest Dungeon, and Stardew Valley).
+
  ### Deployment Geography: Global

  ### Use Case
@@ -523,44 +561,6 @@ The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API.
  * Data Collection Method by dataset: Hybrid: Human, Synthetic
  * Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

- #### Evaluation Results:
-
- #### Benchmark Results (Reasoning On)
-
- We evaluated our model in **Reasoning-On** mode across these benchmarks.
-
- | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
- | :---- | :---: |
- | AIME25 | 78.5 |
- | MATH500 | 95.4 |
- | GPQA | 53.2 |
- | LCB | 51.8 |
- | BFCL v3 | 61.1 |
- | IFEVAL-Prompt | 87.9 |
- | IFEVAL-Instruction | 92 |
- | Tau2-Airline | 33.3 |
- | Tau2-Retail | 39.8 |
- | Tau2-Telecom | 33 |
-
- We also evaluated our model in **Reasoning-off** mode across these benchmarks
-
- | Benchmark | NVIDIA-Nemotron-3-Nano-4B-BF16 |
- | :---- | ----- |
- | BFCL v3 | 61.1 |
- | IFBench-Prompt | 43.2 |
- | IFBench-Instruction | 44.2 |
- | Orak | 22.9 |
- | IFEval-Prompt | 82.8 |
- | IFEval-Instruction | 88 |
- | HaluEval | 62.2 |
- | RULER (128k) | 91.1 |
- | Tau2-Airline | 28.0 |
- | Tau2-Retail | 34.8 |
- | Tau2-Telecom | 24.9 |
- | EQ-Bench3 | 63.2 |
-
- All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills/tree/main/docs) & [Orak](https://github.com/krafton-ai/Orak). For Orak we evaluated on three games (Super Mario, Darkest Dungeon & StarDew Valley)
-
  ## Inference

  - Engines: HF, vLLM, llama-cpp, TRT-LLM, SGLang