Add average row to benchmarks, add datasets to YAML card
README.md
CHANGED
@@ -10,6 +10,9 @@ tags:
 - language-model
 - canon-layers
 - rope-yarn
+datasets:
+- codelion/sutra-10B
+- allenai/dolma3_longmino_mix-50B-1025
 library_name: transformers
 pipeline_tag: text-generation
 ---
@@ -139,8 +142,9 @@ Evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluat
 | ARC-Challenge | 26.62% | 25.51% | -1.11 |
 | MMLU (5-shot) | 22.95% | 22.95% | 0.00 |
 | SciQ | 22.00% | 21.30% | -0.70 |
+| **Average** | **35.56%** | **35.62%** | **+0.06** |
 
-Context extension to 32K preserved short-context benchmark performance with negligible change
+Context extension to 32K preserved short-context benchmark performance with negligible change.
 
 ## Context Extension Results
 
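For reviewers who want to sanity-check an average row like the one added above, here is a minimal sketch of how such a row could be derived from per-benchmark scores. The helper name `average_row` and the before/after column interpretation are assumptions, not taken from the repository. Only the three rows visible in this hunk are used as sample input, so the result will not match the 35.56% / 35.62% figures, which cover benchmark rows outside the diff context.

```python
# Hypothetical helper: builds a Markdown "Average" row from (name, before, after) scores.
# Sample data is limited to the three rows visible in this diff hunk.
rows = [
    ("ARC-Challenge", 26.62, 25.51),
    ("MMLU (5-shot)", 22.95, 22.95),
    ("SciQ", 22.00, 21.30),
]


def average_row(rows):
    """Return a bolded Markdown table row with mean scores and their difference."""
    n = len(rows)
    before = sum(r[1] for r in rows) / n
    after = sum(r[2] for r in rows) / n
    # The delta is taken from the unrounded means, then rounded for display.
    delta = after - before
    return f"| **Average** | **{before:.2f}%** | **{after:.2f}%** | **{delta:+.2f}** |"


print(average_row(rows))
```

Computing the delta from the unrounded means can differ by 0.01 from subtracting the two displayed percentages, so whichever convention the table uses should be applied consistently across rows.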