OLMo-2-7B-Instruct – GGUF Quants

Quantized GGUF versions of allenai/OLMo-2-1124-7B-Instruct, the Allen Institute for AI's fully open 7B language model. OLMo-2 is released under Apache 2.0 with the full training data, code, intermediate checkpoints, and evaluation data all publicly available, making it one of the most transparent large language models at this scale.

Available Files

| File | Quant | Size | Use Case |
|------|-------|------|----------|
| OLMo-2-7B-Instruct-Q8_0.gguf | Q8_0 | ~7.4 GB | Maximum quality |
| OLMo-2-7B-Instruct-Q6_K.gguf | Q6_K | ~5.7 GB | Near-lossless |
| OLMo-2-7B-Instruct-Q5_K_M.gguf | Q5_K_M | ~5.0 GB | High quality |
| OLMo-2-7B-Instruct-Q4_K_M.gguf | Q4_K_M | ~4.2 GB | Recommended default |
| OLMo-2-7B-Instruct-Q3_K_M.gguf | Q3_K_M | ~3.5 GB | Low VRAM |
| OLMo-2-7B-Instruct-IQ4_XS.gguf | IQ4_XS | ~3.8 GB | Imatrix 4-bit |
| OLMo-2-7B-Instruct-IQ3_XXS.gguf | IQ3_XXS | ~2.7 GB | Imatrix 3-bit |
| OLMo-2-7B-Instruct-IQ2_M.gguf | IQ2_M | ~2.4 GB | Imatrix 2-bit |
| OLMo-2-7B-Instruct-IQ1_S.gguf | IQ1_S | ~1.6 GB | Extreme compression |
| OLMo-2-7B-Instruct-fp16.gguf | FP16 | ~14.0 GB | Full precision |
| imatrix.dat | – | – | Importance matrix |
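The sizes above scale with each quant type's average bits per weight. As a rough sanity check, file size ≈ parameter count × bits-per-weight ÷ 8. The bits-per-weight values below are approximate averages for llama.cpp quant types (an assumption; real files vary with metadata and per-tensor quant mixes):

```python
# Rough GGUF size estimate: params * bits-per-weight / 8.
# Bits-per-weight values are approximate llama.cpp averages (assumption).
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q5_K_M": 5.69,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.91,
}

def estimate_size_gb(n_params: float, quant: str) -> float:
    """Approximate on-disk size in GB for a given quant type."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# A 7B model at Q4_K_M lands near the ~4.2 GB listed above.
print(f"{estimate_size_gb(7e9, 'Q4_K_M'):.1f} GB")
```

The same arithmetic works in reverse: divide your free VRAM by the parameter count to see the highest bits-per-weight quant that fits, leaving headroom for the KV cache.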

Usage

# llama.cpp
./llama-cli -m OLMo-2-7B-Instruct-Q4_K_M.gguf \
  --ctx-size 4096 -n 512 \
  -p "<|endoftext|><|user|>\nHello!<|assistant|>\n"

# Ollama
ollama run hf.co/DuoNeural/OLMo-2-7B-Instruct-GGUF:Q4_K_M
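When driving the model programmatically, the prompt must follow the chat format shown in the llama.cpp example above. Here is a minimal sketch of a helper that assembles it (an illustration only; in practice, prefer the chat template embedded in the GGUF metadata, e.g. via llama-cli's conversation mode):

```python
# Build an OLMo-2 chat prompt matching the format in the Usage example.
# A sketch, not an official template implementation.
def build_prompt(turns):
    """turns: list of (role, text) pairs; role is 'user' or 'assistant'."""
    prompt = "<|endoftext|>"
    for role, text in turns:
        prompt += f"<|{role}|>\n{text}"
    # Leave the assistant turn open so the model generates the reply.
    prompt += "<|assistant|>\n"
    return prompt

print(build_prompt([("user", "Hello!")]))
```

For multi-turn conversations, append each completed assistant reply as an `("assistant", ...)` turn before the next user message.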

About OLMo-2-7B

  • Parameters: 7B
  • Context: 4096 tokens
  • Architecture: Modified decoder-only transformer (RoPE, SwiGLU, QK-norm)
  • Training Data: Dolmino Mix 1124 (a fully open dataset)
  • License: Apache 2.0 (commercial use OK)
  • Unique value: all artifacts are open: weights, data, code, evals, checkpoints

OLMo-2 sets the standard for research reproducibility in the 7B class. If you need a model where every training decision can be traced back to open data, code, and checkpoints, this is it.


Quantized by DuoNeural using llama.cpp on RTX 5090.


DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura (DuoNeural).
