
Table of Contents

  1. TL;DR
  2. Model Details
  3. Training Details
  4. Usage
  5. Evaluation
  6. Citation

TL;DR

Model Details

Model Description

  • Developed by: https://www.tii.ae
  • Model type: Causal decoder-only
  • Architecture: Hybrid Transformers + Mamba architecture
  • Language(s) (NLP): English
  • Number of Parameters: 90M
  • License: Falcon-LLM License

Training Details

For more details about the training protocol of this model, please refer to the Falcon-H1-Tiny technical blog post.

Usage

Currently, you can use this model with Hugging Face transformers, vLLM, SGLang, llama.cpp, Ollama, or the mlx library. This model is intended for Python code generation and Python fill-in-the-middle (FIM) tasks. The FIM format is the following:

<|prefix|>{prefix}<|suffix|>{suffix}<|middle|>
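For example, with Hugging Face transformers, a minimal FIM completion could look like the sketch below. This is illustrative only: the repo id `tiiuae/Falcon-H1-Tiny-90M-Coder` is an assumption (this card hosts the GGUF files), and the prefix/suffix pair is a made-up example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base (non-GGUF) checkpoint id; adjust to the actual repo.
model_id = "tiiuae/Falcon-H1-Tiny-90M-Coder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prefix/suffix pair for a fill-in-the-middle request.
prefix = "def fibonacci(n):\n    "
suffix = "\n    return b"

# Build the prompt exactly as the FIM template above specifies.
prompt = f"<|prefix|>{prefix}<|suffix|>{suffix}<|middle|>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated middle span.
middle = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(middle)
```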

Inference

llama.cpp

You can find all GGUF files compatible with llama.cpp under our official collection. An example setup:

# Install llama.cpp and the Hugging Face CLI
brew install llama.cpp
pip install huggingface_hub
# Download the Q8_0 quantization and start an interactive chat session
hf download tiiuae/Falcon-H1-Tiny-90M-Coder-GGUF Falcon-H1-Tiny-90M-Coder-GGUF-Q8_0.gguf --local-dir ./
llama-cli -m ./Falcon-H1-Tiny-90M-Coder-GGUF-Q8_0.gguf -cnv
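llama.cpp also ships llama-server if you prefer to query the GGUF file programmatically. The sketch below is a minimal example, assuming you started the server with `llama-server -m ./Falcon-H1-Tiny-90M-Coder-GGUF-Q8_0.gguf --port 8080`; the port and the FIM prompt are illustrative.

```python
import requests

# FIM prompt built with the template from the Usage section (illustrative).
prompt = "<|prefix|>def add(a, b):\n    <|suffix|>\n<|middle|>"

# llama-server's native /completion endpoint; n_predict caps new tokens.
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 64},
)
print(resp.json()["content"])
```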

ollama

ollama run hf.co/tiiuae/Falcon-H1-Tiny-90M-Coder-GGUF:Q8_0 
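Once pulled, the model can also be queried through Ollama's local HTTP API. The sketch below is a minimal example; the prompt is illustrative, and raw mode is assumed here so the FIM special tokens are passed through rather than wrapped in a chat template.

```python
import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={
        "model": "hf.co/tiiuae/Falcon-H1-Tiny-90M-Coder-GGUF:Q8_0",
        # Illustrative FIM prompt following the template above.
        "prompt": "<|prefix|>def square(x):\n    <|suffix|>\n<|middle|>",
        # raw=True skips Ollama's prompt template; stream=False returns
        # a single JSON object instead of a token stream.
        "raw": True,
        "stream": False,
    },
)
print(resp.json()["response"])
```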

Evaluation

For a detailed evaluation of the Falcon-H1-Tiny series, please refer to our technical blog post.

Citation

If the Falcon-H1-Tiny family of models was helpful to your work, feel free to cite us.

@misc{falcon_h1_tiny,
  title={Falcon-H1-Tiny: A series of extremely small, yet powerful language models redefining capabilities at small scale},
  author={Falcon-LLM Team},
  year={2026}, 
}