Qwen 3.5 2B – TFLite (.tflite)

Qwen 3.5 2B in raw TFLite format for on-device inference with the TFLite Interpreter API.

For LiteRT-LM Engine usage, use the bundled version instead: paulsp94/Qwen3.5-2B-LiteRT-LM

What's this

Raw .tflite model file – use this if you're building your own inference pipeline with the TFLite Interpreter API directly. If you want the ready-to-use LiteRT-LM bundle with the tokenizer included, use the LiteRT-LM version instead.
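As a starting point for that pipeline, here is a minimal sketch of loading the file with the standalone LiteRT Interpreter. The input/output tensor layout shown (a batch of token ids in, per-position logits out) is an assumption, not something the model card specifies, so check `get_input_details()` / `get_output_details()` against the real model; the placeholder ids and the greedy-decoding helper are illustrative only.

```python
# Sketch: driving the raw .tflite with the Interpreter API.
# ASSUMPTIONS: input = int32 token ids [1, seq_len], output = logits
# [1, seq_len, vocab] -- verify against the actual tensor details.
import os
import numpy as np

def greedy_next_token(logits: np.ndarray) -> int:
    """Pick the highest-scoring token from a [vocab_size] logits vector."""
    return int(np.argmax(logits))

if os.path.exists("qwen35_2b.tflite"):
    # ai-edge-litert ships the standalone Interpreter; tf.lite.Interpreter
    # from a full TensorFlow install works the same way.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path="qwen35_2b.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Placeholder token ids -- produce real ones with the shipped tokenizer.
    token_ids = np.array([[1, 2, 3]], dtype=np.int32)
    interpreter.set_tensor(input_details[0]["index"], token_ids)
    interpreter.invoke()
    logits = interpreter.get_tensor(output_details[0]["index"])
    print(greedy_next_token(logits[0, -1]))  # next-token id (greedy)
```

For autoregressive generation you would append the sampled id to the input and invoke again in a loop; a production pipeline would also reuse the model's KV/recurrent state rather than re-running the full prefix each step.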

Architecture

Base model: Qwen/Qwen3.5-2B
Layers: 24 total (18× GatedDeltaNet linear attention + 6× GQA full attention)
Quantization: int8 dynamic
Format: TFLite (.tflite)
Size: ~1.9 GB

Files

  • qwen35_2b.tflite – the converted model
  • tokenizer.json – BPE tokenizer (you'll need to handle tokenization yourself)
  • tokenizer_config.json – tokenizer configuration
  • config.json – original model config
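Since tokenization is on you, one option is the Hugging Face `tokenizers` library, which loads the shipped tokenizer.json directly. The ChatML prompt wrapper below assumes Qwen 3.5 keeps the ChatML template used by earlier Qwen releases; verify against the `chat_template` field in tokenizer_config.json before relying on it.

```python
# Sketch: manual tokenization for the Interpreter pipeline.
# ASSUMPTION: Qwen 3.5 uses the ChatML chat template -- check
# tokenizer_config.json for the authoritative template.
import os

def build_chatml_prompt(user_msg: str) -> str:
    """Wrap a user message in the (assumed) ChatML template."""
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

if os.path.exists("tokenizer.json"):
    from tokenizers import Tokenizer

    tok = Tokenizer.from_file("tokenizer.json")
    ids = tok.encode(build_chatml_prompt("Hello!")).ids
    print(ids)  # int token ids to feed the TFLite interpreter
```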

Conversion

Source: allot/tools/model-export
