Qwen 3.5 2B – TFLite (.tflite)

Qwen 3.5 2B in raw TFLite format for on-device inference with the TFLite Interpreter API.

For LiteRT-LM Engine usage, use the bundled version instead: paulsp94/Qwen3.5-2B-LiteRT-LM

What's this

Raw .tflite model file – use this if you're building your own inference pipeline with the TFLite Interpreter API directly. If you want the ready-to-use LiteRT-LM bundle with the tokenizer included, use the LiteRT-LM version instead.
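As a starting point for that pipeline, here is a minimal sketch of loading the file with the standalone LiteRT Interpreter. The input/output tensor layout shown (a batch of token ids in, per-position logits out) is an assumption, not something the model card specifies, so check `get_input_details()` / `get_output_details()` against the real model; the placeholder ids and the greedy-decoding helper are illustrative only.

```python
# Sketch: driving the raw .tflite with the Interpreter API.
# ASSUMPTIONS: input = int32 token ids [1, seq_len], output = logits
# [1, seq_len, vocab] -- verify against the actual tensor details.
import os
import numpy as np

def greedy_next_token(logits: np.ndarray) -> int:
    """Pick the highest-scoring token from a [vocab_size] logits vector."""
    return int(np.argmax(logits))

if os.path.exists("qwen35_2b.tflite"):
    # ai-edge-litert ships the standalone Interpreter; tf.lite.Interpreter
    # from a full TensorFlow install works the same way.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path="qwen35_2b.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Placeholder token ids -- produce real ones with the shipped tokenizer.
    token_ids = np.array([[1, 2, 3]], dtype=np.int32)
    interpreter.set_tensor(input_details[0]["index"], token_ids)
    interpreter.invoke()
    logits = interpreter.get_tensor(output_details[0]["index"])
    print(greedy_next_token(logits[0, -1]))  # next-token id (greedy)
```

For autoregressive generation you would append the sampled id to the input and invoke again in a loop; a production pipeline would also reuse the model's KV/recurrent state rather than re-running the full prefix each step.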

Architecture

Base model: Qwen/Qwen3.5-2B
Layers: 24 total (18× GatedDeltaNet linear attention + 6× GQA full attention)
Quantization: int8 dynamic
Format: TFLite (.tflite)
Size: ~1.9 GB

Files

  • qwen35_2b.tflite – the converted model
  • tokenizer.json – BPE tokenizer (you'll need to handle tokenization yourself)
  • tokenizer_config.json – tokenizer configuration
  • config.json – original model config
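Since tokenization is on you, one option is the Hugging Face `tokenizers` library, which loads the shipped tokenizer.json directly. The ChatML prompt wrapper below assumes Qwen 3.5 keeps the ChatML template used by earlier Qwen releases; verify against the `chat_template` field in tokenizer_config.json before relying on it.

```python
# Sketch: manual tokenization for the Interpreter pipeline.
# ASSUMPTION: Qwen 3.5 uses the ChatML chat template -- check
# tokenizer_config.json for the authoritative template.
import os

def build_chatml_prompt(user_msg: str) -> str:
    """Wrap a user message in the (assumed) ChatML template."""
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

if os.path.exists("tokenizer.json"):
    from tokenizers import Tokenizer

    tok = Tokenizer.from_file("tokenizer.json")
    ids = tok.encode(build_chatml_prompt("Hello!")).ids
    print(ids)  # int token ids to feed the TFLite interpreter
```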

Conversion

Source: allot/tools/model-export
