MiMo Audio Tokenizer (encoder only) -- GGUF

GGUF conversion of the encoder from XiaomiMiMo/MiMo-Audio-Tokenizer for use with CrispStrobe/CrispASR.

Available variants

File                      Quant  Size    Notes
mimo-tokenizer-q4_k.gguf  Q4_K   377 MB  Encoder + RVQ codebooks
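
After downloading, a quick sanity check is to read the fixed-size GGUF header and confirm the magic bytes. The sketch below uses only the Python standard library and assumes the GGUF v2/v3 header layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count, all little-endian). The helper name `read_gguf_header` is illustrative and not part of CrispASR.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes the GGUF v2/v3 header layout; illustrative helper only.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 + 2 x uint64
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<IQQ": little-endian uint32 version, uint64 tensors, uint64 KV pairs
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return version, tensor_count, kv_count


if __name__ == "__main__":
    import sys

    # Usage: python check_gguf.py mimo-tokenizer-q4_k.gguf
    if len(sys.argv) > 1:
        print(read_gguf_header(sys.argv[1]))
```

This only validates the header. It does not parse the metadata or the tensor data.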

Model details

  • Architecture: 32-layer transformer encoder (1280d, 20 heads) + Conv1d stem + 20 RVQ codebooks
  • Parameters: ~600M (encoder only, decoder/vocoder excluded)
  • Audio: 24 kHz input; outputs RVQ tokens at 25 Hz (ASR uses 8 of the codebook channels)
  • License: MIT
  • Source: XiaomiMiMo/MiMo-Audio-Tokenizer
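
The rates above imply a simple token budget: each 25 Hz frame covers 24000 / 25 = 960 input samples, and the ASR path consumes 8 tokens per frame. The sketch below works through that arithmetic. It assumes the frame count is the floor of duration times frame rate; the real encoder may pad or round differently, and the names here are illustrative.

```python
SAMPLE_RATE_HZ = 24_000  # input sample rate from the model details
FRAME_RATE_HZ = 25       # RVQ token frames per second
ASR_CHANNELS = 8         # codebook channels consumed by the ASR stage

# Input samples covered by one token frame: 24000 / 25 = 960
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE_HZ


def token_budget(duration_s):
    """Return (frames, asr_tokens) for a clip of the given length in seconds.

    Assumption: frames = floor(duration * 25); padding behaviour of the
    real encoder is not modelled here.
    """
    frames = int(duration_s * FRAME_RATE_HZ)
    return frames, frames * ASR_CHANNELS


if __name__ == "__main__":
    print(SAMPLES_PER_FRAME)   # 960
    print(token_budget(10.0))  # (250, 2000)
```

Under these assumptions, a 10-second clip yields 250 frames, or 2000 tokens across the 8 ASR channels.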

Notes

  • Only the encoder is included (waveform → RVQ tokens). The decoder/vocoder (TTS reconstruction) is excluded.
  • Used as the first stage of the MiMo-V2.5-ASR pipeline (tokenizer → LLM).
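
To illustrate the hand-off between the two stages, the sketch below trims per-frame RVQ indices down to the channels the ASR stage uses. It assumes the encoder output is a list of frames with one index per codebook (20 per frame) and that the ASR stage takes the first 8 channels. Neither the data layout nor the choice of channels is confirmed by this card, and `select_asr_channels` is a hypothetical helper, not a CrispASR function.

```python
NUM_CODEBOOKS = 20  # RVQ codebooks in the encoder
ASR_CHANNELS = 8    # channels used by ASR


def select_asr_channels(frames):
    """Keep the first ASR_CHANNELS indices of every frame.

    Assumption: `frames` is a list of per-frame lists holding one
    codebook index per RVQ codebook, and ASR uses the first 8.
    """
    for i, frame in enumerate(frames):
        if len(frame) != NUM_CODEBOOKS:
            raise ValueError(f"frame {i} has {len(frame)} indices, expected {NUM_CODEBOOKS}")
    return [frame[:ASR_CHANNELS] for frame in frames]


if __name__ == "__main__":
    # One second of dummy tokens: 25 frames x 20 codebooks
    dummy = [list(range(NUM_CODEBOOKS)) for _ in range(25)]
    trimmed = select_asr_channels(dummy)
    print(len(trimmed), len(trimmed[0]))  # 25 8
```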
