MiMo Audio Tokenizer (encoder only) -- GGUF
GGUF conversion of the encoder from XiaomiMiMo/MiMo-Audio-Tokenizer for use with CrispStrobe/CrispASR.
Available variants
| File | Quant | Size | Notes |
|---|---|---|---|
| mimo-tokenizer-q4_k.gguf | Q4_K | 377 MB | Encoder + RVQ codebooks |
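
After downloading, a quick sanity check is to read the fixed-size GGUF header and confirm that the file is intact. The sketch below is illustrative rather than part of CrispASR: the helper name `read_gguf_header` is hypothetical, and it assumes a little-endian GGUF file of version 2 or later, where the tensor and metadata counts are stored as 64-bit integers.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes GGUF v2 or later, little-endian:
    4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 bytes
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


# Example (assumes the file was downloaded to the current directory):
# version, n_tensors, n_kv = read_gguf_header("mimo-tokenizer-q4_k.gguf")
```

A truncated or mislabelled download fails the magic check immediately, which is cheaper than discovering the problem when the model is loaded.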
Model details
- Architecture: 32-layer transformer encoder (1280d, 20 heads) + Conv1d stem + 20 RVQ codebooks
- Parameters: ~600M (encoder only, decoder/vocoder excluded)
- Audio: 24 kHz input; outputs RVQ tokens at 25 Hz (8 channels used by ASR)
- License: MIT
- Source: XiaomiMiMo/MiMo-Audio-Tokenizer
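
The sample rate and token rate listed above imply a fixed ratio between input samples and output frames. The following is a rough sketch of that arithmetic, not code from the model: the function name `estimate_token_shape` is hypothetical, and the real frame count may differ slightly depending on how the Conv1d stem pads the input.

```python
SAMPLE_RATE_HZ = 24_000  # input audio sample rate
TOKEN_RATE_HZ = 25       # RVQ frames emitted per second
ASR_CHANNELS = 8         # RVQ channels consumed by the ASR stage

# Each token frame covers 24000 / 25 = 960 input samples (40 ms of audio).
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // TOKEN_RATE_HZ


def estimate_token_shape(num_samples):
    """Estimate (frames, channels) for a mono 24 kHz waveform.

    This ignores any padding applied by the encoder, so treat the
    frame count as an approximation.
    """
    frames = num_samples // SAMPLES_PER_FRAME
    return frames, ASR_CHANNELS


# Ten seconds of audio is 240,000 samples.
frames, channels = estimate_token_shape(10 * SAMPLE_RATE_HZ)
print(frames, channels)  # 250 8
```

Under these assumptions, ten seconds of audio yields about 250 frames of 8 tokens each, or roughly 2,000 tokens passed to the LLM stage.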
Notes
- Only the encoder is included (waveform → RVQ tokens). The decoder/vocoder (TTS reconstruction) is excluded.
- Used as the first stage of the MiMo-V2.5-ASR pipeline (tokenizer → LLM).