# Qwen3.5-0.8B Chat CoreML

CoreML models for the [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) hybrid chat LLM, optimized for the Apple Neural Engine. Requires iOS 18+ / macOS 15+.

## Architecture

- **24 layers**: 18x DeltaNet (linear attention) + 6x GatedAttention (full SDPA)
- **MLState**: Recurrent DeltaNet state + KV cache managed by CoreML
- **INT8 palettization**: ~1 GB weights

## Models

| File | Size | Description |
|------|------|-------------|
| `int8/embedding.mlpackage` | 254 MB | Token embedding lookup |
| `int8/decoder.mlpackage` | 753 MB | Full decoder (24 layers + LM head) |

## Usage

```swift
import Qwen3Chat

let chat = try await Qwen35CoreMLChat.fromPretrained(quantization: .int8)
let response = try chat.generate(messages: [
    ChatMessage(role: .user, content: "Hello!")
])
```

## Conversion

```bash
python scripts/convert_qwen35_chat_coreml.py --output /tmp/qwen35-coreml --quantize int8
```

## License

Apache-2.0

---

- **Guide**: [soniqo.audio/guides/chat](https://soniqo.audio/guides/chat)
- **Docs**: [soniqo.audio](https://soniqo.audio)
- **GitHub**: [soniqo/speech-swift](https://github.com/soniqo/speech-swift)