Qwen3.5-0.8B Chat CoreML

CoreML models for the Qwen3.5-0.8B hybrid chat LLM, optimized for the Apple Neural Engine. Requires iOS 18+ or macOS 15+.

Architecture

24 layers: 18x DeltaNet (linear attention) + 6x GatedAttention (full SDPA)
MLState: recurrent DeltaNet state and KV cache, both managed by CoreML
INT8 palettization: ~1 GB of weights (254 MB embedding + 753 MB decoder)
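
The hybrid layout above can be sketched in a few lines of Python. The layer counts (24 total, 18 DeltaNet, 6 GatedAttention) come from this card. The interleaving order, with every fourth layer using full attention, is an assumption made for illustration, as are the names `layer_schedule` and `state_kind`.

```python
# Layer counts come from the model card; the interleaving order is an assumption.
NUM_LAYERS = 24
FULL_ATTENTION_INTERVAL = 4  # assumed: every 4th layer is GatedAttention


def layer_schedule(num_layers=NUM_LAYERS, interval=FULL_ATTENTION_INTERVAL):
    """Return the layer type for each decoder layer index."""
    return [
        "GatedAttention" if (i + 1) % interval == 0 else "DeltaNet"
        for i in range(num_layers)
    ]


def state_kind(layer_type):
    """Which kind of per-layer state has to persist between decode steps."""
    # DeltaNet carries a fixed-size recurrent state; full attention carries
    # a KV cache with one entry per token position.
    return "recurrent" if layer_type == "DeltaNet" else "kv_cache"


schedule = layer_schedule()
print(schedule.count("DeltaNet"), schedule.count("GatedAttention"))  # 18 6
print([state_kind(t) for t in schedule[:4]])
```

Both kinds of state have to survive from one decode step to the next, which is why the decoder is exported as a stateful model.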

Models

File	Size	Description
`int8/embedding.mlpackage`	254 MB	Token embedding lookup
`int8/decoder.mlpackage`	753 MB	Full decoder (24 layers + LM head)

Usage

import Qwen3Chat

let chat = try await Qwen35CoreMLChat.fromPretrained(quantization: .int8)
let response = try chat.generate(messages: [
    ChatMessage(role: .user, content: "Hello!")
])

Conversion

python scripts/convert_qwen35_chat_coreml.py --output /tmp/qwen35-coreml --quantize int8
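
For reference, 8-bit palettization of an already converted model can be done with coremltools along the following lines. This is a minimal sketch, not the contents of the conversion script: the helper name `palettize_int8` and its paths are hypothetical, and the k-means mode is an assumption.

```python
def palettize_int8(src_path, dst_path):
    """Apply 8-bit k-means weight palettization to a saved Core ML model."""
    # Imported lazily so this file loads even where coremltools is absent.
    import coremltools as ct
    import coremltools.optimize.coreml as cto

    model = ct.models.MLModel(src_path)
    # One global config: every supported weight is clustered into a
    # 256-entry lookup table (8 bits per index).
    op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=8)
    config = cto.OptimizationConfig(global_config=op_config)
    compressed = cto.palettize_weights(model, config)
    compressed.save(dst_path)
    return dst_path
```

Calling it with the path of an uncompressed `.mlpackage` and an output path writes the compressed package to disk.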

License

Apache-2.0

Docs: soniqo.audio/guides/chat
GitHub: soniqo/speech-swift