# Qwen3.5-0.8B Chat CoreML

CoreML models for the [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) hybrid chat LLM, optimized for the Apple Neural Engine. Requires iOS 18+ / macOS 15+.

## Architecture

- **24 layers**: 18x DeltaNet (linear attention) + 6x GatedAttention (full SDPA)
- **MLState**: the recurrent DeltaNet state and the KV cache are both managed by CoreML
- **INT8 palettization**: ~1 GB of weights in total (254 MB embedding + 753 MB decoder)
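
To make the layer split concrete, here is a minimal sketch in Swift. The 3:1 interleaving (every fourth layer uses full attention) is an assumption chosen so that the counts come out to 18 + 6; this README does not state the actual ordering. `LayerKind`, `layerLayout`, and `storedPositions` are illustrative names, not part of the package.

```swift
/// The two layer types listed above.
enum LayerKind {
    case deltaNet        // linear attention with a fixed-size recurrent state
    case gatedAttention  // full SDPA with a KV cache
}

/// Builds a hypothetical layout in which every `period`-th layer is full attention.
/// ASSUMPTION: the real model's layer ordering is not given in this README.
func layerLayout(layerCount: Int = 24, period: Int = 4) -> [LayerKind] {
    (0..<layerCount).map { index in
        (index + 1) % period == 0 ? .gatedAttention : .deltaNet
    }
}

/// Conceptual number of per-token entries a layer keeps after `tokens` tokens.
func storedPositions(for kind: LayerKind, tokens: Int) -> Int {
    switch kind {
    case .deltaNet:
        return 1       // a single recurrent state, updated in place each step
    case .gatedAttention:
        return tokens  // one key/value entry per token seen so far
    }
}

let layout = layerLayout()
let deltaNetCount = layout.filter { $0 == .deltaNet }.count
let attentionCount = layout.filter { $0 == .gatedAttention }.count
print("DeltaNet: \(deltaNetCount), GatedAttention: \(attentionCount)")
```

The second function shows why the two layer types need different state: a DeltaNet layer carries one state of constant size, while a full-attention layer has to remember something for every token.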

## Models

| File | Size | Description |
|------|------|-------------|
| `int8/embedding.mlpackage` | 254 MB | Token embedding lookup |
| `int8/decoder.mlpackage` | 753 MB | Full decoder (24 layers + LM head) |
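
The two packages can also be inspected directly with CoreML, for example to discover their input and output feature names, which this README does not list. The following is a sketch under stated assumptions: `modelDirectory` is a hypothetical local folder containing the `int8/` files from the table, and the helper names are illustrative rather than part of the `Qwen3Chat` API.

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

/// Resolves the two package paths from the table relative to a local folder.
/// ASSUMPTION: `modelDirectory` is a hypothetical download location.
func modelPackageURLs(in modelDirectory: URL) -> (embedding: URL, decoder: URL) {
    let int8 = modelDirectory.appendingPathComponent("int8")
    return (int8.appendingPathComponent("embedding.mlpackage"),
            int8.appendingPathComponent("decoder.mlpackage"))
}

#if canImport(CoreML)
/// Compiles an .mlpackage and loads it, asking CoreML to prefer the Neural Engine.
@available(macOS 15.0, iOS 18.0, *)
func loadModel(at packageURL: URL) async throws -> MLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndNeuralEngine
    let compiledURL = try await MLModel.compileModel(at: packageURL)
    return try MLModel(contentsOf: compiledURL, configuration: configuration)
}

/// Prints the feature names of both models and creates the decoder's state.
@available(macOS 15.0, iOS 18.0, *)
func inspectModels(in modelDirectory: URL) async throws {
    let urls = modelPackageURLs(in: modelDirectory)
    let embedding = try await loadModel(at: urls.embedding)
    let decoder = try await loadModel(at: urls.decoder)

    print("Embedding inputs:", embedding.modelDescription.inputDescriptionsByName.keys.sorted())
    print("Decoder inputs:", decoder.modelDescription.inputDescriptionsByName.keys.sorted())
    print("Decoder outputs:", decoder.modelDescription.outputDescriptionsByName.keys.sorted())

    // A single MLState carries the model's state between calls; the same
    // instance is passed to each decoder prediction within one conversation.
    let state = decoder.makeState()
    _ = state
}
#endif
```

Calling `try await inspectModels(in:)` with the folder that holds the downloaded files prints the feature names needed to drive the models by hand.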

## Usage

```swift
import Qwen3Chat

let chat = try await Qwen35CoreMLChat.fromPretrained(quantization: .int8)
let response = try chat.generate(messages: [
    ChatMessage(role: .user, content: "Hello!")
])
```

## Conversion

```bash
python scripts/convert_qwen35_chat_coreml.py --output /tmp/qwen35-coreml --quantize int8
```

## License

Apache-2.0

---

- **Guide**: [soniqo.audio/guides/chat](https://soniqo.audio/guides/chat)
- **Docs**: [soniqo.audio](https://soniqo.audio)
- **GitHub**: [soniqo/speech-swift](https://github.com/soniqo/speech-swift)