Qwen3.5-0.8B Chat CoreML

CoreML models for the Qwen3.5-0.8B hybrid chat LLM, optimized for the Apple Neural Engine. Requires iOS 18+ / macOS 15+.

Architecture

  • 24 layers: 18x DeltaNet (linear attention) + 6x GatedAttention (full SDPA)
  • MLState: Recurrent DeltaNet state + KV cache managed by CoreML
  • INT8 palettization: ~1 GB weights
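
The 18:6 layer split can be illustrated with a small layout sketch. The interleaving used here (three DeltaNet layers followed by one GatedAttention layer) is an assumption made for illustration; this card states only the counts, not the ordering, and the names `layer_types` and `FULL_ATTENTION_INTERVAL` are hypothetical.

```python
from collections import Counter

NUM_LAYERS = 24
# Assumption (not stated in this card): every 4th layer uses full attention.
FULL_ATTENTION_INTERVAL = 4


def layer_types(num_layers=NUM_LAYERS, interval=FULL_ATTENTION_INTERVAL):
    """Return the block type of each layer under the assumed interleaving."""
    return [
        "GatedAttention" if (i + 1) % interval == 0 else "DeltaNet"
        for i in range(num_layers)
    ]


layout = layer_types()
counts = Counter(layout)
print(counts["DeltaNet"], counts["GatedAttention"])  # prints: 18 6
```

Whatever the actual ordering, only the 6 GatedAttention layers need a KV cache that grows with context length; the 18 DeltaNet layers carry a fixed-size recurrent state.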

Models

| File | Size | Description |
|---|---|---|
| int8/embedding.mlpackage | 254 MB | Token embedding lookup |
| int8/decoder.mlpackage | 753 MB | Full decoder (24 layers + LM head) |
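
As a quick sanity check, the two file sizes can be summed to confirm the "~1 GB" figure quoted under Architecture. This is a minimal sketch that assumes decimal units (1 GB = 1000 MB); the dictionary name is hypothetical.

```python
# Sizes in MB, copied from the Models table.
FILE_SIZES_MB = {
    "int8/embedding.mlpackage": 254,
    "int8/decoder.mlpackage": 753,
}

# Assumption: decimal units, 1 GB = 1000 MB.
total_mb = sum(FILE_SIZES_MB.values())
print(f"total: {total_mb} MB (~{total_mb / 1000:.1f} GB)")  # total: 1007 MB (~1.0 GB)
```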

Usage

import Qwen3Chat

let chat = try await Qwen35CoreMLChat.fromPretrained(quantization: .int8)
let response = try chat.generate(messages: [
    ChatMessage(role: .user, content: "Hello!")
])

Conversion

python scripts/convert_qwen35_chat_coreml.py --output /tmp/qwen35-coreml --quantize int8

License

Apache-2.0

