---
language:
- en
- zh
license: apache-2.0
base_model: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2
tags:
- qwen3.5
- reasoning
- quantized
- awq
- autoawq
- 4bit
- int4
- deltanet
- chain-of-thought
- mtp
pipeline_tag: text-generation
library_name: transformers
model_name: Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit
quantized_by: mconcat
---

# Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit

AutoAWQ-format 4-bit quantized version of [Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2](https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2).

This checkpoint keeps the same hybrid Qwen3.5 DeltaNet + softmax architecture and Qwen3.5 MTP head as the BF16 source, but exports an AutoAWQ-compatible W4A16 checkpoint for broader AWQ tooling and runtime compatibility.

The published folder includes:

- `model-00001-of-00011.safetensors` ... `model-00011-of-00011.safetensors`
- `model.safetensors.index.json`
- `model.mtp.safetensors`
- `quantization_config.json`
- `processor_config.json`
- `preprocessor_config.json`
- `video_preprocessor_config.json`

## Verified Inference

Local export was completed on **2026-03-31** on a single **NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96 GB)** with:

- `auto-round==0.10.2`
- `transformers==5.3.0`
- `vllm==0.17.1`

What was verified in that run:

- the AutoAWQ export completed successfully
- `quantization_config.json` was written with `quant_method=awq`
- the output uses `bits=4`, `group_size=128`, `sym=false`, `zero_point=true`
- `model.mtp.safetensors` was restored into the output folder

Full local vLLM serve validation for this exact AWQ v2 export is still pending.
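The verified export settings above can be sanity-checked by reading the shipped `quantization_config.json`. A minimal sketch, using an inline illustrative config that mirrors the reported values rather than the actual file contents:

```python
import json

# Illustrative config mirroring the values reported above; against a real
# download you would instead do: cfg = json.load(open("quantization_config.json"))
cfg = json.loads("""{
  "quant_method": "awq",
  "bits": 4,
  "group_size": 128,
  "sym": false,
  "zero_point": true,
  "version": "gemm"
}""")

# Confirm the fields that define this W4A16 asymmetric export
assert cfg["quant_method"] == "awq"
assert cfg["bits"] == 4 and cfg["group_size"] == 128
assert cfg["sym"] is False and cfg["zero_point"] is True
print("AWQ export config consistent, kernel format:", cfg["version"])
```

Reading the config this way is a cheap pre-flight check before spending time on a full model download or serve attempt.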
## Quantization Strategy

AutoRound AutoAWQ export using W4A16 asymmetric group-wise quantization:

| Precision | Layers |
|-----------|--------|
| **INT4 weights + BF16 activations** | most quantized linear layers |
| **BF16** | `lm_head`, `embed_tokens`, `self_attn.o_proj`, DeltaNet `linear_attn.out_proj`, DeltaNet `in_proj_a`/`in_proj_b`, visual encoder, MTP sidecar |

AWQ details:

- weights: INT4
- activations: BF16/FP16 at inference time
- group size: `128`
- asymmetric quantization: `sym=false`
- zero point: `true`
- format: AutoAWQ `gemm`

Architecture match with the BF16 source:

- `model_type=qwen3_5`
- `64` text layers
- `full_attention_interval=4`
- `mtp_num_hidden_layers=1`
- `max_position_embeddings=262144`

## Local Benchmark Slice

No local benchmark slice is included yet for this AWQ v2 export. The export completed successfully and the checkpoint layout is ready for upload, but full serve/runtime validation is still pending.

## Usage

### vLLM

```bash
pip install -U vllm==0.17.1 transformers==5.3.0
```

Expected serving command:

```bash
vllm serve mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.85 \
  --max-num-seqs 1 \
  --skip-mm-profiling \
  --reasoning-parser qwen3
```

With MTP enabled:

```bash
vllm serve mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.85 \
  --max-num-seqs 1 \
  --skip-mm-profiling \
  --reasoning-parser qwen3 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'
```

### Transformers

This export is not intended for plain `transformers` inference. Use a runtime that understands AutoAWQ-format checkpoints, such as vLLM with AWQ support.
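Once `vllm serve` is running, the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on port 8000 by default. A minimal stdlib sketch that builds a request payload for it (the prompt and sampling settings here are illustrative, not recommendations):

```python
import json
import urllib.request

MODEL_ID = "mconcat/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-AWQ-4bit"

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    # Payload shape for vLLM's OpenAI-compatible chat completions API
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }

payload = build_chat_request("Explain AWQ group-wise quantization in two sentences.")
print(json.dumps(payload, indent=2))

# Against a live server started with the command above, uncomment:
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```

With `--reasoning-parser qwen3`, the returned message should also carry the model's chain-of-thought separately from the final answer (vLLM's reasoning parsers emit it as a `reasoning_content` field on the message).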
## Compatibility

| Framework | Supported | Notes |
|-----------|-----------|-------|
| vLLM >= 0.17.0 | Expected | Intended serving path for this AutoAWQ export; exact local serve validation still pending |
| transformers >= 5.3.0 | No | Plain `transformers` is not the intended inference path for this AutoAWQ checkpoint |
| AutoAWQ-compatible runtimes | Expected | Export format is AutoAWQ-style `quant_method=awq`, `version=gemm` |
| SGLang | Unknown | Not verified for this export |

## Notes

- This is an AutoAWQ-format export, not the compressed-tensors AWQ format.
- The output keeps `self_attn.o_proj` and DeltaNet `linear_attn.out_proj` in BF16 rather than 4-bit.
- The output folder includes the Qwen3.5 MTP sidecar and processor metadata needed for serving compatibility.
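The mixed-precision claim in the notes above (AWQ-packed INT4 for most linears, BF16 for `self_attn.o_proj` and friends) can be confirmed without loading the model, because a safetensors shard begins with an 8-byte little-endian header length followed by a JSON map of tensor name to dtype, shape, and offsets. A minimal sketch over a synthetic two-tensor header (illustrative bytes, not an actual shard from this repo):

```python
import json
import struct

def read_safetensors_header(raw: bytes) -> dict:
    # safetensors layout: u64 little-endian header size, then the JSON header
    (size,) = struct.unpack("<Q", raw[:8])
    return json.loads(raw[8 : 8 + size])

# Synthetic shard header: one BF16-kept projection, one AWQ-packed weight.
# AutoAWQ gemm checkpoints store packed weights as I32 `qweight` tensors.
header = {
    "model.layers.0.self_attn.o_proj.weight": {
        "dtype": "BF16", "shape": [4, 4], "data_offsets": [0, 32]},
    "model.layers.0.mlp.gate_proj.qweight": {
        "dtype": "I32", "shape": [4, 1], "data_offsets": [32, 48]},
}
encoded = json.dumps(header).encode()
blob = struct.pack("<Q", len(encoded)) + encoded + b"\x00" * 48

for name, meta in read_safetensors_header(blob).items():
    print(name, "->", meta["dtype"])
```

Pointing `read_safetensors_header` at the first bytes of a real `model-0000X-of-00011.safetensors` shard would list which tensors stayed BF16 and which were packed, without reading the tensor data itself.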