yifanyu/I-DLM-8B
Text Generation · Transformers · Safetensors · English · sdar · feature-extraction · diffusion-lm · introspective-decoding · conversational · custom_code
arxiv:2604.11035 · License: apache-2.0
Fix SDAR modeling for transformers v5 (rope default / DynamicCache / _init_weights) #2
by kashif (HF Staff) · opened 14 days ago
base: refs/heads/main ← from: refs/pr/2
Files changed: +121 −490
kashif · 14 days ago
No description provided.
Fix SDAR modeling for transformers v5 (rope default / DynamicCache / _init_weights) · 8f130ff3
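The PR carries no description, but the title names `_init_weights` among the v5 fixes. In transformers, a custom `PreTrainedModel` subclass overrides `_init_weights` so the library can initialize any module the model owns. A minimal sketch of that convention; the class name, module types, and `initializer_range` default are the usual transformers pattern, not code from this repo:

```python
import torch.nn as nn
from transformers import PreTrainedModel

class SDARPreTrainedModel(PreTrainedModel):  # hypothetical class name
    config_class = None  # stand-in; the repo defines its own config class

    def _init_weights(self, module):
        # Standard transformers-style init hook (sketch, not this repo's code).
        std = getattr(self.config, "initializer_range", 0.02)
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
```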
Default store_kv from use_regular_causal so strict-causal callers use HF cache convention · 93d2fc3d
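A plausible reading of this commit, with `store_kv` and `use_regular_causal` taken from the commit message and everything else hypothetical: the config resolves an unset `store_kv` from `use_regular_causal`, so strict-causal callers get HF-style K/V caching by default.

```python
from transformers import PretrainedConfig

class SDARConfig(PretrainedConfig):  # hypothetical class name
    model_type = "sdar"

    def __init__(self, use_regular_causal=False, store_kv=None, **kwargs):
        super().__init__(**kwargs)
        self.use_regular_causal = use_regular_causal
        # An unset store_kv follows the strict-causal switch, so strict-causal
        # callers store K/V per the HF cache convention by default.
        self.store_kv = use_regular_causal if store_kv is None else store_kv
```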
Add strict-causal SDPA path that handles GQA + non-square q/k cleanly · e1ceb2a9
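`F.scaled_dot_product_attention(..., is_causal=True)` only does the right thing when q and k have the same length, and GQA leaves K/V with fewer heads than Q. A sketch of a strict-causal SDPA path that handles both cases; shapes and names are illustrative, not the repo's code:

```python
import torch
import torch.nn.functional as F

def strict_causal_sdpa(q, k, v):
    """q: (B, Hq, Lq, D); k, v: (B, Hkv, Lk, D) with Lk >= Lq (cached prefix)."""
    n_rep = q.shape[1] // k.shape[1]
    if n_rep > 1:  # GQA: repeat K/V heads to match the query head count
        k = k.repeat_interleave(n_rep, dim=1)
        v = v.repeat_interleave(n_rep, dim=1)
    q_len, k_len = q.shape[-2], k.shape[-2]
    if q_len == k_len:
        # Square case: let SDPA build the causal mask itself.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Non-square case: query row i sits at absolute position (Lk - Lq + i),
    # so it may attend to keys 0..(Lk - Lq + i). Build the mask explicitly.
    offset = k_len - q_len
    pos_q = torch.arange(q_len, device=q.device).unsqueeze(-1)
    pos_k = torch.arange(k_len, device=q.device).unsqueeze(0)
    mask = pos_k <= pos_q + offset  # (Lq, Lk) boolean, True = may attend
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```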
Guard against None entries in v5 lazy-init cache layers during retrieve-only path · fded1ba4
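The guard presumably looks like this: on a retrieve-only path (reading cached K/V without writing), a lazily initialized v5 cache may not have filled a layer's slot yet. This sketch assumes the `key_cache`/`value_cache` lists of `DynamicCache`; the exact v5 layout may differ.

```python
def retrieve_cached_kv(past_key_values, layer_idx):
    # Read one layer's cached K/V without calling update().
    if past_key_values is None or len(past_key_values.key_cache) <= layer_idx:
        return None
    k = past_key_values.key_cache[layer_idx]
    v = past_key_values.value_cache[layer_idx]
    if k is None or v is None:  # v5 lazy init: slot exists but is unfilled
        return None
    return k, v
```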
Treat store_kv=None as "use the config default" (upstream forward calls thread None through) · 01ac9c04
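Since upstream forward calls pass `store_kv` through unchanged, `None` has to mean "defer to the config" rather than "off". A one-liner sketch of that resolution, with names taken from the commit messages:

```python
from typing import Optional

def resolve_store_kv(config, store_kv: Optional[bool]) -> bool:
    # None means the caller expressed no preference: fall back to the config.
    return config.store_kv if store_kv is None else store_kv
```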
v5-align: remove store_kv; always update() on cached forward · 16de4810
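The last commit removes the flag altogether: whenever a cache is passed, the layer calls `update()`, which both appends the new K/V and returns the full cached tensors, matching the v5 `DynamicCache` convention. Sketch of the cached forward step, reusing `strict_causal_sdpa` from the sketch above; the wiring is illustrative, though `update(key_states, value_states, layer_idx)` is the real cache API:

```python
def cached_attention_step(self, q, k, v, past_key_values, layer_idx):
    if past_key_values is not None:
        # Always write-through: update() appends the new K/V and returns the
        # full concatenated tensors for this layer, new positions included.
        k, v = past_key_values.update(k, v, layer_idx)
    return strict_causal_sdpa(q, k, v)
```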
Ready to merge
This branch is ready to be merged automatically.