yifanyu/I-DLM-8B
Text Generation · Transformers · Safetensors · English · sdar · feature-extraction · diffusion-lm · introspective-decoding · conversational · custom_code
arxiv:2604.11035 · License: apache-2.0
Fix SDAR modeling for transformers v5 (rope default / DynamicCache / _init_weights) #2
by kashif (HF Staff) · opened 14 days ago
base: refs/heads/main ← from: refs/pr/2
Files changed: +121 −490
kashif · 14 days ago
No description provided.
Fix SDAR modeling for transformers v5 (rope default / DynamicCache / _init_weights) · 8f130ff3
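The PR carries no description, but the title names `_init_weights` among the v5 fixes. In transformers, a custom `PreTrainedModel` subclass overrides `_init_weights` so the library can initialize any module the model owns. A minimal sketch of that convention; the class name, module types, and `initializer_range` default are the usual transformers pattern, not code from this repo:

```python
import torch.nn as nn
from transformers import PreTrainedModel

class SDARPreTrainedModel(PreTrainedModel):  # hypothetical class name
    config_class = None  # stand-in; the repo defines its own config class

    def _init_weights(self, module):
        # Standard transformers-style init hook (sketch, not this repo's code).
        std = getattr(self.config, "initializer_range", 0.02)
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
```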
Default store_kv from use_regular_causal so strict-causal callers use HF cache convention · 93d2fc3d
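A plausible reading of this commit, with `store_kv` and `use_regular_causal` taken from the commit message and everything else hypothetical: the config resolves an unset `store_kv` from `use_regular_causal`, so strict-causal callers get HF-style K/V caching by default.

```python
from transformers import PretrainedConfig

class SDARConfig(PretrainedConfig):  # hypothetical class name
    model_type = "sdar"

    def __init__(self, use_regular_causal=False, store_kv=None, **kwargs):
        super().__init__(**kwargs)
        self.use_regular_causal = use_regular_causal
        # An unset store_kv follows the strict-causal switch, so strict-causal
        # callers store K/V per the HF cache convention by default.
        self.store_kv = use_regular_causal if store_kv is None else store_kv
```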
Add strict-causal SDPA path that handles GQA + non-square q/k cleanly · e1ceb2a9
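`F.scaled_dot_product_attention(..., is_causal=True)` only does the right thing when q and k have the same length, and GQA leaves K/V with fewer heads than Q. A sketch of a strict-causal SDPA path that handles both cases; shapes and names are illustrative, not the repo's code:

```python
import torch
import torch.nn.functional as F

def strict_causal_sdpa(q, k, v):
    """q: (B, Hq, Lq, D); k, v: (B, Hkv, Lk, D) with Lk >= Lq (cached prefix)."""
    n_rep = q.shape[1] // k.shape[1]
    if n_rep > 1:  # GQA: repeat K/V heads to match the query head count
        k = k.repeat_interleave(n_rep, dim=1)
        v = v.repeat_interleave(n_rep, dim=1)
    q_len, k_len = q.shape[-2], k.shape[-2]
    if q_len == k_len:
        # Square case: let SDPA build the causal mask itself.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Non-square case: query row i sits at absolute position (Lk - Lq + i),
    # so it may attend to keys 0..(Lk - Lq + i). Build the mask explicitly.
    offset = k_len - q_len
    pos_q = torch.arange(q_len, device=q.device).unsqueeze(-1)
    pos_k = torch.arange(k_len, device=q.device).unsqueeze(0)
    mask = pos_k <= pos_q + offset  # (Lq, Lk) boolean, True = may attend
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```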
Guard against None entries in v5 lazy-init cache layers during retrieve-only path · fded1ba4
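The guard presumably looks like this: on a retrieve-only path (reading cached K/V without writing), a lazily initialized v5 cache may not have filled a layer's slot yet. This sketch assumes the `key_cache`/`value_cache` lists of `DynamicCache`; the exact v5 layout may differ.

```python
def retrieve_cached_kv(past_key_values, layer_idx):
    # Read one layer's cached K/V without calling update().
    if past_key_values is None or len(past_key_values.key_cache) <= layer_idx:
        return None
    k = past_key_values.key_cache[layer_idx]
    v = past_key_values.value_cache[layer_idx]
    if k is None or v is None:  # v5 lazy init: slot exists but is unfilled
        return None
    return k, v
```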
Treat store_kv=None as "use the config default" (upstream forward calls thread None through) · 01ac9c04
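Since upstream forward calls pass `store_kv` through unchanged, `None` has to mean "defer to the config" rather than "off". A one-liner sketch of that resolution, with names taken from the commit messages:

```python
from typing import Optional

def resolve_store_kv(config, store_kv: Optional[bool]) -> bool:
    # None means the caller expressed no preference: fall back to the config.
    return config.store_kv if store_kv is None else store_kv
```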
v5-align: remove store_kv; always update() on cached forward · 16de4810
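The last commit removes the flag altogether: whenever a cache is passed, the layer calls `update()`, which both appends the new K/V and returns the full cached tensors, matching the v5 `DynamicCache` convention. Sketch of the cached forward step, reusing `strict_causal_sdpa` from the sketch above; the wiring is illustrative, though `update(key_states, value_states, layer_idx)` is the real cache API:

```python
def cached_attention_step(self, q, k, v, past_key_values, layer_idx):
    if past_key_values is not None:
        # Always write-through: update() appends the new K/V and returns the
        # full concatenated tensors for this layer, new positions included.
        k, v = past_key_values.update(k, v, layer_idx)
    return strict_causal_sdpa(q, k, v)
```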
Ready to merge
This branch is ready to be merged automatically.