- Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models
  Paper • 2606.11409 • Published • 8
- Towards Understanding the Robustness of Sparse Autoencoders
  Paper • 2604.18756 • Published • 11
- Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation
  Paper • 2604.07835 • Published
- Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs
  Paper • 2606.03647 • Published
Adeptusnull
adeptusnull
AI & ML interests
None yet
Recent Activity
updated a collection about 5 hours ago
WantToRead
updated a collection about 5 hours ago
WantToRead
updated a collection about 5 hours ago
WantToRead
Organizations
None yet