Models That Know How Evaluations Are Designed Score Safer
Paper • 2605.28591
LLM, trustworthy AI, AI security, privacy, calibration, hallucination
Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution