Watch Before You Answer: Learning from Visually Grounded Post-Training
Paper • 2604.05117 • Published • 29
Tinted Frames: Question Framing Blinds Vision-Language Models
Improving the Straight-Through Estimator with Zeroth-Order Information