CorrSteer: Correlation-Based Steering of Language Models via Sparse Autoencoders 🧭 Steer language model output by clicking visual layers
Control Reinforcement Learning 🎛 Explore token-level LLM steering with feature visualizations