Leandro von Werra PRO

lvwerra

huggingface

https://www.lvwerra.com

AI & ML interests

NLP and RL

Recent Activity

new activity 27 minutes ago

rl-llm-wiki/knowledge-base:fix: entropy-and-exploration — add Clip-Higher as the KL-free-reasoning-RL entropy-collapse counterweight

new activity about 1 hour ago

rl-llm-wiki/knowledge-base:topic: sycophancy-and-misgeneralization — add Perez et al. (origin of measured sycophancy + RLHF inverse-scaling)

new activity about 1 hour ago

rl-llm-wiki/knowledge-base:topic: nash-and-game-theoretic-po — fold SPPO (now processed, #331) as the squared-error self-play instantiation

New activity in rl-llm-wiki/knowledge-base 27 minutes ago

fix: entropy-and-exploration — add Clip-Higher as the KL-free-reasoning-RL entropy-collapse counterweight

#333 opened 38 minutes ago by

New activity in rl-llm-wiki/knowledge-base about 1 hour ago

topic: sycophancy-and-misgeneralization — add Perez et al. (origin of measured sycophancy + RLHF inverse-scaling)

#330 opened about 2 hours ago by

topic: nash-and-game-theoretic-po — fold SPPO (now processed, #331) as the squared-error self-play instantiation

#332 opened about 1 hour ago by

New activity in rl-llm-wiki/knowledge-base about 2 hours ago

source: arxiv:2405.00675 — Self-Play Preference Optimization (SPPO)

#331 opened about 2 hours ago by

topic: dpo-and-offline-po — add the online/iterative-DPO recipe (closes the off-policy gap it flags)

#329 opened about 2 hours ago by

topic: rlhf-ppo-pipeline — add the clipped-surrogate mechanism + trust-region runnable check

#328 opened about 2 hours ago by

New activity in rl-llm-wiki/knowledge-base about 3 hours ago

fix: eval-cluster consistency pass — add missing back-links (bidirectional navigation)

#317 opened about 7 hours ago by

topic: preference-reward-models — add BT-fit runnable check + RM design-space table

#327 opened about 3 hours ago by

topic: overoptimization-and-mode-collapse — deepen with Kirk et al. (primary diversity evidence + generalisation↔diversity tradeoff)

#325 opened about 4 hours ago by

topic: reward-hacking — formal mechanism, symptom table, Goodhart runnable check (+ fix dup source)

#324 opened about 4 hours ago by

topic: nash-and-game-theoretic-po — add runnable check demonstrating intransitivity→Nash

#326 opened about 4 hours ago by

New activity in rl-llm-wiki/knowledge-base about 5 hours ago

topic: process-vs-outcome-rewards — add mechanism, design-space table, runnable trace-error check

#322 opened about 5 hours ago by

topic: rl-for-math-and-code — add verifier mechanism, results table, runnable check (structural enrichment)

#323 opened about 5 hours ago by

topic: reward-model-ensembles — deepen to the flagship bar (12.2KB → 16.2KB)

#321 opened about 6 hours ago by

New activity in rl-llm-wiki/knowledge-base about 6 hours ago

topic: human-preference-collection — deepen to the flagship bar (11.2KB → 16.7KB)

#320 opened about 6 hours ago by

topic: ai-feedback-data — deepen to the flagship bar (10.7KB → 18.5KB)

#318 opened about 7 hours ago by

topic: reasoning-emergence §5 — add the mechanism (cognitive behaviors + entropy collapse) to the created-vs-surfaced debate

#319 opened about 7 hours ago by
