DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification
Abstract
DecomposeRL is an accurate claim verifier that produces inspectable traces by framing decomposition as a reinforcement learning policy trained with GRPO and multi-faceted rewards, achieving high performance with minimal labeled data through a data-curation funnel.
Claim verification splits between end-to-end classifiers, which are accurate but yield no inspectable traces, and decomposition-based methods, which produce inspectable traces but lag in performance on benchmark datasets. We propose DecomposeRL, an accurate claim verifier that produces inspectable traces. DecomposeRL frames decomposition as an RL policy trained with GRPO and a multi-faceted reward ensemble, enabling both fully supervised and semi-supervised learning from unlabeled claims. It addresses the prohibitive training cost of GRPO with a data-curation funnel that distills 115K fact-verification claims into a compact, learning-signal-dense subset of 5K claims. We show that a DecomposeRL-7B policy trained with full supervision on only ~5K curated claims achieves 86.3 in-domain and 69.8 out-of-domain balanced accuracy across 11 claim-verification benchmarks containing biomedical, political, scientific, and general-domain claims. Despite being 4x smaller, it matches 32B baselines and GPT-4.1-mini, and it further outperforms baselines in a semi-supervised setting where only 10% of the claims are labeled. Code, data, and models are available at https://dipta007.github.io/DecomposeRL
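The scores above are reported as balanced accuracy. As a minimal sketch, assuming the standard definition (the unweighted mean of per-class recall), the metric can be computed as follows. This is not the authors' evaluation code; the function name and the label strings "supported" and "refuted" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    total = defaultdict(int)    # number of gold examples per class
    correct = defaultdict(int)  # correctly predicted examples per class
    for gold_label, pred_label in zip(y_true, y_pred):
        total[gold_label] += 1
        if gold_label == pred_label:
            correct[gold_label] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 8 "supported" claims, 2 "refuted" claims.
gold = ["supported"] * 8 + ["refuted"] * 2
pred = ["supported"] * 8 + ["supported", "refuted"]

# Recall is 8/8 for "supported" and 1/2 for "refuted".
score = balanced_accuracy(gold, pred)
print(score)  # 0.75
```

On this toy data plain accuracy would be 0.9, while balanced accuracy is 0.75, which is why the metric is a common choice when benchmark label distributions are skewed.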