Our new Agentic RL work, CLEANER, is now open-source!

Our paper “CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning” is now available. Paper; Code. CLEANER resolves the credit assignment dilemma in agentic RL by training on self-purified trajectories, achieving SOTA performance with just one-third of the training cost.

Tianshi Xu
Second-Year PhD student

Tianshi Xu is a second-year Ph.D. student at the School of Integrated Circuit, Peking University. His current research interests include LLM reasoning for math. Previously, he worked on efficient and privacy-preserving deep learning.