News

Our new agentic RL work, CLEANER, is now open-source!
Our paper “CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning” is now available: Paper & Code. CLEANER resolves the credit assignment dilemma in agentic RL by training on self-purified trajectories, achieving state-of-the-art performance at just one-third of the training cost.