Agentic RL

OpenClaw-RL: Let Your Agent Evolve Through Conversation

Every interaction produces free training signals, and OpenClaw-RL recovers all of them by fusing process rewards, token-level distillation, and fully async infrastructure into one framework. It is the convergence of our research lines in reward design, structured feedback, and scalable RL systems.

OpenClaw-RL ReasonFlux-PRM BoT SuperCorrect RLAnything CURE
Read post
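The fused objective above can be sketched in miniature. This is an illustrative toy, not OpenClaw-RL's actual implementation: `combined_loss`, its arguments, and the fixed `distill_weight` are all hypothetical names, standing in for a process-reward-weighted policy term plus a token-level distillation KL term.

```python
import math

def kl_div(p, q):
    """Token-level KL divergence KL(p || q) between two distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def combined_loss(step_logprobs, process_rewards,
                  student_dists, teacher_dists, distill_weight=0.1):
    """Toy fused objective (illustrative names, not the real API):
    process-reward-weighted policy gradient + token-level distillation."""
    # Process-reward term: each reasoning step's log-prob, weighted by its reward.
    pg = -sum(r * lp for r, lp in zip(process_rewards, step_logprobs))
    # Distillation term: match the teacher's token distribution at every position.
    kl = sum(kl_div(t, s) for t, s in zip(teacher_dists, student_dists))
    return pg + distill_weight * kl
```

With identical student and teacher distributions the KL term vanishes and only the process-reward term remains, which makes the two signals easy to inspect in isolation.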
LLM Reasoning

From Thought Templates to Agent Skills

We showed in the LLM reasoning era that high-level structured guidelines, reusable blueprints we called thought templates, are a uniquely powerful representation. Today the agent community calls them skills. The idea has evolved from inference-time retrieval to RL-optimized planning to live interaction-driven generation.

BoT SuperCorrect ReasonFlux
Read post
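Inference-time retrieval of a thought template can be sketched as follows. This is a minimal stand-in, assuming keyword overlap in place of the embedding similarity a real system like BoT would use; the `trigger`/`steps` fields and `retrieve_template` are hypothetical names.

```python
def retrieve_template(query, templates):
    """Pick the stored template whose keyword overlap with the query is
    largest (a toy proxy for embedding-based retrieval)."""
    qwords = set(query.lower().split())
    return max(templates,
               key=lambda t: len(qwords & set(t["trigger"].lower().split())))

# A tiny illustrative template library (hypothetical contents).
TEMPLATES = [
    {"trigger": "solve equation algebra", "steps": "1. Isolate the variable ..."},
    {"trigger": "prove geometry angle",   "steps": "1. Draw auxiliary lines ..."},
]

best = retrieve_template("how do I solve this algebra equation", TEMPLATES)
# The retrieved template's steps would be prepended to the prompt as a skill.
```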
Self-Evolving AI

Self-Evolving AI

Building AI systems that co-evolve environments, policies, and reward models — from code generation with reinforcement learning to personalized agent fine-tuning through natural conversation.

CURE RLAnything OpenClaw-RL
Read post
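The co-evolution loop can be illustrated with deliberately toy components. Everything here is hypothetical scaffolding, not CURE or RLAnything internals: an environment that raises task difficulty on success, a policy whose skill grows when it is rewarded, and a reward model slot that in a real system would itself be learned.

```python
class ToyEnv:
    """Proposes integer difficulty levels; raises difficulty after a success."""
    def __init__(self):
        self.level = 1
    def propose(self, history):
        return self.level
    def update(self, task, score):
        if score > 0.5:
            self.level += 1

class ToyPolicy:
    """Skill is a number; the policy solves tasks up to its skill level."""
    def __init__(self):
        self.skill = 1
    def act(self, task):
        return task <= self.skill
    def update(self, task, attempt, score):
        if score > 0.5:
            self.skill += 1  # solved tasks raise skill

class ToyReward:
    """Placeholder for a learned reward model (fixed rule in this toy)."""
    def score(self, task, attempt):
        return 1.0 if attempt else 0.0
    def update(self, task, attempt):
        pass

def co_evolve(env, policy, reward, rounds=3):
    """Skeleton of the loop: environment proposes, policy acts, reward
    model scores, and each component updates in turn."""
    history = []
    for _ in range(rounds):
        task = env.propose(history)
        attempt = policy.act(task)
        s = reward.score(task, attempt)
        policy.update(task, attempt, s)
        reward.update(task, attempt)
        env.update(task, s)
        history.append((task, s))
    return history
```

The point of the sketch is the structure: all three components sit inside one loop, so curriculum, policy, and reward signal improve together rather than any one being frozen.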
Diffusion Language Models

Diffusion Language Models

Advancing the frontier of diffusion-based language modeling — from reinforcement learning frameworks for discrete diffusion to multimodal generation and efficient parallel decoding.

TraceRL MMaDA MMaDA-Parallel
Read post
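The parallel-decoding idea can be sketched as confidence-based iterative unmasking, the basic loop behind discrete-diffusion generation: each pass fills several masked positions at once instead of one token at a time. The `predict` interface and the fixed `k` are assumptions for illustration, not MMaDA-Parallel's actual API.

```python
MASK = "<mask>"

def parallel_decode(seq, predict, k=2):
    """Each pass, unmask the k masked positions the model is most
    confident about. `predict(seq, i)` returns (token, confidence)."""
    seq = list(seq)
    while MASK in seq:
        masked = [i for i, t in enumerate(seq) if t == MASK]
        preds = {i: predict(seq, i) for i in masked}
        top = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in top:
            seq[i] = preds[i][0]
    return seq

def stub_predict(seq, i):
    """Deterministic stand-in for a model: earlier positions are more confident."""
    return (f"t{i}", 1.0 / (i + 1))

out = parallel_decode([MASK] * 4, stub_predict, k=2)
```

With `k=2`, a length-4 sequence is fully decoded in two passes instead of four, which is where the efficiency of parallel decoding comes from.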