Research insights and technical deep-dives from the Gen-Verse team at Princeton.
Every interaction produces free training signals, and OpenClaw-RL recovers all of them, fusing process rewards, token-level distillation, and fully asynchronous infrastructure into one framework. It is the convergence of our research lines in reward design, structured feedback, and scalable RL systems.
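To make the fusion concrete, here is a minimal sketch in plain PyTorch of what combining a per-step process reward with a token-level distillation term in a single loss might look like. All names here (`student_logits`, `teacher_logits`, `step_rewards`, the weight `beta`) are illustrative assumptions, not OpenClaw-RL's actual API.

```python
# Sketch: fusing a process-level policy-gradient term with a token-level
# distillation term in one loss. Names are illustrative assumptions.
import torch
import torch.nn.functional as F

def fused_loss(student_logits, teacher_logits, actions, step_rewards, beta=0.1):
    """student_logits, teacher_logits: (T, V); actions: (T,); step_rewards: (T,)."""
    log_probs = F.log_softmax(student_logits, dim=-1)                      # (T, V)
    action_logp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (T,)
    # Process reward: a reward for every intermediate step, not just the episode.
    pg_loss = -(step_rewards.detach() * action_logp).mean()
    # Token-level distillation: full-distribution KL against the teacher at
    # every position, a denser signal than sequence-level matching.
    kl = F.kl_div(log_probs, F.log_softmax(teacher_logits, dim=-1),
                  log_target=True, reduction="batchmean")
    return pg_loss + beta * kl

T, V = 8, 50
student_logits = torch.randn(T, V, requires_grad=True)
teacher_logits = torch.randn(T, V)
actions = torch.randint(0, V, (T,))
step_rewards = torch.rand(T)
fused_loss(student_logits, teacher_logits, actions, step_rewards).backward()
```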
In the LLM reasoning era we proved that high-level structured guidelines, reusable blueprints we called thought templates, are a uniquely powerful representation; today the agent community calls them skills. Our work spans inference-time retrieval, RL-optimized planning, and live interaction-driven generation.
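As a toy illustration of inference-time retrieval, the sketch below selects the stored thought template most similar to an incoming task and prepends it as structured guidance. The template library and the bag-of-words similarity are assumptions made for the example, not the published method.

```python
# Toy sketch: retrieve the most relevant "thought template" (skill) for a
# task and prepend it as guidance. Templates and similarity are illustrative.
from collections import Counter
import math

TEMPLATES = {
    "case_split": "Enumerate the possible cases, solve each, then combine.",
    "backward_chaining": "Start from the goal state and reason backwards.",
    "invariant": "Find a quantity preserved by every step and exploit it.",
}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_template(task: str) -> str:
    query = Counter(task.lower().split())
    best = max(TEMPLATES,
               key=lambda k: cosine(query, Counter(TEMPLATES[k].lower().split())))
    return TEMPLATES[best]

task = "Prove the goal by reasoning backwards from what must hold at the end."
prompt = f"Guideline: {retrieve_template(task)}\n\nTask: {task}"
print(prompt)
```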
Building AI systems that co-evolve environments, policies, and reward models — from code generation with reinforcement learning to personalized agent fine-tuning through natural conversation.
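A schematic sketch of what such a co-evolution loop can look like: stub environment-generator, policy, and reward-model objects take turns updating, each conditioned on the others' latest behavior. Every class and constant here is a placeholder assumption standing in for real task generation, RL updates, and reward-model training.

```python
# Sketch: alternating updates so environment, policy, and reward model
# co-evolve. All components are stand-in stubs.
import random

class EnvGenerator:
    def __init__(self): self.difficulty = 1.0
    def sample_task(self): return {"difficulty": self.difficulty}
    def update(self, success_rate):
        # Curriculum: harden tasks the policy solves too easily.
        self.difficulty *= 1.1 if success_rate > 0.8 else 0.95

class Policy:
    def __init__(self): self.skill = 0.5
    def attempt(self, task): return random.random() < self.skill / task["difficulty"]
    def update(self, rewards): self.skill += 0.02 * sum(rewards) / max(len(rewards), 1)

class RewardModel:
    def score(self, outcome): return 1.0 if outcome else 0.0  # stub verifier

env, policy, rm = EnvGenerator(), Policy(), RewardModel()
for step in range(50):
    tasks = [env.sample_task() for _ in range(16)]
    outcomes = [policy.attempt(t) for t in tasks]
    rewards = [rm.score(o) for o in outcomes]
    policy.update(rewards)                     # policy adapts to rewards
    env.update(sum(outcomes) / len(outcomes))  # environment adapts to policy
print(f"final difficulty={env.difficulty:.2f}, skill={policy.skill:.2f}")
```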
Advancing the frontier of diffusion-based language modeling — from reinforcement learning frameworks for discrete diffusion to multimodal generation and efficient parallel decoding.
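For a flavor of parallel decoding, the sketch below implements the generic confidence-based unmasking idea for a masked discrete-diffusion LM: each step predicts all masked positions at once and commits only the most confident ones. The random stand-in denoiser and the unmasking schedule are assumptions for illustration, not our released models.

```python
# Sketch: confidence-based parallel decoding for a masked discrete-diffusion
# LM. The lambda is a stub denoiser; schedule and sizes are assumptions.
import torch

V, T, steps = 100, 16, 4
MASK = V                                          # mask id outside the vocab
model = lambda seq: torch.randn(seq.shape[0], V)  # stub denoiser: (T,) -> (T, V)

seq = torch.full((T,), MASK)                      # start fully masked
for s in range(steps):
    probs = torch.softmax(model(seq), dim=-1)
    conf, pred = probs.max(dim=-1)
    masked = seq == MASK
    # Unmask a fixed fraction per step, chosen by prediction confidence,
    # so many tokens are decoded in parallel instead of one at a time.
    k = max(1, int(masked.sum()) // (steps - s))
    conf = torch.where(masked, conf, torch.tensor(-1.0))
    commit = conf.topk(k).indices
    seq[commit] = pred[commit]
print(seq.tolist())
```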