Long-horizon coding tasks covering code editing, test writing, and debugging, verified by programmatic checks and deterministic tests with granular rewards, and packaged as Docker environments for RL and SFT.
Realistic long-horizon workflows across live SaaS apps, production MCP servers, and real tools, with logically consistent state, noisy inputs, and verifiable rewards.