OpenEnv RL Challenge
Deterministic and offline

RunbookOps / CaseOps Benchmark

Operational case handling for agents that need to earn the close.

RunbookOps turns real operational work into a deterministic benchmark. Agents must gather evidence, classify impact, route ownership, diagnose the issue, choose a safe resolution, and close customer-facing cases without shallow shortcuts.

15 total cases: 5 easy, 5 medium, 5 hard
Evidence first

Evidence-Based Resolution

Cases expose alerts, logs, workflow playbooks, and timeline notes gradually. The agent must collect enough evidence before proposing a cause and a mitigation.
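The evidence-first rule can be pictured as a simple gate. The following is a minimal sketch under stated assumptions, not the benchmark's actual implementation: `CaseState`, `EVIDENCE_SOURCES`, `inspect`, `can_diagnose`, and `propose_cause` are hypothetical names, and the idea that each case declares a fixed set of required sources is an assumption made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical source names, modeled on the four evidence kinds listed above.
EVIDENCE_SOURCES = ("alerts", "logs", "playbooks", "timeline")


@dataclass
class CaseState:
    """Tracks which evidence sources the agent has inspected so far."""

    required: frozenset  # assumed: sources a case demands before a diagnosis
    seen: set = field(default_factory=set)

    def inspect(self, source: str) -> None:
        # Record that the agent looked at one evidence source.
        if source not in EVIDENCE_SOURCES:
            raise ValueError(f"unknown evidence source: {source}")
        self.seen.add(source)

    def can_diagnose(self) -> bool:
        # True only once every required source has been inspected.
        return self.required <= self.seen

    def propose_cause(self, cause: str) -> str:
        # Reject a diagnosis that arrives before the evidence does.
        if not self.can_diagnose():
            missing = sorted(self.required - self.seen)
            raise PermissionError(f"missing evidence: {missing}")
        return cause
```

In this sketch, an agent that inspects only `alerts` on a case requiring `alerts` and `logs` is refused when it calls `propose_cause`, and the same call succeeds once `logs` has been inspected as well.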

Built for review

Judge-Friendly Structure

Typed models, FastAPI endpoints, Docker deployment, deterministic grading, and a baseline inference runner aligned with the OpenEnv submission contract.
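To illustrate what typed models and deterministic grading could look like together, here is a hedged sketch using only the standard library. The `Submission` fields, the `grade` function, and the equal-weight scoring rule are all assumptions chosen for illustration; the real benchmark's models, endpoints, and rubric may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical typed answer for one case (field names are assumed)."""

    severity: str
    owner: str
    root_cause: str
    resolution: str


# Fields compared by the grader, in a fixed order.
FIELDS = ("severity", "owner", "root_cause", "resolution")


def _norm(value: str) -> str:
    # Normalize case and surrounding whitespace so formatting noise is ignored.
    return value.strip().lower()


def grade(submission: Submission, expected: Submission) -> float:
    """Return the fraction of fields that match the expected answer.

    The function uses no randomness, clock, or network access, so the same
    inputs always yield the same score.
    """
    matches = sum(
        _norm(getattr(submission, name)) == _norm(getattr(expected, name))
        for name in FIELDS
    )
    return matches / len(FIELDS)
```

Because the grader is a pure function of its two arguments, scoring the same submission twice gives the same result, which is the property an offline, reproducible benchmark depends on.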

Broader audience

Broader Than Infra Ops

RunbookOps is framed as operational case handling across access failures, order issues, payment exceptions, message delivery, search freshness, and integration regressions.