Operational case handling for agents that need to earn the close.

RunbookOps turns real operational work into a deterministic benchmark. Agents must gather evidence, classify impact, route ownership, diagnose the issue, choose a safe resolution, and close customer-facing cases without shallow shortcuts.


Cases are counted in total and by difficulty: easy, medium, hard.

Evidence first

Evidence-Based Resolution

Cases expose alerts, logs, workflow playbooks, and timeline notes gradually. The agent must collect enough evidence before proposing cause and mitigation.
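As a rough illustration of the evidence-first rule, the sketch below gates a diagnosis behind a minimum number of inspected sources. The `CaseSession` class, its method names, and the `min_sources` threshold are hypothetical, chosen for this example; they are not taken from the RunbookOps API.

```python
from dataclasses import dataclass, field


@dataclass
class CaseSession:
    """Tracks which evidence an agent has revealed for one case (hypothetical)."""

    evidence: dict[str, str]  # source name -> content, e.g. "alerts", "logs"
    min_sources: int = 3      # assumed threshold, not a RunbookOps value
    revealed: set[str] = field(default_factory=set)

    def inspect(self, source: str) -> str:
        """Reveal one evidence source and record that it was seen."""
        if source not in self.evidence:
            raise KeyError(f"unknown evidence source: {source}")
        self.revealed.add(source)
        return self.evidence[source]

    def can_propose(self) -> bool:
        """True once enough distinct sources have been inspected."""
        return len(self.revealed) >= self.min_sources

    def propose(self, cause: str, mitigation: str) -> dict[str, str]:
        """Accept a cause and mitigation only after the threshold is met."""
        if not self.can_propose():
            missing = self.min_sources - len(self.revealed)
            raise PermissionError(f"inspect {missing} more source(s) first")
        return {"cause": cause, "mitigation": mitigation}
```

Tracking revealed sources in a set means that re-reading the same log does not count as new evidence, so an agent cannot satisfy the gate by repeating one inspection.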

Built for review

Judge-Friendly Structure

RunbookOps ships typed models, FastAPI endpoints, Docker deployment, deterministic grading, and a baseline inference runner aligned with the OpenEnv submission contract.
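To show what deterministic grading over a typed model can look like, here is a minimal sketch using only the standard library. The `Resolution` fields, the `WEIGHTS` table, and the `grade` function are assumptions made for illustration; the actual RunbookOps models and scoring rubric may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    """Hypothetical typed record of an agent's final answer for a case."""

    severity: str
    owner: str
    root_cause: str
    mitigation: str


# Assumed integer weights per field; the real rubric is not specified here.
WEIGHTS = {"severity": 20, "owner": 20, "root_cause": 30, "mitigation": 30}


def _norm(value: str) -> str:
    """Normalize case and surrounding whitespace before comparing."""
    return value.strip().lower()


def grade(submitted: Resolution, expected: Resolution) -> float:
    """Return a score in [0, 1] from exact field matches.

    No randomness, clock, or model call is involved, so the same
    inputs always produce the same score.
    """
    earned = sum(
        weight
        for name, weight in WEIGHTS.items()
        if _norm(getattr(submitted, name)) == _norm(getattr(expected, name))
    )
    return earned / sum(WEIGHTS.values())
```

Exact matching on normalized fields keeps the score reproducible across runs, at the cost of rejecting answers that are correct but worded differently.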

Broader audience

Broader Than Infra Ops

RunbookOps frames its scenarios as operational case handling rather than infrastructure operations alone, covering access failures, order issues, payment exceptions, message delivery, search freshness, and integration regressions.