Resources

Documentation

Everything you need to connect your agent, run evaluations, and use the results to ship with confidence.

Getting Started

Connect your first agent and run a discovery evaluation in under 5 minutes.

Upload your skill files (.md), CLAUDE.md configuration, and supporting files. We simulate the full Claude Code environment during evaluation.

Connect any agent with an HTTP endpoint. We support OpenAI, Anthropic, and custom request formats, and automatically detect the path to the agent's reply in the response body.
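As a rough illustration of what detecting the response path can mean, the sketch below tries a few candidate paths against a parsed JSON response and returns the first one that leads to a string. It is not the platform's actual implementation. The first two paths follow the publicly documented OpenAI Chat Completions and Anthropic Messages response shapes; the `output` path and the names `CANDIDATE_PATHS`, `follow`, and `detect_response_path` are hypothetical, chosen only for this example.

```python
from typing import Any, Optional, Sequence, Tuple, Union

PathKey = Union[str, int]

# Candidate paths, tried in order (an assumption for illustration only).
CANDIDATE_PATHS: Sequence[Tuple[PathKey, ...]] = (
    ("choices", 0, "message", "content"),  # OpenAI Chat Completions shape
    ("content", 0, "text"),                # Anthropic Messages shape
    ("output",),                           # hypothetical custom format
)


def follow(data: Any, path: Tuple[PathKey, ...]) -> Optional[Any]:
    """Walk a parsed JSON value along path; return None if any step is missing."""
    current = data
    for key in path:
        if isinstance(key, int):
            # Integer keys index into lists; reject out-of-range indices.
            if not isinstance(current, list) or not (0 <= key < len(current)):
                return None
        else:
            # String keys look up dictionary entries.
            if not isinstance(current, dict) or key not in current:
                return None
        current = current[key]
    return current


def detect_response_path(body: Any) -> Optional[Tuple[PathKey, ...]]:
    """Return the first candidate path that resolves to a string, else None."""
    for path in CANDIDATE_PATHS:
        if isinstance(follow(body, path), str):
            return path
    return None
```

Given a body such as `{"content": [{"type": "text", "text": "Hello"}]}`, `detect_response_path` returns `("content", 0, "text")`, and `follow` with that path then extracts the reply text. A real detector would likely need to handle more shapes, such as streaming responses or multiple content blocks.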

Browse 667+ scenarios across 15 industries. Each scenario includes success criteria, failure indicators, and real-world grounding.

Need help? Contact us directly.