Execute manual and scheduled evaluation runs with configurable parameters
Evaluation runs execute your test cases against a bot or agent and produce pass/fail results. Runs can be triggered manually for on-demand testing or scheduled via cron for continuous quality monitoring.
Navigate to Evaluations and select the project containing your test cases.
Click New Run and configure the run parameters: the target bot or agent, any test case filters, and an optional model override.
Click Run to begin execution. Each test case is sent to the target sequentially.
The run dashboard shows real-time progress: total test cases, completed, passed, and failed. Individual test results stream in as they complete.
When the run completes, the results summary shows overall pass rate, individual test case outcomes, and any errors. See Reviewing Results for detailed analysis.
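The manual-run flow above can be sketched as a simple sequential loop that sends each test case to the target and accumulates the counters shown on the dashboard. This is an illustrative sketch only; the names (TestCase, RunResult, send_to_target) are hypothetical and not the product's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    name: str
    prompt: str
    expected: str

@dataclass
class RunResult:
    total: int = 0
    completed: int = 0
    passed: int = 0
    failed: int = 0
    errors: list = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        # Overall pass rate shown in the results summary.
        return self.passed / self.completed if self.completed else 0.0

def send_to_target(case: TestCase) -> str:
    # Placeholder for the call to the bot or agent under test.
    return case.prompt.upper()

def run_evaluation(cases: list[TestCase]) -> RunResult:
    result = RunResult(total=len(cases))
    for case in cases:  # test cases are sent sequentially
        try:
            output = send_to_target(case)
            if output == case.expected:
                result.passed += 1
            else:
                result.failed += 1
        except Exception as exc:
            result.failed += 1
            result.errors.append((case.name, str(exc)))
        result.completed += 1  # drives the real-time progress display
    return result
```

As each iteration completes, `completed`, `passed`, and `failed` update, which is what the dashboard streams in real time.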
In the evaluation project, go to Settings > Schedule.
Enter a cron expression for the run frequency (e.g., 0 9 * * MON for every Monday at 9 AM).
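To build intuition for the example expression 0 9 * * MON, the next firing time it describes can be computed with the standard library alone. This is a sketch for understanding the schedule, not the scheduler's actual implementation.

```python
from datetime import datetime, timedelta

def next_monday_9am(now: datetime) -> datetime:
    """Next occurrence of Monday 09:00 after `now` (i.e. 0 9 * * MON)."""
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    # Monday is weekday() == 0; advance to the Monday of this cycle.
    candidate += timedelta(days=(0 - now.weekday()) % 7)
    if candidate <= now:
        # Already past Monday 09:00 this week; fire next week.
        candidate += timedelta(days=7)
    return candidate
```

For example, calling it on a Monday at 10:00 returns 09:00 the following Monday, since this week's slot has already passed.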
Set the same parameters as a manual run: target, test case filters, and model override.
Toggle the schedule to Enabled and save. Runs will execute automatically at the configured times.
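Taken together, the saved schedule carries the same parameters as a manual run plus the cron expression and enabled flag. A hypothetical representation of those settings, with illustrative field names and example values rather than the product's actual schema:

```python
# Hypothetical schedule settings; field names and values are examples only.
schedule = {
    "enabled": True,                  # toggled on in Settings > Schedule
    "cron": "0 9 * * MON",            # every Monday at 9 AM
    "target": "example-agent",        # bot or agent under test (example name)
    "test_case_filters": {},          # same filters available as a manual run
    "model_override": None,           # None = use the target's default model
}
```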
Each evaluation run follows this sequence: