Creating Test Cases

Author test cases with expected outputs, tags, and scoring criteria for automated evaluation

Overview

Test cases are the foundation of OpenRails evaluations. Each test case defines an input prompt, expected output, and scoring criteria. When an evaluation runs, the system sends the input to your bot or agent, compares the response against the expected output, and assigns a pass/fail score.
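The run loop described above can be sketched in a few lines. This is an illustrative model only: the `TestCase` fields, `send_to_agent` callable, and `evaluate` function are hypothetical names, not the OpenRails API.

```python
# Hypothetical sketch of what one evaluation run does per test case.
# Names and fields are illustrative, not the OpenRails API.
from dataclasses import dataclass

@dataclass
class TestCase:
    input: str
    expected_output: str
    pass_threshold: float = 1.0  # minimum score to pass

def score_response(response: str, case: TestCase) -> float:
    # Simplest possible comparison: exact match scores 1.0, else 0.0.
    return 1.0 if response.strip() == case.expected_output.strip() else 0.0

def evaluate(case: TestCase, send_to_agent) -> bool:
    # Send the input to the bot/agent, compare, and return pass/fail.
    response = send_to_agent(case.input)
    return score_response(response, case) >= case.pass_threshold
```

For example, `evaluate(TestCase("What is 2+2?", "4"), my_agent)` passes only if the agent's reply matches `"4"` exactly.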

Create a Test Case

Navigate to Evaluations

From the sidebar, click Evaluations and select an evaluation project (or create one).

Click "Add Test Case"

Click Add Test Case to open the test case editor.

Define the Input

Enter the input prompt that will be sent to the bot or agent. This should be a realistic user query or task description.

Define the Expected Output

Provide the expected response or output. When the evaluation runs, the AI's response will be compared against this expected output and scored for accuracy, confidence, and latency.

Add Tags

Tag the test case for organization and filtering:

  • Category Tags — e.g., "product-info", "pricing", "troubleshooting"
  • Priority Tags — e.g., "critical", "regression", "edge-case"

Configure Scoring

Set the scoring criteria:

  • Scoring Method — Choose exact, contains, semantic, or pattern matching
  • Pass Threshold — Minimum score required to pass (applies to semantic matching)
  • Weight — Relative importance of this test case in the overall score
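To make the four methods concrete, here is a rough sketch of how each one could compare a response against the expected output. These are stand-in implementations, not the OpenRails scorers; in particular, real semantic scoring uses an LLM judge, which is stubbed here with a simple token-overlap ratio.

```python
# Illustrative scoring methods; not the actual OpenRails implementations.
import re

def score(method: str, response: str, expected: str) -> float:
    if method == "exact":
        # Whole response must match the expected output verbatim.
        return 1.0 if response.strip() == expected.strip() else 0.0
    if method == "contains":
        # Expected text must appear somewhere in the response.
        return 1.0 if expected.lower() in response.lower() else 0.0
    if method == "pattern":
        # Expected output is treated as a regular expression.
        return 1.0 if re.search(expected, response) else 0.0
    if method == "semantic":
        # Stand-in for an LLM judge: token-overlap ratio in [0, 1],
        # which you then compare against the pass threshold.
        a, b = set(response.lower().split()), set(expected.lower().split())
        return len(a & b) / len(a | b) if a | b else 1.0
    raise ValueError(f"unknown method: {method}")
```

Exact and contains return 0 or 1, so the pass threshold only matters for semantic scoring, where partial credit is possible. The weight then determines how much each test case's score contributes to the project's overall score (e.g. a weighted average).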

Save Test Case

Click Save to add the test case to the evaluation project.

Bulk Import

For large test suites, you can import test cases in bulk from a CSV file instead of creating them one at a time in the editor.
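A hypothetical import file might look like the one below. The column names are assumptions based on the fields in the test case editor, not a documented OpenRails schema; check the import dialog for the exact headers it expects.

```python
# Hypothetical CSV layout for bulk import; column names are assumptions.
import csv
import io

SAMPLE = """\
input,expected_output,tags,scoring_method,pass_threshold,weight
"What plans do you offer?","We offer Starter, Pro, and Enterprise.","pricing;critical",semantic,0.8,2
"Reset my password","Go to Settings and click Reset Password.","troubleshooting",contains,1.0,1
"""

# Parse the sample the way an importer might, one dict per test case.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
```

Quoting fields that contain commas (as in the expected outputs above) keeps the rows parseable, and a separator such as `;` lets one cell hold multiple tags.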

Test Case Best Practices

Tip: Start with a small set of critical test cases (10-20) and expand over time. Focus on the queries that matter most to your users before building comprehensive coverage.

Important: Test cases with semantic similarity scoring consume LLM tokens for the judge evaluation. Factor this into your budget when planning large evaluation runs.

Next Steps