Eval Files
Evaluation files define the test cases, targets, and evaluators for an evaluation run. AgentV supports two formats: YAML and JSONL.
YAML Format
YAML is the primary format. A single file contains the metadata, the execution config, and the eval cases:
```yaml
description: Math problem solving evaluation
execution:
  target: default
  evaluators:
    - name: correctness
      type: llm_judge
      prompt: ./judges/correctness.md

evalcases:
  - id: addition
    expected_outcome: Correctly calculates 15 + 27 = 42

    input_messages:
      - role: user
        content: What is 15 + 27?

    expected_messages:
      - role: assistant
        content: "42"
```

Top-level Fields
| Field | Description |
|---|---|
| `description` | Human-readable description of the evaluation |
| `dataset` | Optional dataset identifier |
| `execution` | Default execution config (`target`, `evaluators`) |
| `evalcases` | Array of individual test cases |
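To make the shape of an eval file concrete, here is a minimal Python sketch that models the top-level fields as dataclasses and builds them from an already-parsed YAML document (for example, the dict a YAML loader returns). The class and function names are hypothetical, not AgentV's internal types, and treating `id`, `expected_outcome`, and `input_messages` as required is an assumption based on the examples on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical in-memory model of an eval file; field names mirror the
# table above, but these classes are illustrative, not AgentV's own types.
@dataclass
class Message:
    role: str
    content: str


@dataclass
class EvalCase:
    id: str
    expected_outcome: str
    input_messages: list[Message]
    expected_messages: list[Message] = field(default_factory=list)


@dataclass
class EvalFile:
    evalcases: list[EvalCase]
    description: str = ""
    dataset: Optional[str] = None  # optional, per the table
    execution: dict[str, Any] = field(default_factory=dict)


def _messages(raw: list[dict]) -> list[Message]:
    # str() guards against YAML parsing unquoted numbers such as 42 as int.
    return [Message(role=m["role"], content=str(m["content"])) for m in raw]


def eval_file_from_dict(data: dict) -> EvalFile:
    """Build an EvalFile from an already-parsed YAML mapping."""
    cases = [
        EvalCase(
            id=c["id"],
            expected_outcome=c["expected_outcome"],
            input_messages=_messages(c["input_messages"]),
            expected_messages=_messages(c.get("expected_messages", [])),
        )
        for c in data["evalcases"]
    ]
    return EvalFile(
        evalcases=cases,
        description=data.get("description", ""),
        dataset=data.get("dataset"),
        execution=data.get("execution", {}),
    )
```

Keeping this step separate from YAML loading keeps the sketch within the standard library; a missing required key surfaces as a `KeyError`.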
JSONL Format
For large-scale evaluations, AgentV also supports the JSONL (JSON Lines) format, in which each line is a single eval case:
```jsonl
{"id": "test-1", "expected_outcome": "Calculates correctly", "input_messages": [{"role": "user", "content": "What is 2+2?"}]}
{"id": "test-2", "expected_outcome": "Provides explanation", "input_messages": [{"role": "user", "content": "Explain variables"}]}
```

Sidecar Metadata
An optional YAML sidecar file provides the metadata and execution config for a JSONL dataset. Place it alongside the JSONL file, using the same base name:
`dataset.jsonl` + `dataset.yaml`:
```yaml
description: Math evaluation dataset
dataset: math-tests
execution:
  target: azure_base
evaluator: llm_judge
```

Benefits of JSONL
- Streaming-friendly — process line by line
- Git-friendly — diffs show individual case changes
- Programmatic generation — easy to create from scripts
- Industry standard — compatible with DeepEval, LangWatch, Hugging Face datasets
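Line-by-line processing, together with the same-base-name sidecar lookup described earlier, can be sketched in Python using only the standard library. This assumes the sidecar uses the `.yaml` extension exactly as in the `dataset.jsonl` + `dataset.yaml` example. Skipping blank lines and checking only `id` and `input_messages` are illustrative choices, not documented AgentV behavior, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def sidecar_path(jsonl_path: Path) -> Optional[Path]:
    """Return the same-base-name .yaml sidecar if it exists, else None.

    dataset.jsonl -> dataset.yaml (hypothetical helper, .yaml only).
    """
    candidate = jsonl_path.with_suffix(".yaml")
    return candidate if candidate.exists() else None


def iter_eval_cases(jsonl_path: Path) -> Iterator[dict]:
    """Yield one eval case per non-blank line without loading the whole file."""
    with jsonl_path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # assumption: tolerate blank lines
            case = json.loads(line)
            # Minimal sanity check; AgentV's real validation may differ.
            if "id" not in case or "input_messages" not in case:
                raise ValueError(
                    f"{jsonl_path}:{lineno}: missing 'id' or 'input_messages'"
                )
            yield case
```

Because `iter_eval_cases` is a generator, memory use stays roughly constant however many cases the file holds, and errors point at the offending line number.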
Converting Between Formats
Use the `convert` command to switch between YAML and JSONL:
```bash
# YAML to JSONL
agentv convert evals/dataset.yaml --format jsonl

# JSONL to YAML
agentv convert evals/dataset.jsonl --format yaml
```
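Conceptually, converting YAML to JSONL moves each entry of `evalcases` onto its own line and leaves the remaining top-level fields for a sidecar; converting back merges them again. The Python sketch below models that split on already-parsed data. It is an assumption about what the conversion involves, not the implementation of `agentv convert`, whose output (key order, sidecar contents, file naming) may differ; `to_jsonl` and `from_jsonl` are hypothetical names.

```python
import json
from typing import Any, Optional


def to_jsonl(eval_file: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Split a parsed YAML eval file into JSONL text plus sidecar metadata."""
    # One JSON object per eval case, one case per line.
    lines = [json.dumps(case, ensure_ascii=False) for case in eval_file.get("evalcases", [])]
    # Everything else (description, dataset, execution, ...) goes to the sidecar.
    sidecar = {k: v for k, v in eval_file.items() if k != "evalcases"}
    return "\n".join(lines) + ("\n" if lines else ""), sidecar


def from_jsonl(jsonl_text: str, sidecar: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Merge JSONL eval cases and optional sidecar metadata into one YAML-shaped dict."""
    cases = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {**(sidecar or {}), "evalcases": cases}
```

Writing the sidecar dict out as `dataset.yaml` would still need a YAML serializer; that step is left out so the sketch stays within the standard library.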