
Running Evaluations

agentv eval evals/my-eval.yaml

By default, results are written to .agentv/results/eval_<timestamp>.jsonl.
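Since each run produces a new timestamped file, it can help to locate the newest one quickly. The following is a minimal sketch, assuming the default .agentv/results location and one JSON record per line (the usual JSONL convention); `latest_results` and `count_results` are hypothetical helper names, not part of agentv.

```shell
# Hypothetical helpers (not part of agentv).

# Print the path of the most recently modified results file.
# Takes an optional directory; defaults to the location described above.
latest_results() {
  dir="${1:-.agentv/results}"
  # ls -t sorts newest first; errors are silenced if nothing matches.
  ls -t "$dir"/eval_*.jsonl 2>/dev/null | head -n 1
}

# Count records in a results file, assuming one JSON object per line.
count_results() {
  wc -l < "$1" | tr -d ' '
}
```

For example, `count_results "$(latest_results)"` would print the number of records in the latest run, provided at least one results file exists.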

Run against a different target than specified in the eval file:

agentv eval --target azure_base evals/**/*.yaml

Run a single eval case by ID:

agentv eval --eval-id case-123 evals/my-eval.yaml

Test the harness flow with mock responses (does not call real providers):

agentv eval --dry-run evals/my-eval.yaml

Write results to a specific file instead of the default location:

agentv eval evals/my-eval.yaml --out results/baseline.jsonl
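When several named runs are kept side by side, a fixed path such as results/baseline.jsonl gets overwritten on each run. One way around that is to build a labelled, timestamped path and pass it to --out. This is only a sketch: `results_path` is a hypothetical helper, and whether agentv creates a missing results/ directory is not stated here, so the usage line creates it first.

```shell
# Hypothetical helper (not part of agentv): build a path such as
# results/baseline_20250101_120000.jsonl from a label.
results_path() {
  printf 'results/%s_%s.jsonl\n' "$1" "$(date +%Y%m%d_%H%M%S)"
}

# Usage, relying only on the --out flag shown above:
#   mkdir -p results
#   agentv eval evals/my-eval.yaml --out "$(results_path baseline)"
```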

Check eval files for schema errors without executing:

agentv validate evals/my-eval.yaml
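Validation and execution can be chained so that an eval only runs once its file passes the schema check. The sketch below assumes that agentv validate exits with a non-zero status when it finds errors, which is the common CLI convention but is not confirmed here; `validate_then_eval` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper (not part of agentv): run the eval only if
# validation succeeds. Assumes `agentv validate` returns a non-zero
# exit status on schema errors.
validate_then_eval() {
  agentv validate "$1" && agentv eval "$1"
}

# Usage:
#   validate_then_eval evals/my-eval.yaml
```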

Run agentv eval --help for the full list of options, including workers, timeouts, output formats, and trace dumping.