Running Evaluations

Run an Evaluation

agentv eval evals/my-eval.yaml

By default, results are written to .agentv/results/eval_<timestamp>.jsonl.
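
Because the results are JSONL, they can be inspected with standard shell tools. The sketch below assumes one JSON record per line and treats the most recently modified file as the latest run. latest_results and count_records are hypothetical helper names, not part of agentv.

```shell
# Print the most recently modified results file in a directory.
# Defaults to .agentv/results, the location named above.
# Note: parsing ls output is fine for these timestamped names,
# but would break on file names containing newlines.
latest_results() {
  dir="${1:-.agentv/results}"
  # ls -t sorts newest first; head keeps only the first entry.
  ls -t "$dir"/eval_*.jsonl 2>/dev/null | head -n 1
}

# Count the records in a JSONL file.
# ASSUMPTION: one JSON object per non-empty line.
count_records() {
  grep -c . "$1"
}

# Usage (commented out so sourcing this block has no side effects):
# count_records "$(latest_results)"
```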

Run against a different target than specified in the eval file:

agentv eval --target azure_base evals/**/*.yaml
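
How an unquoted ** pattern expands depends on the shell. In bash, for example, it only recurses into subdirectories when the globstar option is enabled. It can therefore help to preview which eval files exist before running. A minimal sketch using find; list_eval_files is a hypothetical helper, not an agentv command.

```shell
# List every .yaml file under a directory, at any depth, in sorted order.
# Defaults to the evals directory used in the examples above.
list_eval_files() {
  find "${1:-evals}" -type f -name '*.yaml' | sort
}

# Usage (commented out so sourcing this block has no side effects):
# list_eval_files evals
```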

Run a single eval case by ID:

agentv eval --eval-id case-123 evals/my-eval.yaml

Test the harness flow with mock responses (does not call real providers):

agentv eval --dry-run evals/my-eval.yaml

Write results to a specific file instead of the default location:

agentv eval evals/my-eval.yaml --out results/baseline.jsonl
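
When several runs are kept side by side, a labeled and timestamped path passed to --out avoids overwriting earlier results. The sketch below is one possible naming scheme. make_out_path is a hypothetical helper, and creating the directory up front is a precaution, since the source does not say whether agentv creates it.

```shell
# Build an output path such as results/baseline-20240101-120000.jsonl
# and make sure its directory exists.
make_out_path() {
  label="$1"
  dir="${2:-results}"
  mkdir -p "$dir"
  printf '%s/%s-%s.jsonl\n' "$dir" "$label" "$(date +%Y%m%d-%H%M%S)"
}

# Usage (commented out so sourcing this block has no side effects):
# agentv eval evals/my-eval.yaml --out "$(make_out_path baseline)"
```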

Check eval files for schema errors without executing:

agentv validate evals/my-eval.yaml
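
Validation can act as a gate in front of a run. The sketch below assumes that agentv validate exits with a non-zero status when it finds schema errors, which the text above does not confirm. validate_then_run is a hypothetical wrapper, not an agentv command.

```shell
# Validate an eval file and run it only if validation succeeds.
# ASSUMPTION: `agentv validate` exits non-zero on schema errors.
validate_then_run() {
  file="$1"
  if agentv validate "$file"; then
    agentv eval "$file"
  else
    echo "validation failed, skipping run: $file" >&2
    return 1
  fi
}

# Usage (commented out so sourcing this block has no side effects):
# validate_then_run evals/my-eval.yaml
```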

Run agentv eval --help for the full list of options, including workers, timeouts, output formats, and trace dumping.