Compare
The compare command computes deltas between two evaluation runs for A/B testing.
Run two evaluations and compare them:

    agentv eval evals/my-eval.yaml --out before.jsonl
    # ... make changes to your agent ...
    agentv eval evals/my-eval.yaml --out after.jsonl
    agentv compare before.jsonl after.jsonl

Options

Threshold

Set a minimum delta to highlight significant changes:

    agentv compare before.jsonl after.jsonl --threshold 0.1

Output

The comparison shows:
- Wins — cases where scores improved
- Losses — cases where scores regressed
- Ties — cases with no significant change
- Mean delta — average score change across all cases
This helps identify whether changes to your agent or prompts improved or regressed performance.
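
To make the four summary values concrete, here is a minimal Python sketch of how such a comparison could be computed. It is not agentv's implementation: the `compare_scores` function, the `Comparison` type, the case IDs, and the scores are all hypothetical, and it assumes that each run reduces to one numeric score per case and that a change smaller than the threshold counts as a tie.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    wins: int
    losses: int
    ties: int
    mean_delta: float


def compare_scores(before, after, threshold=0.0):
    """Summarize per-case score changes between two runs.

    Assumption (not taken from agentv): `before` and `after` map a case ID
    to a numeric score, and only cases present in both runs are compared.
    """
    shared = sorted(before.keys() & after.keys())
    wins = losses = ties = 0
    total = 0.0
    for case_id in shared:
        delta = after[case_id] - before[case_id]
        total += delta
        # Assumed rule: changes smaller than the threshold are ties.
        if delta == 0 or abs(delta) < threshold:
            ties += 1
        elif delta > 0:
            wins += 1
        else:
            losses += 1
    mean_delta = total / len(shared) if shared else 0.0
    return Comparison(wins, losses, ties, mean_delta)


# Hypothetical scores for three cases, before and after a change.
before = {"case-1": 0.5, "case-2": 0.8, "case-3": 0.7}
after = {"case-1": 0.9, "case-2": 0.6, "case-3": 0.75}

summary = compare_scores(before, after, threshold=0.1)
print(summary)
```

With these made-up scores and a threshold of 0.1, the sketch reports one win, one loss, and one tie, with a mean delta of roughly 0.083. The tie shows why the threshold matters: the small gain on `case-3` is treated as noise instead of an improvement.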