
CLI-first AI agent evaluation

No server. No signup. No overhead.

Evaluate your AI agents locally with multi-objective scoring from YAML specifications. Deterministic code judges + customizable LLM judges, all version-controlled in Git.

Local Execution

No cloud dependency. All data stays on your machine.

Multi-Objective Scoring

Correctness, latency, cost, and safety in one run.

Code + LLM Judges

Deterministic code validators and customizable LLM judges; a brief sketch follows this list.

LLM & Agent Targets

Direct LLM providers plus Claude Code, Codex, Pi, Copilot, OpenCode.

Rubric Grading

Structured criteria with weights and automatic rubric generation.

A/B Comparison

Compare evaluation runs with statistical deltas.
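To give a feel for how code and LLM judges could pair with multi-objective scoring in a spec, here is a purely illustrative fragment. The keys below (judges, type, weight, rubric, budgets) are hypothetical and not taken from AgentV's documented schema; run agentv init and check the docs for the real syntax.

# Hypothetical sketch only -- these keys are NOT AgentV's documented schema
judges:
  - type: code        # deterministic check, e.g. exact match on the expected sum
    weight: 0.6
  - type: llm         # model-graded rubric for tone and safety
    weight: 0.4
    rubric: States the correct sum and contains no unsafe content.
budgets:
  max_latency_ms: 5000   # illustrative latency objective
  max_cost_usd: 0.01     # illustrative cost objective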

Quick Start

1. Install

   npm install -g agentv

2. Initialize

   agentv init

3. Configure

   Copy .env.example to .env and add your API keys (a sketch of the file follows these steps).

4. Create an eval

   description: Math evaluation
   execution:
     target: default

   evalcases:
     - id: addition
       expected_outcome: Correctly calculates 15 + 27 = 42
       input_messages:
         - role: user
           content: What is 15 + 27?

5. Run

   agentv eval ./evals/example.yaml
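The variable names below are only a guess at what step 3 might need for common providers; the authoritative list is whatever ships in the generated .env.example.

# Contents of .env -- variable names are assumptions, copy the real ones from .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...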

How AgentV Compares

| Feature | AgentV | LangWatch | LangSmith | LangFuse |
| --- | --- | --- | --- | --- |
| Setup | npm install | Cloud account + API key | Cloud account + API key | Cloud account + API key |
| Server | None (local) | Managed cloud | Managed cloud | Managed cloud |
| Privacy | All local | Cloud-hosted | Cloud-hosted | Cloud-hosted |
| CLI-first | ✓ | Limited | Limited | |
| CI/CD ready | ✓ | Requires API calls | Requires API calls | Requires API calls |
| Version control | ✓ (YAML in Git) | | | |
| Evaluators | Code + LLM + Custom | LLM only | LLM + Code | LLM only |
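To make the CI/CD row concrete, here is a minimal GitHub Actions sketch. The workflow layout and the OPENAI_API_KEY secret name are assumptions; only npm install -g agentv and agentv eval ./evals/example.yaml come from the Quick Start above, and depending on how AgentV reads keys you may need to write a .env file in CI instead of passing environment variables.

name: evals
on: [push, pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g agentv
      # Secret name is an assumption; use whatever your configuration from step 3 expects
      - run: agentv eval ./evals/example.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}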