Coding Agents
Coding agent targets evaluate AI coding assistants and CLI-based agents. These targets require a judge_target to run LLM-based evaluators.
Claude Code

    targets:
      - name: claude_code
        provider: claude-code
        judge_target: azure_base

Codex CLI

    targets:
      - name: codex_target
        provider: codex
        judge_target: azure_base

Pi Coding Agent

    targets:
      - name: pi_target
        provider: pi-coding-agent
        judge_target: azure_base

VS Code / Copilot

    targets:
      - name: vscode_dev
        provider: vscode
        workspace_template: ${{ WORKSPACE_PATH }}
        judge_target: azure_base

| Field | Required | Description |
|---|---|---|
| workspace_template | Yes | Path to workspace template directory |
| judge_target | Yes | LLM target for evaluation |
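
The `${{ WORKSPACE_PATH }}` value in the example above looks like a placeholder that is filled in from the environment. The page does not describe the resolution rules, so the following is only a minimal sketch of one plausible behaviour, assuming the placeholder names an environment variable. The function name `resolve_placeholders` and the fail-on-missing policy are hypothetical and not part of the tool.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${{ NAME }} with optional spaces around the variable name.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${{ NAME }} in value with the matching variable.

    Hypothetical helper: assumes placeholders refer to environment variables.
    Raises KeyError when a variable is unset, so a missing WORKSPACE_PATH
    fails early instead of silently producing an empty path.
    """
    source = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    # Pass an explicit mapping here so the demo does not depend on the shell.
    print(resolve_placeholders("${{ WORKSPACE_PATH }}", {"WORKSPACE_PATH": "/tmp/ws"}))
```

Failing on an unset variable is a design choice for this sketch; the real harness may instead leave the placeholder untouched or substitute an empty string.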
VS Code Insiders

    targets:
      - name: vscode_insiders
        provider: vscode-insiders
        workspace_template: ${{ WORKSPACE_PATH }}
        judge_target: azure_base

Same configuration as VS Code.
Custom CLI Agent
Evaluate any command-line agent:

    targets:
      - name: local_agent
        provider: cli
        command_template: 'python agent.py --prompt {PROMPT}'
        judge_target: azure_base

| Field | Required | Description |
|---|---|---|
| command_template | Yes | Command to run. {PROMPT} is replaced with the input. |
| judge_target | Yes | LLM target for evaluation |
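
To make the `{PROMPT}` substitution concrete, here is a minimal sketch of how a template like the one above could be rendered into a shell command. It is an illustration only: the helper name `render_command` is hypothetical, and whether the harness shell-quotes the prompt itself is not stated on this page, so the quoting step is an assumption.

```python
import shlex


def render_command(command_template: str, prompt: str) -> str:
    """Return the command with {PROMPT} replaced by the shell-quoted prompt.

    Hypothetical helper, not the tool's own implementation.
    """
    # str.replace is used rather than str.format so that any other braces in
    # the template (for example inline JSON arguments) are left untouched.
    return command_template.replace("{PROMPT}", shlex.quote(prompt))


if __name__ == "__main__":
    template = "python agent.py --prompt {PROMPT}"
    print(render_command(template, "fix the failing test"))
```

Quoting matters because prompts routinely contain spaces, quotes, and shell metacharacters; without it, a prompt could be split into several arguments or interpreted by the shell.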
Mock Provider
Use the mock provider to test the evaluation harness without calling real providers:

    targets:
      - name: mock_target
        provider: mock
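
The required fields listed on this page can be checked before a run. The sketch below is a hypothetical pre-flight check, not part of the tool: the names `REQUIRED_FIELDS` and `missing_fields` are invented for illustration, and the empty entry for `mock` is an assumption based only on the example above, which sets no extra fields.

```python
# Required fields per provider, taken from the examples and tables on this page.
REQUIRED_FIELDS = {
    "claude-code": ["judge_target"],
    "codex": ["judge_target"],
    "pi-coding-agent": ["judge_target"],
    "vscode": ["workspace_template", "judge_target"],
    "vscode-insiders": ["workspace_template", "judge_target"],
    "cli": ["command_template", "judge_target"],
    "mock": [],  # assumption: the mock example sets no additional fields
}


def missing_fields(target: dict) -> list:
    """Return the required fields that are absent or empty in a target entry."""
    provider = target.get("provider")
    if provider not in REQUIRED_FIELDS:
        raise ValueError(f"unknown provider: {provider!r}")
    return [name for name in REQUIRED_FIELDS[provider] if not target.get(name)]


if __name__ == "__main__":
    incomplete = {"name": "local_agent", "provider": "cli", "judge_target": "azure_base"}
    print(missing_fields(incomplete))
```

The target entries are plain dictionaries here so the sketch stays dependency-free; in practice they would come from whatever YAML loader reads the targets file.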