Coding Agents

Coding agent targets evaluate AI coding assistants and CLI-based agents. These targets require a judge_target to run LLM-based evaluators.

Claude Code

targets:
  - name: claude_code
    provider: claude-code
    judge_target: azure_base

Codex

targets:
  - name: codex_target
    provider: codex
    judge_target: azure_base

Pi Coding Agent

targets:
  - name: pi_target
    provider: pi-coding-agent
    judge_target: azure_base

VS Code

targets:
  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base

Field               Required  Description
workspace_template  Yes       Path to workspace template directory
judge_target        Yes       LLM target for evaluation

VS Code Insiders

targets:
  - name: vscode_insiders
    provider: vscode-insiders
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure_base

The vscode-insiders provider takes the same fields as the vscode provider: workspace_template and judge_target, both required.
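
Both VS Code targets set workspace_template to ${{ WORKSPACE_PATH }}, which reads like a placeholder filled in from outside the file. This page does not say where the value comes from or how it is resolved. The sketch below shows one way such a placeholder could be expanded, assuming it names an environment variable. The function name resolve_placeholders, the regular expression, and the example path are all hypothetical and not part of the tool.

```python
import os
import re

# Matches placeholders such as ${{ WORKSPACE_PATH }}; whitespace inside the braces is optional.
# Assumption: placeholder names follow environment-variable naming rules.
PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_placeholders(value: str, env=None) -> str:
    """Replace each ${{ NAME }} with env[NAME]; raise KeyError if NAME is not set."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"placeholder {name} is not set")
        return env[name]

    return PLACEHOLDER.sub(substitute, value)


# Hypothetical value; in practice this would come from the real environment.
resolved = resolve_placeholders(
    "${{ WORKSPACE_PATH }}", env={"WORKSPACE_PATH": "/tmp/workspace-template"}
)
print(resolved)
```

Raising on a missing variable, rather than substituting an empty string, makes it obvious when workspace_template would otherwise point at nothing.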

CLI

Evaluate any command-line agent:

targets:
  - name: local_agent
    provider: cli
    command_template: 'python agent.py --prompt {PROMPT}'
    judge_target: azure_base

Field             Required  Description
command_template  Yes       Command to run. {PROMPT} is replaced with the input.
judge_target      Yes       LLM target for evaluation
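
The command_template field says {PROMPT} is replaced with the input, but this page does not describe how the input is quoted before the command runs. The sketch below illustrates the substitution under the assumption that the prompt should reach the agent as a single shell argument. The render_command helper is hypothetical, and the harness may substitute or quote differently.

```python
import shlex


def render_command(template: str, prompt: str) -> str:
    """Fill {PROMPT} in a command template, shell-quoting the prompt.

    Assumption: the command runs through a POSIX shell, so the prompt
    is quoted to arrive as exactly one argument.
    """
    return template.replace("{PROMPT}", shlex.quote(prompt))


# The template is the one from the cli target example; the prompt is made up.
command = render_command("python agent.py --prompt {PROMPT}", "What's 2 + 2?")
print(command)
```

If the harness inserts the raw text instead, a prompt containing spaces or quotes would be split into several arguments, so it is worth testing a cli target with such a prompt before relying on it.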

Mock

For testing the evaluation harness without calling real providers:

targets:
  - name: mock_target
    provider: mock