AgentOps CLI for evaluation, observability, and operational workflows for Microsoft Foundry Agents and Models.
AgentOps Toolkit is a CLI built on Microsoft Foundry that standardizes evaluation and operational workflows for AI agents and models, helping teams run, monitor, and automate AgentOps processes.
The project enables:
- Consistent local and CI execution of agent evaluations
- Reusable evaluation policies through bundles
- Operational observability through tracing, monitoring, and run inspection
- Stable machine-readable outputs for automation
- Human-readable reports for PR reviews and quality gates
Operational capabilities include:
- Standardized evaluation workflows
- Run history and result inspection
- Tracing and observability
- Monitoring (dashboards and alerts)
- CI/CD automation
- Operational reporting and analysis
Core outputs:
- `results.json` (machine-readable)
- `report.md` (human-readable)
Exit code contract:

| Exit code | Meaning |
|---|---|
| `0` | Execution succeeded and all thresholds passed |
| `2` | Execution succeeded but one or more thresholds failed |
| `1` | Runtime or configuration error |
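This contract is designed to be wired into CI gates. A minimal POSIX-shell sketch (the `agentops` function below is a stub standing in for the installed CLI so the snippet is self-contained; in a real pipeline you would call the CLI directly):

```bash
# Stub standing in for the installed CLI; here it simulates exit code 2
# ("thresholds failed"). In CI, drop this and call the real `agentops`.
agentops() { return 2; }

agentops eval run
status=$?

case "$status" in
  0) verdict="pass" ;;               # all thresholds passed
  2) verdict="thresholds-failed" ;;  # run succeeded, quality gate failed
  *) verdict="error" ;;              # runtime or configuration error
esac
echo "verdict=$verdict"
```

In a real pipeline you would typically `exit "$status"` at the end so that a `2` fails the job while still distinguishing quality-gate failures from runtime errors in the logs.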
```bash
python -m venv .venv
# activate your venv in the current shell
python -m pip install -U pip
python -m pip install agentops-toolkit
```

Then initialize a workspace:

```bash
agentops init
```

This creates `.agentops/` with starter bundles, datasets, and run configs for common scenarios (model quality, RAG, agent workflow, content safety).
Set your Foundry project endpoint:

```bash
export AZURE_AI_FOUNDRY_PROJECT_ENDPOINT="https://<resource>.services.ai.azure.com/api/projects/<project>"
```

Then edit `.agentops/run.yaml` to set your `agent_id` and model deployment name.
Authentication uses `DefaultAzureCredential`: run `az login` locally, or use service principal environment variables in CI.
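For CI, `DefaultAzureCredential` falls back to `EnvironmentCredential`, which reads the standard Azure SDK service-principal variables. A sketch with placeholder values:

```bash
# Standard Azure SDK environment variables read by EnvironmentCredential
# (and therefore by DefaultAzureCredential). Values are placeholders.
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<app-registration-client-id>"
export AZURE_CLIENT_SECRET="<client-secret>"
```

In GitHub Actions these would typically come from repository secrets rather than being set inline.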
```bash
agentops eval run
```

Results are written to `.agentops/results/latest/`:

- `results.json`: machine-readable scores
- `report.md`: human-readable summary
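The machine-readable output can be inspected directly. The path below comes from the docs, but the JSON content in the stub is purely illustrative, since the schema of `results.json` is not reproduced here:

```bash
# Stub results file so the snippet is self-contained; a real
# `agentops eval run` writes this. The field names are hypothetical.
mkdir -p .agentops/results/latest
printf '{"scores": {"example_metric": 4.2}}\n' > .agentops/results/latest/results.json

# Pretty-print the latest scores (json.tool ships with Python's stdlib).
python3 -m json.tool .agentops/results/latest/results.json
```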
To run a different scenario:

```bash
agentops eval run --config .agentops/run-rag.yaml
```

To regenerate the report from existing results:

```bash
agentops report generate
```

See Concepts for an overview of bundles, datasets, evaluators, backends, and the configuration model.
| Command | Description | Status |
|---|---|---|
| `agentops --version` | Show installed version | ✅ |
| `agentops init [--path DIR]` | Scaffold project workspace, starter files, and coding agent skills | ✅ |
| `agentops eval run [--config PATH]` | Evaluate a dataset against a bundle | ✅ |
| `agentops eval compare --runs ID1,ID2` | Compare two past runs | ✅ |
| `agentops report generate [--in FILE]` | Regenerate `report.md` from `results.json` | ✅ |
| `agentops workflow generate` | Generate GitHub Actions workflow | ✅ |
| `agentops skills install [--platform <p>]` | Install coding agent skills (Copilot, Claude) | ✅ |
| `agentops run list\|show` | List or inspect past runs | 🚧 |
| `agentops bundle list\|show` | Browse bundle catalog | 🚧 |
| `agentops dataset validate\|describe` | Dataset utilities | 🚧 |
| `agentops trace init` | Tracing setup | 🚧 |
| `agentops monitor setup\|show\|configure` | Monitoring operations | 🚧 |
Planned commands return a friendly message indicating they are not yet implemented.
- Concepts — bundles, datasets, evaluators, backends, configuration model
- How It Works — architecture, request flow, full schema reference
- Bundles — bundle authoring and evaluator configuration
- Model-direct evaluation
- Foundry agent evaluation
- RAG evaluation
- HTTP-deployed agent evaluation
- Conversational agent evaluation
- Agent workflow evaluation
- Baseline comparison
See CONTRIBUTING.md for architecture rules, testing expectations, and contribution workflow.
