## Summary
Create persona-driven use case playbooks — short, practical guides that answer specific questions a developer has when they sit down to evaluate their agent or model.
## Motivation
Existing tutorials are organized by evaluation scenario (model-direct, RAG, agent workflow). But developers think in terms of their situation: "I just built a Foundry agent, how do I test it?" or "I need to compare two models before choosing one." Playbooks bridge this gap by starting from the developer's context and pointing them to the right bundle, dataset shape, and run config.
## Proposed Playbooks
Each playbook should be 1-2 pages, actionable, and link to the detailed tutorial:
- **"I just created a Foundry agent — how do I evaluate it?"**
  - Foundry agent + conversational or model quality bundle
  - Minimal dataset, first eval run, interpret results
- **"My agent uses RAG — how do I verify groundedness?"**
  - RAG bundle + dataset with `context` field
  - Groundedness, relevance, retrieval evaluators
- **"I want to compare GPT-4o vs GPT-4o-mini for my use case"**
  - Model comparison workflow with `agentops eval compare`
  - Same dataset, two run configs, side-by-side report
- **"I need to ensure content safety before deploying"**
  - Content safety bundle (violence, sexual, self-harm, hate)
  - Adversarial dataset patterns, threshold recommendations
- **"I have an HTTP agent (LangGraph / LangChain / ACA)"**
  - HTTP backend setup, `request_field` / `response_field` mapping
  - Tool-call extraction for agent-with-tools scenarios
- **"I want to gate PRs on evaluation quality"**
  - `agentops workflow generate`, exit code contract
  - Threshold strategy, `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` as secret
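As a sketch of the dataset shape the RAG playbook would point to, assuming a JSONL file where each row carries the retrieved passages in a `context` field (the filename and the `query`/`response`/`context` field names here are illustrative, not confirmed by the tutorials):

```shell
# Write a two-row JSONL eval dataset. The `context` field holds the retrieved
# passages that groundedness evaluators check the response against.
# Field names are assumptions for illustration.
cat > dataset.jsonl <<'EOF'
{"query": "What is the refund window?", "response": "Refunds are accepted within 30 days.", "context": "Policy: customers may request a refund within 30 days of purchase."}
{"query": "Do you ship internationally?", "response": "Yes, to over 40 countries.", "context": "Shipping: international delivery is available to 40+ countries."}
EOF

# Sanity-check that every row parses as JSON and includes a context field.
python3 - <<'EOF'
import json
for line in open("dataset.jsonl"):
    row = json.loads(line)
    assert "context" in row, "missing context field"
print("dataset ok")
EOF
```

The same rows could then feed both run configs in the comparison playbook, since the CLI is pointed at one dataset per run.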
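For the PR-gating playbook, the exit code contract is the key idea: the eval command exits 0 when all thresholds pass and non-zero otherwise, so CI can gate directly on it. A minimal sketch of that pattern, where `run_eval` is a hypothetical stand-in for the real `agentops eval` invocation (exact flags are not shown here):

```shell
# run_eval stands in for the real eval command; its exit status models
# the threshold outcome. EVAL_FAILS is a test knob, not a real variable.
run_eval() {
  return "${EVAL_FAILS:-0}"
}

# A CI step gates on the exit status alone -- no output parsing needed.
if run_eval; then
  echo "quality gate passed"
else
  echo "quality gate failed" >&2
  exit 1
fi
```

In a GitHub Actions job the same shape applies: the step running the eval fails the job automatically on a non-zero exit, with `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` supplied from repository secrets.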
## Format
Each playbook follows the same structure:
- **Situation** — 1-sentence description of the developer's context
- **What you need** — prerequisites (agent deployed, model available, etc.)
- **Steps** — numbered, with exact CLI commands
- **Expected output** — what `results.json` and `report.md` will show
- **Next steps** — links to detailed tutorials and related playbooks
## Acceptance Criteria
## Context