feat(okr30): add EAGLE3 Claude Code skills for triage, validation, and new-model support by yeyu-nvidia · Pull Request #1429 · NVIDIA/Model-Optimizer

yeyu-nvidia · 2026-05-11T17:58:14Z

Summary

Adds four user-invocable Claude Code skills for the EAGLE3 offline pipeline (OKR 30 — Claude-assist experience).

- /eagle3-triage: Diagnose a failed pipeline run. Failure tables for all 4 tasks covering vLLM server startup, hidden state dump (3 backends: TRT-LLM / HF / vLLM), training crashes, and benchmark failures. New-model-specific issue checklist (VLMs, MoE, SWA, custom tokenizers).
- /eagle3-validate: Verify a completed run end-to-end. Artifact checks per task, AR threshold validation (≥ 2.1), structured validation report.
- /eagle3-new-model: Guided workflow for adding a new model. Architecture lookup, GB200 GPU/TP calculation, dump backend selection, full YAML template with correct public-launcher script paths.
- /eagle3-review-logs: Lightweight log reader. Finds sbatch_*.out files, reads all task logs, produces pass/fail summary with root causes and next steps.

Skills use public launcher paths (common/eagle3/, common/vllm/, etc.) and read sbatch_*.out files directly — no sandbox-specific tooling required.

Test plan

- Run /eagle3-triage against a known-failed experiment and verify it identifies the root cause
- Run /eagle3-validate against a passing experiment and verify AR check works
- Run /eagle3-new-model to generate a config for a new model and verify the YAML is correct
- Run /eagle3-review-logs and verify summary output matches actual log contents

Summary by CodeRabbit

Documentation
- Added comprehensive EAGLE3 guides: onboarding new models and generating pipeline configs, reviewing experiment logs with structured reports, an end-to-end validation workflow (user-invocable), and a triage/troubleshooting guide mapping common failure patterns to root causes and fixes.

…d new-model support

Four user-invocable skills for the EAGLE3 offline pipeline:

- eagle3-triage: diagnose failed pipeline runs step-by-step; failure tables for all 4 tasks (vLLM data synthesis, hidden state dump with 3 backends, training, benchmark); new-model-specific issue checklist
- eagle3-validate: verify completed runs; artifact checks; AR threshold (>= 2.1); structured validation report with next-step guidance
- eagle3-new-model: guided workflow for adding a new model; architecture lookup, GPU/TP calculation for GB200, backend selection, full YAML template with correct public-launcher script paths
- eagle3-review-logs: lightweight log reader; finds sbatch .out files, reads all task logs, produces pass/fail summary with root causes

Skills use public launcher paths (common/eagle3/, common/vllm/, etc.) and read sbatch .out files directly; no sandbox-specific tooling required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>

coderabbitai · 2026-05-11T17:58:28Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 617332fe-14b9-4112-b77b-a2d36daecc72

📥 Commits

Reviewing files that changed from the base of the PR and between 0172417 and 7ccae94.

📒 Files selected for processing (2)

.claude/skills/eagle3-review-logs/SKILL.md
.claude/skills/eagle3-triage/SKILL.md

✅ Files skipped from review due to trivial changes (2)

.claude/skills/eagle3-review-logs/SKILL.md
.claude/skills/eagle3-triage/SKILL.md

📝 Walkthrough

Adds four new Claude skill documents that guide creating model YAMLs, validating EAGLE3 pipeline runs, reviewing experiment logs, and triaging failures. All changes are documentation only; no code entities are modified.

Changes

EAGLE3 Pipeline Workflow Skills

| Layer / File(s) | Summary |
|---|---|
| New Model Configuration Skill `.claude/skills/eagle3-new-model/SKILL.md` | Procedural guide to create `hf_offline_eagle3.yaml` for new models: extract architecture/serving properties, compute OCI‑HSG/GB200 GPU/node sizing, select hidden-state dump backend (vLLM/HF/TRT-LLM), author a 4-task pipeline (data synthesis, hidden-state dump, offline training, benchmark), add model-specific adjustments, and run a dry-run. |
| Pipeline Validation Skill `.claude/skills/eagle3-validate/SKILL.md` | End-to-end validation workflow: locate latest experiment, inspect task logs for success/timeout/failure, verify artifacts under `/scratchspace/`, extract benchmark acceptance-rate (AR) and compare to threshold (>= 2.1), check training quality cues, and emit a structured validation report with PASS/FAIL and next steps. |
| Log Review and Analysis Skill `.claude/skills/eagle3-review-logs/SKILL.md` | Systematic log review: locate `sbatch_*.out` Slurm logs, tail last 200 lines per task in parallel, analyze exit/cancellation signals, Python tracebacks, CUDA/Slurm failure modes, and success indicators; generate a structured markdown report with per-task diagnosis, suggested fixes, and a benign-patterns table. |
| Pipeline Failure Diagnosis Skill `.claude/skills/eagle3-triage/SKILL.md` | Comprehensive triage workflow: find failed experiments, fetch Slurm logs, map error patterns to root causes and concrete fixes per task, run new-model-specific checks (VLM/SWA/trust_remote_code/MoE sizing/tokenizer cache), provide re-run commands and skip flags, and instruct updating `tools/launcher/examples/EAGLE3_TRIAGE.md`. |
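The AR check described for the validation skill can be sketched as follows. This is a minimal sketch, not the skill's actual implementation. It assumes the benchmark log contains a line of the form `Average Acceptance Length {'accept': X, 'count': Y, 'ratio': Z.ZZ}` (the format quoted in the review of `eagle3-validate/SKILL.md`) and that `ratio` is the value compared against the 2.1 threshold. The function names and the sample numbers are hypothetical.

```python
import ast
import re

# Threshold stated in the PR description (AR >= 2.1).
AR_THRESHOLD = 2.1

# Assumed log format: "Average Acceptance Length {'accept': X, 'count': Y, 'ratio': Z.ZZ}"
_AR_LINE = re.compile(r"Average Acceptance Length\s*(\{[^}]*\})")


def extract_ar(log_text):
    """Return the 'ratio' from the last acceptance-length line, or None if absent."""
    matches = _AR_LINE.findall(log_text)
    if not matches:
        return None
    # The payload is a Python dict literal, so literal_eval parses it safely.
    stats = ast.literal_eval(matches[-1])
    return float(stats["ratio"])


def validate_ar(log_text, threshold=AR_THRESHOLD):
    """Return (passed, message) for the AR threshold check."""
    ar = extract_ar(log_text)
    if ar is None:
        return False, "no 'Average Acceptance Length' line found"
    if ar >= threshold:
        return True, f"AR {ar:.2f} >= {threshold}"
    return False, f"AR {ar:.2f} < {threshold}"


# Illustrative log text; the numbers are made up for the example.
sample_log = (
    "benchmark finished\n"
    "Average Acceptance Length {'accept': 4600, 'count': 2000, 'ratio': 2.30}\n"
)
print(validate_ar(sample_log))
```

Taking the last matching line is a design choice for logs that print running averages; a log that reports the value only once behaves the same way.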

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding four EAGLE3 Claude Code skills (triage, validation, new-model support, and review-logs) for the OKR 30 initiative. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | PR adds only markdown documentation files in .claude/skills/. No Python code, no modelopt/examples changes, no dependencies modified. Security check not applicable. |


coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/eagle3-new-model/SKILL.md:
- Around line 33-38: The fenced code block containing the sizing formulas (the
lines starting with "BF16 weight size  = total_params × 2 bytes" through
"tp = min(gpus_needed, 4)") needs a fence language added (for example use ```text)
to satisfy MD040; update the opening fence to include that language so the
Markdown linter recognizes it as a plain-text code block.
- Around line 1-9: Add the missing frontmatter key user_invocable: true to the
skill metadata so the skill becomes callable; edit the SKILL.md frontmatter for
the eagle3-new-model skill and insert user_invocable: true (boolean) alongside
name/description so the YAML now includes user_invocable: true.
- Around line 33-45: The TP calculation is inconsistent: the formula "tp =
min(gpus_needed, 4)" contradicts table rows that set tp=4 even when GPUs needed
is 1 or 2; update either the formula or the table so they match. Locate the
block with the formulas (BF16 weight size, GPUs needed, nodes, tp) and either
change the tp formula to "tp = min(max(gpus_needed, 4), 4)" or more sensibly "tp
= min(4, gpus_needed) if gpus_needed >= 4 else gpus_needed" (or adjust each
example row in the table to set tp = gpus_needed for 1–3 GPUs), and ensure the
entries for models (8B dense, 70B dense, 685B MoE, 1T MoE) reflect the chosen
rule consistently.

In @.claude/skills/eagle3-review-logs/SKILL.md:
- Around line 54-79: The markdown in the SKILL.md "Output a structured markdown
report:" section is triggering markdownlint MD022/MD031 because headings (e.g.,
"### Summary", "### Task Results", "## Step 4 — Suggest next steps") and the
fenced code block (the ```bash snippet) are not surrounded by blank lines;
update the template so every heading has a blank line before and after it and
ensure the fenced code block has a blank line before and after the ```bash fence
to satisfy MD022/MD031 and eliminate the formatting warnings.
- Around line 34-40: The text says “Read the last 200 lines of each log in
parallel” but the shown for-loop is sequential; either remove the phrase “in
parallel” or replace the sequential for-loop block with a true parallel command.
Concretely, update the snippet that currently uses the for f in $(find ...); do
... tail -200 ... done to use the parallel xargs pipeline (find ... | sort |
xargs -I{} -P 8 sh -c 'echo "=== {} ==="; tail -200 "{}"; echo') so it actually
runs tails in parallel, or else change the prose to say “sequentially” and keep
the existing for-loop.

In @.claude/skills/eagle3-triage/SKILL.md:
- Around line 148-163: Add a blank line before and after each fenced code block
in the SKILL.md section containing the two uv run examples (the triple-backtick
command blocks for "To skip task_0..." and "To run only task_1...") so there is
an empty line surrounding each ```...``` fence to satisfy MD031.

In @.claude/skills/eagle3-validate/SKILL.md:
- Around line 59-61: The fenced code block containing "Average Acceptance Length
{'accept': X, 'count': Y, 'ratio': Z.ZZ}" needs a language tag to satisfy MD040;
update the three-backtick fence from ``` to ```text so the block is recognized
as plain text (leave the content unchanged) in SKILL.md.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5681877c-5f95-498c-bd50-02be8e857617

📥 Commits

Reviewing files that changed from the base of the PR and between d30ebbd and 6c580d8.

📒 Files selected for processing (4)

.claude/skills/eagle3-new-model/SKILL.md
.claude/skills/eagle3-review-logs/SKILL.md
.claude/skills/eagle3-triage/SKILL.md
.claude/skills/eagle3-validate/SKILL.md

coderabbitai · 2026-05-11T18:01:11Z

+---
+name: eagle3-new-model
+description: >
+  Add a new model to the EAGLE3 offline pipeline. Generates an hf_offline_eagle3.yaml
+  launcher config for a new model checkpoint, choosing the right hidden state dump
+  backend (TRT-LLM / HF / vLLM) and GPU configuration.
+  Use when user wants to run EAGLE3 on a model that does not yet have a YAML in
+  tools/launcher/examples/ or asks how to configure the pipeline for a new checkpoint.
+---


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing user_invocable: true in skill frontmatter

This skill is described as user-invocable in the PR objective, but the metadata omits the flag. Without it, the skill may not be callable directly.

Suggested fix

 ---
 name: eagle3-new-model
 description: >
   Add a new model to the EAGLE3 offline pipeline. Generates an hf_offline_eagle3.yaml
   launcher config for a new model checkpoint, choosing the right hidden state dump
   backend (TRT-LLM / HF / vLLM) and GPU configuration.
   Use when user wants to run EAGLE3 on a model that does not yet have a YAML in
   tools/launcher/examples/ or asks how to configure the pipeline for a new checkpoint.
+user_invocable: true
 ---
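A check like this one could be automated across all four skill files. The following is a minimal sketch using only the standard library, with hypothetical function names. The key spelling `user_invocable` is taken from this review comment; the sketch does not verify that this is the spelling the skill loader expects.

```python
def frontmatter_keys(markdown_text: str) -> set[str]:
    """Collect top-level keys from a leading '---' ... '---' frontmatter block.

    Deliberately simple: a top-level key is any line that starts in column 0
    and contains a colon. Indented lines (such as the folded description
    body) are skipped. Returns an empty set if there is no closed block.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # closing '---' never found


def missing_keys(markdown_text: str, required=("name", "description", "user_invocable")):
    """Return the required keys absent from the frontmatter, in the given order."""
    present = frontmatter_keys(markdown_text)
    return [key for key in required if key not in present]


# Frontmatter shaped like the one under review (description shortened).
skill_md = """---
name: eagle3-new-model
description: >
  Add a new model to the EAGLE3 offline pipeline.
---
# body
"""
print(missing_keys(skill_md))
```

A real YAML parser would be more robust; the line-based scan is used here only to keep the sketch dependency-free.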



coderabbitai · 2026-05-11T18:01:11Z

+```
+BF16 weight size  = total_params × 2 bytes
+GPUs needed       = ceil(weight_size_GB / 192)
+nodes             = ceil(gpus_needed / 4)
+tp                = min(gpus_needed, 4)
+```
+
+| Model | Weights (BF16) | GPUs | nodes | tp |
+|---|---|---|---|---|
+| 8B dense | ~16 GB | 1 | 1 | 4 |
+| 70B dense | ~140 GB | 1 | 1 | 4 |
+| 685B MoE | ~340 GB | 2 | 1 | 4 |
+| 1T MoE | ~595 GB | 4 | 1 | 4 |


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

TP formula and example table are internally inconsistent

tp = min(gpus_needed, 4) conflicts with examples that set tp=4 when gpus_needed is 1 or 2. This can mislead users into incorrect sizing.

Suggested fix (align examples with formula)

 | Model | Weights (BF16) | GPUs | nodes | tp |
 |---|---|---|---|---|
-| 8B dense | ~16 GB | 1 | 1 | 4 |
-| 70B dense | ~140 GB | 1 | 1 | 4 |
-| 685B MoE | ~340 GB | 2 | 1 | 4 |
+| 8B dense | ~16 GB | 1 | 1 | 1 |
+| 70B dense | ~140 GB | 1 | 1 | 1 |
+| 685B MoE | ~340 GB | 2 | 1 | 2 |
 | 1T MoE | ~595 GB | 4 | 1 | 4 |
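The mismatch can be confirmed by evaluating the quoted formulas directly. The following is a minimal sketch with a hypothetical function name. It takes the weight sizes in GB from the table's "Weights (BF16)" column as input rather than recomputing them from parameter counts, and it uses the 192 GB per GPU and 4 GPUs per node that appear in the formula block.

```python
import math

GPU_MEMORY_GB = 192  # per-GPU capacity used in the quoted formula
GPUS_PER_NODE = 4    # divisor used in the quoted "nodes" formula


def size_model(weight_size_gb: float) -> dict:
    """Apply the sizing formulas from the SKILL.md block as written."""
    gpus_needed = math.ceil(weight_size_gb / GPU_MEMORY_GB)
    nodes = math.ceil(gpus_needed / GPUS_PER_NODE)
    tp = min(gpus_needed, GPUS_PER_NODE)
    return {"gpus": gpus_needed, "nodes": nodes, "tp": tp}


# Weight sizes copied from the example table; "tp" is the value the table lists.
table_rows = {
    "8B dense": (16, 4),
    "70B dense": (140, 4),
    "685B MoE": (340, 4),
    "1T MoE": (595, 4),
}

for model, (weights_gb, table_tp) in table_rows.items():
    computed = size_model(weights_gb)
    status = "ok" if computed["tp"] == table_tp else "MISMATCH"
    print(f"{model}: formula tp={computed['tp']}, table tp={table_tp} -> {status}")
```

Under the formula as written, only the 1T MoE row agrees with the table; the other three rows yield tp values of 1, 1, and 2, which are the values in the suggested fix.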

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 33-33: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


coderabbitai · 2026-05-11T18:01:11Z

+Read the last 200 lines of each log in parallel. Errors appear at the end:
+
+```bash
+for f in $(find experiments/<exp_id>/ -name "sbatch_*.out" | sort); do
+  echo "=== $f ==="; tail -200 "$f"; echo
+done
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

“Read in parallel” does not match the provided command

The snippet is sequential, so the instruction is currently inaccurate. Either remove “in parallel” or update the command to actually parallelize log tails.

Suggested doc fix (true parallel read)

-Read the last 200 lines of each log in parallel. Errors appear at the end:
+Read the last 200 lines of each log. Errors appear at the end:

 ```bash
-for f in $(find experiments/<exp_id>/ -name "sbatch_*.out" | sort); do
-  echo "=== $f ==="; tail -200 "$f"; echo
-done
+find experiments/<exp_id>/ -name "sbatch_*.out" | sort | \
+  xargs -I{} -P 8 sh -c 'echo "=== {} ==="; tail -200 "{}"; echo'
 ```
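One caveat with the `xargs -P` form is that output from concurrently running jobs can interleave, so the per-file sections may not stay contiguous or sorted. The following is a minimal Python sketch of the same idea that reads the tails in parallel and keeps the results in sorted order. The function names are hypothetical and the skill itself does not ship this code; the `sbatch_*.out` pattern, the 200-line tail, and the worker count of 8 come from the snippets above.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def tail(path, n=200):
    """Return the last n lines of a text file."""
    with open(path, "r", errors="replace") as fh:
        # A bounded deque keeps only the final n lines while streaming the file.
        return list(deque(fh, maxlen=n))


def tail_logs(root, n=200, workers=8):
    """Tail every sbatch_*.out under root in parallel; results keep sorted order."""
    paths = sorted(Path(root).rglob("sbatch_*.out"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        tails = list(pool.map(lambda p: tail(p, n), paths))
    return dict(zip(paths, tails))


def print_report(root, n=200):
    """Print each log tail under a header, one file at a time."""
    for path, lines in tail_logs(root, n).items():
        print(f"=== {path} ===")
        print("".join(lines))
```

Calling `print_report("experiments/<exp_id>/")` with a real experiment directory substituted for the placeholder prints one contiguous section per log file. Threads are sufficient here because the work is I/O-bound.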


codecov · 2026-05-11T18:13:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.40%. Comparing base (555be6c) to head (7ccae94).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1429      +/-   ##
==========================================
- Coverage   76.74%   76.40%   -0.35%     
==========================================
  Files         476      478       +2     
  Lines       51307    52592    +1285     
==========================================
+ Hits        39377    40181     +804     
- Misses      11930    12411     +481

| Flag | Coverage Δ | |
|---|---|---|
| unit | `52.74% <ø> (+0.20%)` | ⬆️ |

Add `text` language specifiers to bare fenced code blocks:

- eagle3-new-model/SKILL.md: GPU calculation formula block
- eagle3-validate/SKILL.md: acceptance rate log output block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>

Add blank lines before fenced code blocks as required by MD031:

- eagle3-triage/SKILL.md: two re-run command blocks
- eagle3-review-logs/SKILL.md: suggested fix block and section headers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>

coderabbitai Bot reviewed May 11, 2026

yeyu-nvidia and others added 2 commits May 11, 2026 11:47

yeyu-nvidia wants to merge 3 commits into main from yeyu/eagle3-claude-skills
