
Split bypass prerequisites #1468

Open

Separius wants to merge 4 commits into main from ssameni/puzzletron-bypass-1-prereqs

Conversation

Contributor

@Separius Separius commented May 12, 2026

Summary

This is PR 1 of 3 in the Puzzletron bypass/local-distillation stack.

This PR contains prerequisite infrastructure only. It does not wire bypass distillation into the Puzzletron pipeline yet.

Stack:

  1. This PR: shared prerequisites
  2. ssameni/puzzletron-bypass-2-core: bypass distillation core
  3. ssameni/puzzletron-bypass-3-integration: Puzzletron integration, configs, docs, GPU coverage

What Changed

  • Added ModelDescriptor.pruning_mixins() so model families can expose pruning mixins needed by downstream bypass initialization (see the sketch after this list).
  • Added KV-head pruning mixin support for GPT-OSS, Nemotron-H, Nemotron-H-v2, and Qwen3-VL descriptors.
  • Improved pruning utilities for nested language-model configs and missing attention bias config fields.
  • Added create_train_dataloader() and streaming-safe shuffle handling.
  • Added chat-template fallback for base models without tokenizer.chat_template.
  • Added Sewing Kit loss/helper exports needed by the later bypass core.
  • Updated child-state initialization to support composing multiple pruning mixins.
  • Updated warmup-step resolver to account for gradient accumulation.
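
For orientation, here is a minimal sketch of the new extension point. The class bodies are hypothetical placeholders based on the names in this PR; the real definitions live in model_descriptor/base.py and kv_heads_pruning_mixin.py and may differ.

```python
# Sketch only; names follow this PR's summary, real signatures may differ.

class PruningMixIn:
    """Placeholder for the pruning-mixin base class."""


class KVHeadsPruningMixIn(PruningMixIn):
    """Placeholder: prunes KV heads, deriving head size from the LM sub-config."""


class ModelDescriptor:
    def pruning_mixins(self) -> list[type[PruningMixIn]]:
        # Base descriptors expose no mixins; model families override this to
        # advertise which mixins downstream bypass initialization may apply.
        return []


class Qwen3VLModelDescriptor(ModelDescriptor):
    def pruning_mixins(self) -> list[type[PruningMixIn]]:
        # Qwen3-VL (like GPT-OSS and Nemotron-H/-H-v2) registers KV-head pruning.
        return [*super().pruning_mixins(), KVHeadsPruningMixIn]
```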

Why

The bypass distillation PR needs these reusable pieces, but they are independently reviewable and useful without adding the bypass training stage itself.

Splitting them out keeps the bypass core PR focused on the actual local-distillation engine.

Tests

Added focused unit coverage for:

  • Dataloader behavior
  • Bypass loss helpers
  • KV-head pruning utilities
  • Sewing Kit activity/input/function/needle behavior

Summary by CodeRabbit

  • New Features

    • KV-head pruning support added for multiple model families.
    • New training dataloader factory for infinite, block-sized training streams.
    • Normalized MSE loss utilities: vectorwise and batched variants.
  • Improvements

    • Loss display now shows Δ-from-initial and new visual indicators.
    • Chat-sample preprocessing handles tokenizers without chat templates.
    • More robust attention/head-dimension and bias handling; pruning mixin extensibility.
  • Tests

    • Extensive unit tests added covering dataloaders, losses, pruning, and sewing-kit utilities.

Review Change Stack


copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6aa221b0-e463-4d98-897b-d6eba03a3f2f

📥 Commits

Reviewing files that changed from the base of the PR and between 12086fb and b9c00ba.

📒 Files selected for processing (2)
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py

📝 Walkthrough

This PR extends the pruning framework with KV-heads support across multiple model descriptors, adds LM-config helpers and sequential multi-mixin application, introduces normalized MSE loss utilities, adds a training dataloader factory with tokenizer-aware chat preprocessing, updates stitched-loss formatting and warmup resolver behavior, and adds comprehensive unit tests.

Changes

Pruning and model descriptor enhancements

  • Base pruning mixin interface and language-model config utilities (modelopt/torch/puzzletron/anymodel/model_descriptor/base.py, modelopt/torch/puzzletron/pruning/pruning_utils.py): Adds the ModelDescriptor.pruning_mixins() extension point; introduces _lm_attrs() and _lm_head_dim() to extract language-model sub-configs and head_dim for VL configs; updates _init_attention_weights()/_init_attention_biases() to use LM metadata with robust bias-key probing; adds MlpInitMode.MoEChannelPruning.
  • KV-heads pruning across model descriptors (modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py, modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py, modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py, modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py, modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py): KVHeadsPruningMixIn derives head size via _lm_head_dim(); the GPT-OSS, NemotronH, NemotronHV2, and Qwen3VL descriptors register kv_heads pruning mixins and export model-specific KVHeadsLayerDescriptor dataclasses; expert-removal mixins are registered (including a legacy alias where present).
  • Sequential mixin composition and config override (modelopt/torch/puzzletron/tools/bypassed_training/child_init.py): _process_single_layer() supports lists of pruning mixins applied sequentially (sketched below), threading interim parent/new state and per-layer key views, merging per-mixin layer outputs and aggregating keys_to_remove; update_model_config.override() treats None as leave-unchanged.
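
A rough sketch of the sequential composition described above (hypothetical shapes; the real _process_single_layer() threads richer state and key views):

```python
from typing import Any

def apply_mixins_sequentially(
    mixins: list[Any],
    parent_state: dict[str, Any],
    new_state: dict[str, Any],
    keys: dict[str, str],
) -> tuple[dict[str, Any], set[str]]:
    # Hypothetical helper: each mixin sees the state as left by the previous
    # one; per-mixin layer outputs are merged and keys_to_remove aggregated.
    merged_layer_out: dict[str, Any] = {}
    keys_to_remove: set[str] = set()
    for mixin in mixins:
        layer_out, removed = mixin.process_layer(parent_state, new_state, keys)
        merged_layer_out.update(layer_out)  # note: last writer wins on overlap
        keys_to_remove |= set(removed)
    return merged_layer_out, keys_to_remove
```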

Training infrastructure and loss utilities

  • Normalized MSE loss functions (modelopt/torch/puzzletron/sewing_kit/utils.py, tests/unit/torch/puzzletron/test_bypass_losses.py): Re-exports normalized_mse_loss; adds vectorwise_normalized_mse_loss() and batched_normalized_mse_loss() with batch-dim validation, epsilon-stabilized relative-L2 normalization, and mean-per-batch aggregation (see the sketch after this table); tests cover identity, randomness, reduction modes, scale invariance, zero-target finiteness, and error cases.
  • Training dataloader factory (modelopt/torch/puzzletron/utils/data/dataloaders.py, modelopt/torch/puzzletron/utils/data/dataset.py, tests/unit/torch/puzzletron/test_bypass_dataloaders.py): create_train_dataloader() builds an infinite DataLoader backed by ConstantLengthDataset, rejects num_workers > 0, supports streaming vs. map-style shuffle, and wraps the training split; ConstantLengthDataset.__iter__ uses tokenizer.apply_chat_template() when available, falling back to normalized newline-joined message content; tests validate materialization, padding, collation, the Printer contract, loader delegation, and validation-split auto-selection.
  • Configuration formatting and warmup computation (modelopt/torch/puzzletron/tools/hydra_utils.py, modelopt/torch/puzzletron/utils/parsing.py): warmup_steps() now requires grad_accum and validates inputs; _warmup_steps_resolver() enforces 5 Hydra resolver args; format_stitched_losses() accepts initial_values_dict and not_trainable_names, renders "Δ from initial", filters stats to finite values, and appends a skipped count; formatters updated with emoji/bullet-style rendering.
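
Given the description above, the batched loss plausibly reduces each batch entry's squared error by an epsilon-stabilized relative-L2 norm and averages over the batch. A sketch under those assumptions (the real batched_normalized_mse_loss() in sewing_kit/utils.py may differ in signature and validation):

```python
import torch

def batched_normalized_mse_loss(
    input: torch.Tensor,
    target: torch.Tensor,
    batch_dims: tuple[int, ...] = (0,),
    epsilon: float = 1e-8,
) -> torch.Tensor:
    # Sketch only: relative-L2 per batch entry, then mean over the batch.
    if input.shape != target.shape:
        raise ValueError(f"shape mismatch: {input.shape} vs {target.shape}")
    norm_dims = tuple(d for d in range(input.ndim) if d not in batch_dims)
    num = ((input - target) ** 2).sum(dim=norm_dims)
    den = (target**2).sum(dim=norm_dims) + epsilon  # keeps zero targets finite
    return (num / den).mean()
```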

Sewing kit infrastructure

  • Sewing kit module exports and comprehensive tests (modelopt/torch/puzzletron/sewing_kit/passage.py, tests/unit/torch/puzzletron/*): always_true_predicate is exported from passage.py; extensive tests added for ActivityContext (stack semantics), Needle graph/validation, FunctionTarget kwargs-only dispatch, InputArgs behavior, pruning-mixin composition and key tracking, the KV-head helper, Hydra warmup validation, dataloader behavior, and loss formatting/utilities.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 37.41%, below the required 80.00% threshold. Resolution: add docstrings to the functions that are missing them.

✅ Passed checks (5)

  • Description Check: ✅ Passed. Skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "Split bypass prerequisites" directly reflects the main objective: prerequisite infrastructure for Puzzletron bypass/local-distillation support, split from a larger feature into a dedicated PR.
  • Linked Issues Check: ✅ Passed. Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Skipped because no linked issues were found.
  • Security Anti-Patterns: ✅ Passed. No new dangerous patterns (torch.load, numpy.load, hardcoded trust_remote_code, eval/exec, nosec comments, new dependencies) introduced in the PR changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

github-actions Bot commented May 12, 2026

PR Preview Action v1.8.1


🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1468/

Built to branch gh-pages at 2026-05-13 08:26 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from 566cb1d to 0639883 on May 12, 2026 10:51
@Separius
Contributor Author

/claude review


claude Bot commented May 12, 2026

Claude review — summary

Findings: CRITICAL: 1 · IMPORTANT: 2 · SUGGESTION: 2

Most impactful

  • CRITICAL: tests/unit/torch/puzzletron/test_bypass_losses.py::test_format_stitched_losses_keeps_trainable_nan_visible calls format_stitched_losses(...) with initial_values_dict= and not_trainable_names= kwargs that don't exist in the function's current signature (and asserts on output strings like "Skipped=1" / "non-finite" that the implementation never produces). This test will hard-fail at collection/call time (TypeError). Either bring the format_stitched_losses update forward into this PR or defer this single test to the bypass-core PR.
  • IMPORTANT: The multi-mixin composition in child_init.py:_process_single_layer uses last-writer-wins semantics via dict.update, despite the comment claiming ordering can't corrupt the state dict. Two mixins that ever touch the same key will silently clobber each other. Either tighten the comment or add an overlap assertion (a sketch follows this list).
  • IMPORTANT: override(item, None) in child_init.py:update_model_config now returns item instead of None. This is a sensible fix if None means "no override," but it's a behavior change — any caller that deliberately cleared a field with None now keeps the old value. Worth verifying no internal recipes/configs depended on the old semantics.
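
A minimal sketch of the suggested overlap assertion (hypothetical names mirroring the dict.update merge in _process_single_layer):

```python
def merge_disjoint(merged: dict, layer_out: dict) -> None:
    # Guard against silent clobbering before merging per-mixin outputs.
    overlap = merged.keys() & layer_out.keys()
    if overlap:
        raise ValueError(f"pruning mixins write overlapping keys: {sorted(overlap)}")
    merged.update(layer_out)
```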

Risk level

Moderate. The bulk of the PR is cleanly scoped prerequisite plumbing (descriptor mixins, dataloader, chat-template fallback, warmup-step grad-accum handling, re-exports) with good test coverage for the pure-function helpers. The blocker is the one test that presupposes function-signature changes shipping in the follow-up PR — that needs to be resolved before merge. The mixin-composition and override-None semantics deserve a second look but aren't blockers.

Comment threads: modelopt/torch/puzzletron/tools/bypassed_training/child_init.py (outdated, ×2); modelopt/torch/puzzletron/tools/hydra_utils.py (outdated); modelopt/torch/puzzletron/pruning/pruning_utils.py
@Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from 0639883 to a79fbae on May 12, 2026 11:19

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 69.72477% with 66 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.85%. Comparing base (62401e1) to head (bb4217c).

Files with missing lines Patch % Lines
modelopt/torch/puzzletron/pruning/pruning_utils.py 41.37% 17 Missing ⚠️
modelopt/torch/puzzletron/utils/parsing.py 64.10% 14 Missing ⚠️
...h/puzzletron/tools/bypassed_training/child_init.py 77.55% 11 Missing ⚠️
...odelopt/torch/puzzletron/utils/data/dataloaders.py 23.07% 10 Missing ⚠️
modelopt/torch/puzzletron/tools/hydra_utils.py 76.19% 5 Missing ⚠️
modelopt/torch/puzzletron/utils/data/dataset.py 70.00% 3 Missing ⚠️
...nymodel/models/gpt_oss/gpt_oss_model_descriptor.py 77.77% 2 Missing ⚠️
...torch/puzzletron/anymodel/model_descriptor/base.py 66.66% 1 Missing ⚠️
...model/models/qwen3_vl/qwen3_vl_model_descriptor.py 83.33% 1 Missing ⚠️
...torch/puzzletron/pruning/kv_heads_pruning_mixin.py 50.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1468      +/-   ##
==========================================
+ Coverage   76.78%   76.85%   +0.06%     
==========================================
  Files         473      478       +5     
  Lines       51413    51906     +493     
==========================================
+ Hits        39476    39890     +414     
- Misses      11937    12016      +79     
Flag Coverage Δ
unit 53.38% <69.72%> (+0.84%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@Separius Separius marked this pull request as ready for review May 12, 2026 11:32
@Separius Separius requested a review from a team as a code owner May 12, 2026 11:32
@Separius
Contributor Author

@AAnoosheh and @kevalmorabia97 ready for review (split the bypass MR into 3, this is the first one, nothing too important, just some preparations and tiny fixes)


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/unit/torch/puzzletron/test_bypass_dataloaders.py (1)

206-219: ⚡ Quick win

Add a direct test for ConstantLengthDataset chat-template fallback

This fixture replaces ConstantLengthDataset, so the new no-chat_template preprocessing path in ConstantLengthDataset.__iter__ is not exercised. A small targeted iterator test would close that regression gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/puzzletron/test_bypass_dataloaders.py` around lines 206 -
219, The fixture patches out ConstantLengthDataset so
ConstantLengthDataset.__iter__'s new no-chat_template fallback isn't tested; add
a small unit test that imports the real ConstantLengthDataset (not
_FakeConstantLengthDataset), constructs it with a tiny dataset whose items lack
"chat_template", iterates it (e.g., list(ConstantLengthDataset(...)) or calling
its __iter__), and asserts the output matches the expected realized items (e.g.,
tensors like {"input_ids": torch.tensor([0])}); ensure this test does not apply
the patched_dataloader monkeypatch and references ConstantLengthDataset and
ConstantLengthDataset.__iter__ (and optionally create_validation_dataloader) so
the fallback path is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 479-495: The function batched_normalized_mse_loss allows silent
broadcasting when input and target shapes differ; add explicit shape validation
at the top of the function: verify input.ndim == target.ndim, confirm batch_dims
are valid indices, and ensure sizes match for every dimension (both batch dims
and non-batch dims computed via norm_dims) so that target and input are exactly
compatible; if any mismatch, raise a ValueError with a clear message that
includes the shapes of input and target and the resolved batch_dims/norm_dims to
aid debugging.

In `@modelopt/torch/puzzletron/tools/bypassed_training/child_init.py`:
- Around line 93-95: The per-layer loop currently does full copies via
current_parent_state_dict = dict(parent_state_dict), current_new_state_dict =
dict(new_state_dict), current_keys = dict(keys) which is expensive; instead,
stop cloning entire mappings inside the loop and operate on the original dicts
(parent_state_dict, new_state_dict, keys) by reading values directly and only
materialize copies for individual tensors/entries that are actually modified
(e.g., when applying a mixin to a specific key). Locate the per-layer mixin loop
and replace the dict() copies with references to the originals, and when you
need to mutate a specific parameter, copy only that parameter (or its key->value
pair) and write back to new_state_dict; ensure any iteration over keys uses an
iterator or list(keys) outside the hot loop if necessary to avoid mutation
races.

In `@modelopt/torch/puzzletron/tools/hydra_utils.py`:
- Around line 35-50: The warmup_steps function must validate and normalize
inputs before doing integer divisions: ensure tokens, block, mbs and grad_accum
are ints (or cast) and that block>0, mbs>0, grad_accum>=1, and that pct is a
float within [0.0,1.0] (or at least >=0); raise ValueError with clear messages
for invalid values. In function warmup_steps, coerce tokens, block, mbs,
grad_accum and pct to the expected types up front, check block and mbs are >0 to
avoid ZeroDivisionError, check grad_accum>=1 (existing check can be reused), and
validate pct (and tokens>=0) before computing iters/steps and returning the
rounded warmup steps.
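
A sketch of the validated computation this prompt describes, assuming tokens is the training-token budget, block the sequence length, and mbs the micro-batch size; the real warmup_steps() in hydra_utils.py may differ in names and error messages:

```python
def warmup_steps(tokens: int, block: int, mbs: int, grad_accum: int, pct: float) -> int:
    # Sketch only: validate inputs, then convert a token budget into warmup steps.
    tokens, block, mbs, grad_accum = int(tokens), int(block), int(mbs), int(grad_accum)
    pct = float(pct)
    if block <= 0 or mbs <= 0:
        raise ValueError(f"block and mbs must be > 0, got block={block}, mbs={mbs}")
    if grad_accum < 1:
        raise ValueError(f"grad_accum must be >= 1, got {grad_accum}")
    if not 0.0 <= pct <= 1.0:
        raise ValueError(f"pct must be in [0.0, 1.0], got {pct}")
    if tokens < 0:
        raise ValueError(f"tokens must be >= 0, got {tokens}")
    iters = tokens // (block * mbs * grad_accum)  # optimizer steps in the run
    return round(iters * pct)
```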

In `@modelopt/torch/puzzletron/utils/data/dataset.py`:
- Around line 131-138: The fallback that concatenates messages when
getattr(self.tokenizer, "chat_template", None) is None assumes every
m["content"] is a str and can raise TypeError for structured payloads; update
the else branch in dataset.py where sample is built to normalize each
m["content"] to a string before joining (e.g., if m["content"] is a dict or
other structured object, extract a text field if present like
m["content"].get("text") or otherwise call str(m["content"])), so the
concatenation in the no-template path (the code around tokenizer.chat_template
and tokenizer.apply_chat_template) always receives plain text.
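
A sketch of that normalization (hypothetical helper names; the real fallback lives inline in ConstantLengthDataset.__iter__):

```python
def _content_to_text(content) -> str:
    # Normalize structured payloads to plain text for the no-template path.
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        return str(content.get("text", content))
    return str(content)

def messages_to_sample(messages: list[dict]) -> str:
    # Fallback when tokenizer.chat_template is absent: newline-join contents.
    return "\n".join(_content_to_text(m["content"]) for m in messages)
```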

In `@tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py`:
- Around line 137-139: The test currently checks values in received["kwargs"]
but doesn't ensure no extra kwargs are present; update the second-order test in
test_sewing_kit_function_target_kwargs (use the local variables received,
student_value, teacher_value) to assert that received["kwargs"] contains exactly
the keys "input" and "target" (e.g., compare set(received["kwargs"].keys()) to
{"input","target"}) before the existing torch.equal assertions, then keep the
existing checks for received["args"] and the tensor equality against
student_value and teacher_value.

---

Nitpick comments:
In `@tests/unit/torch/puzzletron/test_bypass_dataloaders.py`:
- Around line 206-219: same prompt as the nitpick comment above.

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ddff5f0a-3633-4520-914f-dad472197cf8

📥 Commits

Reviewing files that changed from the base of the PR and between 7a11fb2 and a79fbae.

📒 Files selected for processing (22)
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • tests/unit/torch/puzzletron/test_bypass_dataloaders.py
  • tests/unit/torch/puzzletron/test_bypass_losses.py
  • tests/unit/torch/puzzletron/test_child_init_mixins.py
  • tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
  • tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
  • tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
  • tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
  • tests/unit/torch/puzzletron/test_sewing_kit_needle.py

Comment threads: modelopt/torch/puzzletron/sewing_kit/utils.py; modelopt/torch/puzzletron/tools/bypassed_training/child_init.py (outdated); modelopt/torch/puzzletron/tools/hydra_utils.py; modelopt/torch/puzzletron/utils/data/dataset.py (outdated)
Separius added 2 commits on May 12, 2026 16:09, both signed off by Sepehr Sameni <ssameni@nvidia.com>.
@Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from a79fbae to 12086fb on May 12, 2026 14:11

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 540-542: Validate that epsilon is strictly positive before
computing den; in the function that computes num = ((input - target) **
2).sum(dim=norm_dims) and den = (target**2).sum(dim=norm_dims) + epsilon, add a
guard at the start (before the denominator math) that either raises a ValueError
with a clear message if epsilon <= 0, or clamps epsilon to a small positive
floor (e.g., max(epsilon, 1e-12)); ensure the check references the epsilon
variable and occurs before computing den to prevent any inf/nan from division.

In `@modelopt/torch/puzzletron/utils/data/dataloaders.py`:
- Around line 113-121: The shuffle call for map-style datasets currently
hardcodes keep_in_memory=True and ignores the function argument; update the
branch that handles non-IterableDataset so that it passes the caller's
keep_in_memory parameter (the function arg named keep_in_memory) into
train_data.shuffle(seed=shuffle_seed, keep_in_memory=keep_in_memory) while
leaving IterableDataset.shuffle(seed=shuffle_seed) unchanged; reference the
symbols train_data, datasets.IterableDataset, shuffle_seed, and keep_in_memory
to locate and modify the code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e8b4997-90ef-408e-b03d-7bb26b85189d

📥 Commits

Reviewing files that changed from the base of the PR and between a79fbae and 12086fb.

📒 Files selected for processing (23)
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • tests/unit/torch/puzzletron/test_bypass_dataloaders.py
  • tests/unit/torch/puzzletron/test_bypass_losses.py
  • tests/unit/torch/puzzletron/test_child_init_mixins.py
  • tests/unit/torch/puzzletron/test_hydra_utils.py
  • tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
  • tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
  • tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
  • tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
  • tests/unit/torch/puzzletron/test_sewing_kit_needle.py
✅ Files skipped from review due to trivial changes (3)
  • modelopt/torch/puzzletron/sewing_kit/passage.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
🚧 Files skipped from review as they are similar to previous changes (16)
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • tests/unit/torch/puzzletron/test_child_init_mixins.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
  • tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • tests/unit/torch/puzzletron/test_bypass_losses.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • tests/unit/torch/puzzletron/test_bypass_dataloaders.py
  • tests/unit/torch/puzzletron/test_sewing_kit_needle.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py

Comment threads: modelopt/torch/puzzletron/sewing_kit/utils.py; modelopt/torch/puzzletron/utils/data/dataloaders.py (outdated)
Separius added 1 commit, signed off by Sepehr Sameni <ssameni@nvidia.com>.
@Separius
Contributor Author

/claude review
