Split bypass prerequisites by Separius · Pull Request #1468 · NVIDIA/Model-Optimizer

Separius · 2026-05-12T10:45:03Z

Summary

This is PR 1 of 3 in the Puzzletron bypass/local-distillation stack.

This PR contains prerequisite infrastructure only. It does not wire bypass distillation into the Puzzletron pipeline yet.

Stack:

This PR: shared prerequisites
ssameni/puzzletron-bypass-2-core: bypass distillation core
ssameni/puzzletron-bypass-3-integration: Puzzletron integration, configs, docs, GPU coverage

What Changed

Added ModelDescriptor.pruning_mixins() so model families can expose pruning mixins needed by downstream bypass initialization.
Added KV-head pruning mixin support for GPT-OSS, Nemotron-H, Nemotron-H-v2, and Qwen3-VL descriptors.
Improved pruning utilities for nested language-model configs and missing attention bias config fields.
Added create_train_dataloader() and streaming-safe shuffle handling.
Added chat-template fallback for base models without tokenizer.chat_template.
Added Sewing Kit loss/helper exports needed by the later bypass core.
Updated child-state initialization to support composing multiple pruning mixins.
Updated warmup-step resolver to account for gradient accumulation.

Why

The bypass distillation MR needs these reusable pieces, but they are independently reviewable and useful without adding the bypass
training stage itself.

Splitting them out keeps the bypass core PR focused on the actual local-distillation engine.

Tests

Added focused unit coverage for:

Dataloader behavior
Bypass loss helpers
KV-head pruning utilities
Sewing Kit activity/input/function/needle behavior

Summary by CodeRabbit

New Features
- KV-head pruning support added for multiple model families.
- New training dataloader factory for infinite, block-sized training streams.
- Normalized MSE loss utilities: vectorwise and batched variants.
Improvements
- Loss display now shows Δ-from-initial and new visual indicators.
- Chat-sample preprocessing handles tokenizers without chat templates.
- More robust attention/head-dimension and bias handling; pruning mixin extensibility.
Tests
- Extensive unit tests added covering dataloaders, losses, pruning, and sewing-kit utilities.

copy-pr-bot · 2026-05-12T10:45:07Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-12T10:45:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6aa221b0-e463-4d98-897b-d6eba03a3f2f

📥 Commits

Reviewing files that changed from the base of the PR and between 12086fb and b9c00ba.

📒 Files selected for processing (2)

modelopt/torch/puzzletron/sewing_kit/utils.py
modelopt/torch/puzzletron/utils/data/dataloaders.py

🚧 Files skipped from review as they are similar to previous changes (2)

modelopt/torch/puzzletron/utils/data/dataloaders.py
modelopt/torch/puzzletron/sewing_kit/utils.py

📝 Walkthrough

Walkthrough

This PR extends the pruning framework with KV-heads support across multiple model descriptors, adds LM-config helpers and sequential multi-mixin application, introduces normalized MSE loss utilities, adds a training dataloader factory with tokenizer-aware chat preprocessing, updates stitched-loss formatting and warmup resolver behavior, and adds comprehensive unit tests.

Changes

Pruning and model descriptor enhancements

Layer / File(s)	Summary
Base pruning mixin interface and language-model config utilities `modelopt/torch/puzzletron/anymodel/model_descriptor/base.py`, `modelopt/torch/puzzletron/pruning/pruning_utils.py`	Adds `ModelDescriptor.pruning_mixins()` extension point; introduces `_lm_attrs()` and `_lm_head_dim()` to extract language-model sub-configs and head_dim for VL configs; updates `_init_attention_weights()`/_init_attention_biases()`to use LM metadata with robust bias-key probing; adds`MlpInitMode.MoEChannelPruning`.
KV-heads pruning across model descriptors `modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py`, `modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py`, `modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py`, `modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`, `modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py`	`KVHeadsPruningMixIn` derives head size via `_lm_head_dim()`; GPT-OSS, NemotronH, NemotronHV2, and Qwen3VL descriptors register `kv_heads` pruning mixins and export model-specific `KVHeadsLayerDescriptor` dataclasses; expert-removal mixins registered (including legacy alias where present).
Sequential mixin composition and config override `modelopt/torch/puzzletron/tools/bypassed_training/child_init.py`	`_process_single_layer()` supports lists of pruning mixins applied sequentially, threading interim parent/new state and per-layer key views, merging per-mixin layer outputs and aggregating `keys_to_remove`; `update_model_config.override()` treats `None` as leave-unchanged.

Training infrastructure and loss utilities

Layer / File(s)	Summary
Normalized MSE loss functions `modelopt/torch/puzzletron/sewing_kit/utils.py`, `tests/unit/torch/puzzletron/test_bypass_losses.py`	Re-exports `normalized_mse_loss`; adds `vectorwise_normalized_mse_loss()` and `batched_normalized_mse_loss()` with batch-dim validation, epsilon-stabilized relative-L2 normalization, and mean-per-batch aggregation; tests cover identity, randomness, reduction modes, scale invariance, zero-target finiteness, and error cases.
Training dataloader factory `modelopt/torch/puzzletron/utils/data/dataloaders.py`, `modelopt/torch/puzzletron/utils/data/dataset.py`, `tests/unit/torch/puzzletron/test_bypass_dataloaders.py`	`create_train_dataloader()` builds an infinite `DataLoader` backed by `ConstantLengthDataset`, rejects `num_workers>0`, supports streaming vs map shuffle, and wraps training split; `ConstantLengthDataset.__iter__` uses `tokenizer.apply_chat_template()` when available or falls back to normalized newline-joined message content; tests validate materialization, padding, collation, `Printer` contract, loader delegation, and validation split auto-selection.
Configuration formatting and warmup computation `modelopt/torch/puzzletron/tools/hydra_utils.py`, `modelopt/torch/puzzletron/utils/parsing.py`	`warmup_steps()` now requires `grad_accum` and validates inputs; `_warmup_steps_resolver()` enforces 5 Hydra resolver args; `format_stitched_losses()` accepts `initial_values_dict` and `not_trainable_names`, renders "Δ from initial", filters stats to finite values, and appends skipped count; formatters updated with emoji/bullet-style rendering.

Sewing kit infrastructure

Layer / File(s)	Summary
Sewing kit module exports and comprehensive tests `modelopt/torch/puzzletron/sewing_kit/passage.py`, `tests/unit/torch/puzzletron/*`	`always_true_predicate` exported from `passage.py`; extensive tests added for ActivityContext (stack semantics), Needle graph/validation, FunctionTarget kwargs-only dispatch, InputArgs behavior, pruning mixin composition and key-tracking, KV-head helper, hydra warmup validation, dataloader behavior, and loss formatting/utilities.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.41% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Split bypass prerequisites' is directly related to the main changeset objective: this PR contains prerequisite infrastructure for Puzzletron bypass/local-distillation support, specifically split from a larger feature into a dedicated prerequisites PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	All security anti-pattern checks passed. No new dangerous patterns (torch.load, numpy.load, hardcoded trust_remote_code, eval/exec, nosec comments, new dependencies) introduced in PR changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ssameni/puzzletron-bypass-1-prereqs

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-12T10:49:39Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1468/
Built to branch `gh-pages` at 2026-05-13 08:26 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Separius · 2026-05-12T10:54:34Z

/claude review

claude · 2026-05-12T11:01:08Z

Claude review — summary

Findings: CRITICAL: 1 · IMPORTANT: 2 · SUGGESTION: 2

Most impactful

CRITICAL: tests/unit/torch/puzzletron/test_bypass_losses.py::test_format_stitched_losses_keeps_trainable_nan_visible calls format_stitched_losses(...) with initial_values_dict= and not_trainable_names= kwargs that don't exist in the function's current signature (and asserts on output strings like "Skipped=1" / "non-finite" that the implementation never produces). This test will hard-fail at collection/call time (TypeError). Either bring the format_stitched_losses update forward into this PR or defer this single test to the bypass-core PR.
IMPORTANT: The multi-mixin composition in child_init.py:_process_single_layer uses last-writer-wins semantics via dict.update, despite the comment claiming ordering can't corrupt the state dict. Two mixins that ever touch the same key will silently clobber each other. Either tighten the comment or add an overlap assertion.
IMPORTANT: override(item, None) in child_init.py:update_model_config now returns item instead of None. This is a sensible fix if None means "no override," but it's a behavior change — any caller that deliberately cleared a field with None now keeps the old value. Worth verifying no internal recipes/configs depended on the old semantics.

Risk level

Moderate. The bulk of the PR is cleanly scoped prerequisite plumbing (descriptor mixins, dataloader, chat-template fallback, warmup-step grad-accum handling, re-exports) with good test coverage for the pure-function helpers. The blocker is the one test that presupposes function-signature changes shipping in the follow-up PR — that needs to be resolved before merge. The mixin-composition and override-None semantics deserve a second look but aren't blockers.

codecov · 2026-05-12T11:31:56Z

Codecov Report

❌ Patch coverage is 69.72477% with 66 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.85%. Comparing base (62401e1) to head (bb4217c).

Files with missing lines	Patch %	Lines
modelopt/torch/puzzletron/pruning/pruning_utils.py	41.37%	17 Missing ⚠️
modelopt/torch/puzzletron/utils/parsing.py	64.10%	14 Missing ⚠️
...h/puzzletron/tools/bypassed_training/child_init.py	77.55%	11 Missing ⚠️
...odelopt/torch/puzzletron/utils/data/dataloaders.py	23.07%	10 Missing ⚠️
modelopt/torch/puzzletron/tools/hydra_utils.py	76.19%	5 Missing ⚠️
modelopt/torch/puzzletron/utils/data/dataset.py	70.00%	3 Missing ⚠️
...nymodel/models/gpt_oss/gpt_oss_model_descriptor.py	77.77%	2 Missing ⚠️
...torch/puzzletron/anymodel/model_descriptor/base.py	66.66%	1 Missing ⚠️
...model/models/qwen3_vl/qwen3_vl_model_descriptor.py	83.33%	1 Missing ⚠️
...torch/puzzletron/pruning/kv_heads_pruning_mixin.py	50.00%	1 Missing ⚠️
... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1468      +/-   ##
==========================================
+ Coverage   76.78%   76.85%   +0.06%     
==========================================
  Files         473      478       +5     
  Lines       51413    51906     +493     
==========================================
+ Hits        39476    39890     +414     
- Misses      11937    12016      +79

Flag	Coverage Δ
unit	`53.38% <69.72%> (+0.84%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Separius · 2026-05-12T11:34:36Z

@AAnoosheh and @kevalmorabia97 ready for review (split the bypass MR into 3, this is the first one, nothing too important, just some preparations and tiny fixes)

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

tests/unit/torch/puzzletron/test_bypass_dataloaders.py (1)

206-219: ⚡ Quick win

Add a direct test for ConstantLengthDataset chat-template fallback

This fixture replaces ConstantLengthDataset, so the new no-chat_template preprocessing path in ConstantLengthDataset.__iter__ is not exercised. A small targeted iterator test would close that regression gap.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/puzzletron/test_bypass_dataloaders.py` around lines 206 -
219, The fixture patches out ConstantLengthDataset so
ConstantLengthDataset.__iter__'s new no-chat_template fallback isn't tested; add
a small unit test that imports the real ConstantLengthDataset (not
_FakeConstantLengthDataset), constructs it with a tiny dataset whose items lack
"chat_template", iterates it (e.g., list(ConstantLengthDataset(...)) or calling
its __iter__), and asserts the output matches the expected realized items (e.g.,
tensors like {"input_ids": torch.tensor([0])}); ensure this test does not apply
the patched_dataloader monkeypatch and references ConstantLengthDataset and
ConstantLengthDataset.__iter__ (and optionally create_validation_dataloader) so
the fallback path is exercised.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 479-495: The function batched_normalized_mse_loss allows silent
broadcasting when input and target shapes differ; add explicit shape validation
at the top of the function: verify input.ndim == target.ndim, confirm batch_dims
are valid indices, and ensure sizes match for every dimension (both batch dims
and non-batch dims computed via norm_dims) so that target and input are exactly
compatible; if any mismatch, raise a ValueError with a clear message that
includes the shapes of input and target and the resolved batch_dims/norm_dims to
aid debugging.

In `@modelopt/torch/puzzletron/tools/bypassed_training/child_init.py`:
- Around line 93-95: The per-layer loop currently does full copies via
current_parent_state_dict = dict(parent_state_dict), current_new_state_dict =
dict(new_state_dict), current_keys = dict(keys) which is expensive; instead,
stop cloning entire mappings inside the loop and operate on the original dicts
(parent_state_dict, new_state_dict, keys) by reading values directly and only
materialize copies for individual tensors/entries that are actually modified
(e.g., when applying a mixin to a specific key). Locate the per-layer mixin loop
and replace the dict() copies with references to the originals, and when you
need to mutate a specific parameter, copy only that parameter (or its key->value
pair) and write back to new_state_dict; ensure any iteration over keys uses an
iterator or list(keys) outside the hot loop if necessary to avoid mutation
races.

In `@modelopt/torch/puzzletron/tools/hydra_utils.py`:
- Around line 35-50: The warmup_steps function must validate and normalize
inputs before doing integer divisions: ensure tokens, block, mbs and grad_accum
are ints (or cast) and that block>0, mbs>0, grad_accum>=1, and that pct is a
float within [0.0,1.0] (or at least >=0); raise ValueError with clear messages
for invalid values. In function warmup_steps, coerce tokens, block, mbs,
grad_accum and pct to the expected types up front, check block and mbs are >0 to
avoid ZeroDivisionError, check grad_accum>=1 (existing check can be reused), and
validate pct (and tokens>=0) before computing iters/steps and returning the
rounded warmup steps.

In `@modelopt/torch/puzzletron/utils/data/dataset.py`:
- Around line 131-138: The fallback that concatenates messages when
getattr(self.tokenizer, "chat_template", None) is None assumes every
m["content"] is a str and can raise TypeError for structured payloads; update
the else branch in dataset.py where sample is built to normalize each
m["content"] to a string before joining (e.g., if m["content"] is a dict or
other structured object, extract a text field if present like
m["content"].get("text") or otherwise call str(m["content"])), so the
concatenation in the no-template path (the code around tokenizer.chat_template
and tokenizer.apply_chat_template) always receives plain text.

In `@tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py`:
- Around line 137-139: The test currently checks values in received["kwargs"]
but doesn't ensure no extra kwargs are present; update the second-order test in
test_sewing_kit_function_target_kwargs (use the local variables received,
student_value, teacher_value) to assert that received["kwargs"] contains exactly
the keys "input" and "target" (e.g., compare set(received["kwargs"].keys()) to
{"input","target"}) before the existing torch.equal assertions, then keep the
existing checks for received["args"] and the tensor equality against
student_value and teacher_value.

---

Nitpick comments:
In `@tests/unit/torch/puzzletron/test_bypass_dataloaders.py`:
- Around line 206-219: The fixture patches out ConstantLengthDataset so
ConstantLengthDataset.__iter__'s new no-chat_template fallback isn't tested; add
a small unit test that imports the real ConstantLengthDataset (not
_FakeConstantLengthDataset), constructs it with a tiny dataset whose items lack
"chat_template", iterates it (e.g., list(ConstantLengthDataset(...)) or calling
its __iter__), and asserts the output matches the expected realized items (e.g.,
tensors like {"input_ids": torch.tensor([0])}); ensure this test does not apply
the patched_dataloader monkeypatch and references ConstantLengthDataset and
ConstantLengthDataset.__iter__ (and optionally create_validation_dataloader) so
the fallback path is exercised.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ddff5f0a-3633-4520-914f-dad472197cf8

📥 Commits

Reviewing files that changed from the base of the PR and between 7a11fb2 and a79fbae.

📒 Files selected for processing (22)

modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
modelopt/torch/puzzletron/pruning/pruning_utils.py
modelopt/torch/puzzletron/sewing_kit/passage.py
modelopt/torch/puzzletron/sewing_kit/utils.py
modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
modelopt/torch/puzzletron/tools/hydra_utils.py
modelopt/torch/puzzletron/utils/data/dataloaders.py
modelopt/torch/puzzletron/utils/data/dataset.py
modelopt/torch/puzzletron/utils/parsing.py
tests/unit/torch/puzzletron/test_bypass_dataloaders.py
tests/unit/torch/puzzletron/test_bypass_losses.py
tests/unit/torch/puzzletron/test_child_init_mixins.py
tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
tests/unit/torch/puzzletron/test_sewing_kit_needle.py

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Around line 540-542: Validate that epsilon is strictly positive before
computing den; in the function that computes num = ((input - target) **
2).sum(dim=norm_dims) and den = (target**2).sum(dim=norm_dims) + epsilon, add a
guard at the start (before the denominator math) that either raises a ValueError
with a clear message if epsilon <= 0, or clamps epsilon to a small positive
floor (e.g., max(epsilon, 1e-12)); ensure the check references the epsilon
variable and occurs before computing den to prevent any inf/nan from division.

In `@modelopt/torch/puzzletron/utils/data/dataloaders.py`:
- Around line 113-121: The shuffle call for map-style datasets currently
hardcodes keep_in_memory=True and ignores the function argument; update the
branch that handles non-IterableDataset so that it passes the caller's
keep_in_memory parameter (the function arg named keep_in_memory) into
train_data.shuffle(seed=shuffle_seed, keep_in_memory=keep_in_memory) while
leaving IterableDataset.shuffle(seed=shuffle_seed) unchanged; reference the
symbols train_data, datasets.IterableDataset, shuffle_seed, and keep_in_memory
to locate and modify the code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e8b4997-90ef-408e-b03d-7bb26b85189d

📥 Commits

Reviewing files that changed from the base of the PR and between a79fbae and 12086fb.

📒 Files selected for processing (23)

modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
modelopt/torch/puzzletron/pruning/pruning_utils.py
modelopt/torch/puzzletron/sewing_kit/passage.py
modelopt/torch/puzzletron/sewing_kit/utils.py
modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
modelopt/torch/puzzletron/tools/hydra_utils.py
modelopt/torch/puzzletron/utils/data/dataloaders.py
modelopt/torch/puzzletron/utils/data/dataset.py
modelopt/torch/puzzletron/utils/parsing.py
tests/unit/torch/puzzletron/test_bypass_dataloaders.py
tests/unit/torch/puzzletron/test_bypass_losses.py
tests/unit/torch/puzzletron/test_child_init_mixins.py
tests/unit/torch/puzzletron/test_hydra_utils.py
tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
tests/unit/torch/puzzletron/test_sewing_kit_input_args.py
tests/unit/torch/puzzletron/test_sewing_kit_needle.py

✅ Files skipped from review due to trivial changes (3)

modelopt/torch/puzzletron/sewing_kit/passage.py
modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
tests/unit/torch/puzzletron/test_sewing_kit_input_args.py

🚧 Files skipped from review as they are similar to previous changes (16)

modelopt/torch/puzzletron/utils/data/dataset.py
tests/unit/torch/puzzletron/test_sewing_kit_function_target_kwargs.py
modelopt/torch/puzzletron/anymodel/model_descriptor/base.py
modelopt/torch/puzzletron/tools/hydra_utils.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
tests/unit/torch/puzzletron/test_child_init_mixins.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
modelopt/torch/puzzletron/pruning/pruning_utils.py
tests/unit/torch/puzzletron/test_kv_heads_pruning_utils.py
tests/unit/torch/puzzletron/test_sewing_kit_activity_context.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
tests/unit/torch/puzzletron/test_bypass_losses.py
modelopt/torch/puzzletron/utils/parsing.py
tests/unit/torch/puzzletron/test_bypass_dataloaders.py
tests/unit/torch/puzzletron/test_sewing_kit_needle.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>

Separius · 2026-05-13T07:04:27Z

/claude review

Separius mentioned this pull request May 12, 2026

Add bypass distillation (blockwise local KD) to puzzletron pipeline #1111

Closed

Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from 566cb1d to 0639883 Compare May 12, 2026 10:51

claude Bot reviewed May 12, 2026

View reviewed changes

Comment thread modelopt/torch/puzzletron/tools/bypassed_training/child_init.py Outdated

claude Bot reviewed May 12, 2026

View reviewed changes

Comment thread modelopt/torch/puzzletron/tools/bypassed_training/child_init.py Outdated

claude Bot reviewed May 12, 2026

View reviewed changes

Comment thread modelopt/torch/puzzletron/tools/hydra_utils.py Outdated

claude Bot reviewed May 12, 2026

View reviewed changes

Comment thread modelopt/torch/puzzletron/pruning/pruning_utils.py

Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from 0639883 to a79fbae Compare May 12, 2026 11:19

Separius marked this pull request as ready for review May 12, 2026 11:32

Separius requested a review from a team as a code owner May 12, 2026 11:32

Separius requested review from AAnoosheh and kevalmorabia97 May 12, 2026 11:32

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Separius added 2 commits May 12, 2026 16:09

Split bypass prerequisites

986c5fb

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>

Address CodeRabbit feedback for bypass integration

12086fb

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>

Separius force-pushed the ssameni/puzzletron-bypass-1-prereqs branch from a79fbae to 12086fb Compare May 12, 2026 14:11

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread modelopt/torch/puzzletron/sewing_kit/utils.py

Comment thread modelopt/torch/puzzletron/utils/data/dataloaders.py Outdated

Address additional MR1 review feedback

b9c00ba

Signed-off-by: Sepehr Sameni <ssameni@nvidia.com>

Merge branch 'main' into ssameni/puzzletron-bypass-1-prereqs

bb4217c

Conversation

Separius commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What Changed

Why

Tests

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented May 12, 2026

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-05-13 08:26 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

Separius commented May 12, 2026

Uh oh!

claude Bot commented May 12, 2026

Claude review — summary

Most impactful

Risk level

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Separius commented May 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Separius commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Separius commented May 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading

github-actions Bot commented May 12, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-05-13 08:26 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented May 12, 2026 •

edited

Loading