Sync with internal with new mapping design #29
Draft
yy-code-nv wants to merge 7 commits into
yy-code-nv (Collaborator) commented on Jun 9, 2026
- regression passed
- inference correct
…stale excluded files
- All absolute web URLs in safe (non-docstring) positions replaced with https://invalid_url via # COSMOS-RELEASE-REPLACE-NEXT directives injected into i4 source (83 files, 176 directives)
- Stale files removed that were excluded from mapping but survived previous releases: multiview_dataloader.py, vlm/defaults/dataloader.py, nvlm_data_unify.py, nvlm_sample_loaders_and_part_filters.py
- New files shipped: nemotron3densevl/nemotronvl processors, tokenizer evaluation metric/reconstruction_metrics, vfm video_preprocess, vlm/video_decoder_qwen, webdataset image augmentors
- Zero dangling cosmos_framework module imports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous pass injected REPLACE-NEXT directives for f-string URLs like
f"s3://{bucket_name}/...", incorrectly rewriting runtime-constructed paths
to https://invalid_url. The injector now skips any URL match containing '{'.
- f-string template URLs (s3://{bucket}/..., s3://rundir/{self.name}) are
now preserved verbatim
- 119 real literal URLs remain scrubbed to https://invalid_url
- action_processing.py added to action subdir (newly imported by transforms.py
and omni_mot_model.py in updated i4 source)
- Zero dangling cosmos_framework module imports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
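The injector itself is not shown in this PR, so the following is only a minimal sketch of the "skip any URL match containing '{'" rule described above. The regex `URL_RE` and the function name `literal_urls` are assumptions made for illustration, not names from the i4 source.

```python
import re

# Hypothetical URL pattern: a scheme, "://", then everything up to
# whitespace, a quote, or a closing parenthesis.
URL_RE = re.compile(r"[a-z][a-z0-9+.-]*://[^\s\"')]+", re.IGNORECASE)


def literal_urls(line: str) -> list[str]:
    """Return URL matches that are plain literals.

    Matches containing '{' are treated as f-string templates that are
    filled in at runtime, so they are skipped and left verbatim.
    """
    return [match for match in URL_RE.findall(line) if "{" not in match]
```

Under these assumptions, a line such as `path = f"s3://{bucket_name}/data"` yields no candidates, while a line holding a literal web URL still yields that URL for scrubbing.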
The previous release re-ran rewrite_imports.py without --license-internal/--license-apache, so the license-swap and stamp-if-missing steps were no-ops and every released file lost its header. Re-ran the pipeline with both license files, restoring the canonical OpenMDW-1.1 header (and swapping the Apache-2.0 SPDX identifier on third-party-derived files). Verified with pre-commit: addlicense and spdx-openmdw both pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
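For readers unfamiliar with a stamp-if-missing step, here is a rough sketch of what such a step could look like. It is not the logic in rewrite_imports.py, which this PR does not show: the function name, the 20-line scan window, and the use of an `SPDX-License-Identifier:` marker are all assumptions, and the header text is passed in rather than guessed.

```python
def stamp_if_missing(
    source: str,
    header: str,
    marker: str = "SPDX-License-Identifier:",
) -> str:
    """Prepend `header` unless the top of the file already carries `marker`."""
    # Only inspect the first 20 lines (an arbitrary window for this sketch).
    head = "\n".join(source.splitlines()[:20])
    if marker in head:
        return source  # already stamped, leave untouched
    if source.startswith("#!"):
        # Keep a shebang on the first line and stamp right after it.
        shebang, _, rest = source.partition("\n")
        return f"{shebang}\n{header}\n{rest}"
    return f"{header}\n{source}"
```

Written this way the step is idempotent, so re-running the pipeline does not stack headers. It also shows why the step degrades to a no-op when no header is supplied.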
Picks up upstream change: build_processor now short-circuits to Qwen3VLProcessor when tokenizer_type is a local directory (e.g. a bundled Cosmos3-Nano/Super artifact), avoiding the redundant upstream Qwen3-VL fetch. Pipeline re-run clean: no dangling imports; addlicense and spdx-openmdw pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
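The branch described above can be illustrated with a small stand-alone sketch. It deliberately does not call build_processor or Qwen3VLProcessor, whose signatures are not shown in this PR. The helper `choose_processor_source` is hypothetical and only models the decision of whether `tokenizer_type` points at a local directory.

```python
import os


def choose_processor_source(tokenizer_type: str) -> str:
    """Decide where the processor would be loaded from.

    Returns "local" when `tokenizer_type` is an existing directory (for
    example a bundled artifact), in which case no upstream fetch is needed.
    Otherwise returns "remote".
    """
    return "local" if os.path.isdir(tokenizer_type) else "remote"
```

In the real code the "local" branch would presumably construct the processor directly from that directory, but that part is an assumption here.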
INTERNAL gates nvidia-only resources (S3, etc.) and was inheriting TRAINING (True) as its default. Added a # COSMOS-RELEASE-REPLACE-NEXT directive in i4 source so the public release defaults COSMOS_INTERNAL to False.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
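To make the default change concrete, here is a hedged sketch of how the two flags could be resolved. Only the name COSMOS_INTERNAL comes from this PR. `COSMOS_TRAINING`, `env_flag`, `resolve_flags`, and the `public_release` switch are assumptions, and the switch merely stands in for the source-level replacement done by the directive.

```python
import os
from collections.abc import Mapping
from typing import Optional


def env_flag(name: str, default: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from `env` (os.environ when omitted)."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_flags(env: Mapping[str, str], public_release: bool) -> dict[str, bool]:
    """Resolve TRAINING and INTERNAL with the defaults described above."""
    training = env_flag("COSMOS_TRAINING", True, env)  # hypothetical variable name
    # Before the fix INTERNAL inherited TRAINING; the public release pins False.
    internal_default = False if public_release else training
    internal = env_flag("COSMOS_INTERNAL", internal_default, env)
    return {"TRAINING": training, "INTERNAL": internal}
```

With an empty environment this resolves INTERNAL to True in the internal tree and to False in the public release. Setting COSMOS_INTERNAL explicitly still overrides either default.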
Refine the REPLACE-NEXT URL injector:
- Keep all storage URIs verbatim (s3://, gs://, az://, hf://); only http(s) web URLs are candidates for scrubbing.
- Whitelist public reference domains so their links survive: github.com, github.io, pytorch.org, docs.nvidia.com, docs.python.org, arxiv.org, huggingface.co, apache.org, reddit.com.
- Internal hosts (gitlab-master/confluence/urm.nvidia.com) and non-public endpoints still scrub to https://invalid_url.
Restores previously over-scrubbed pytorch/github doc links and s3:// paths. addlicense and spdx-openmdw pass; no dangling imports.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
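The refined rules can be sketched as a single decision function. This is an illustration under stated assumptions, not the injector's code: `scrub_url` and the constant names are hypothetical, and matching subdomains of a whitelisted domain (for example `docs.github.com`) is a guess about the intended behavior.

```python
from urllib.parse import urlsplit

# Public reference domains listed in the commit message.
PUBLIC_DOMAINS = {
    "github.com", "github.io", "pytorch.org", "docs.nvidia.com",
    "docs.python.org", "arxiv.org", "huggingface.co", "apache.org",
    "reddit.com",
}
PLACEHOLDER = "https://invalid_url"


def scrub_url(url: str) -> str:
    """Return `url` unchanged if it should survive, else the placeholder."""
    parts = urlsplit(url)
    # Storage URIs (s3://, gs://, az://, hf://) and any other non-web
    # scheme are kept verbatim; only http(s) URLs are candidates.
    if parts.scheme.lower() not in {"http", "https"}:
        return url
    host = (parts.hostname or "").lower()
    # Assumption: a whitelisted domain also covers its subdomains.
    if any(host == d or host.endswith("." + d) for d in PUBLIC_DOMAINS):
        return url
    return PLACEHOLDER
```

A whitelist keeps the default conservative: any host not explicitly listed is scrubbed, so a newly introduced internal endpoint cannot leak by omission.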