Sync with internal with new mapping design #29

Draft
yy-code-nv wants to merge 7 commits into NVIDIA:main from yy-code-nv:yangyangt/try_sync_with_internal

@yy-code-nv

Collaborator
  • regression passed
  • inference correct

yy-code-nv and others added 7 commits June 9, 2026 02:27
…stale excluded files

- All absolute web URLs in safe (non-docstring) positions replaced with
  https://invalid_url via # COSMOS-RELEASE-REPLACE-NEXT directives injected
  into i4 source (83 files, 176 directives)
- Stale files removed that were excluded from mapping but survived previous
  releases: multiview_dataloader.py, vlm/defaults/dataloader.py,
  nvlm_data_unify.py, nvlm_sample_loaders_and_part_filters.py
- New files shipped: nemotron3densevl/nemotronvl processors, tokenizer
  evaluation metric/reconstruction_metrics, vfm video_preprocess,
  vlm/video_decoder_qwen, webdataset image augmentors
- Zero dangling cosmos_framework module imports
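The "zero dangling imports" check can be approximated with the standard `ast` module. The following is only a minimal sketch, not the pipeline's actual checker: the function name `dangling_imports` and the `shipped` set of module names are hypothetical, and only absolute imports rooted at `cosmos_framework` are inspected.

```python
import ast


def dangling_imports(source: str, shipped: set[str],
                     root: str = "cosmos_framework") -> list[str]:
    """Return imported `root` modules that are not in the shipped set.

    Assumption: `shipped` holds dotted module names present in the release.
    Names imported *from* a module are not resolved, only the module itself.
    """
    missing = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            in_root = name == root or name.startswith(root + ".")
            if in_root and name not in shipped:
                missing.add(name)
    return sorted(missing)
```

An empty result for every released file would correspond to the "zero dangling" claim under these assumptions.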

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous pass injected REPLACE-NEXT directives for f-string URLs like
f"s3://{bucket_name}/...", incorrectly rewriting runtime-constructed paths
to https://invalid_url. The injector now skips any URL match containing '{'.

- f-string template URLs (s3://{bucket}/..., s3://rundir/{self.name}) are
  now preserved verbatim
- 119 real literal URLs remain scrubbed to https://invalid_url
- action_processing.py added to action subdir (newly imported by transforms.py
  and omni_mot_model.py in updated i4 source)
- Zero dangling cosmos_framework module imports
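The template-skipping rule described in this commit can be sketched as follows. The regular expression and the helper name `literal_urls` are illustrative assumptions, not the injector's real code; only the "skip any match containing '{'" rule comes from the commit message.

```python
import re

# Hypothetical URL pattern: a scheme, "://", then everything up to
# whitespace or a quote character.
URL_RE = re.compile(r"""[a-z][a-z0-9+.-]*://[^\s"']+""", re.IGNORECASE)


def literal_urls(line: str) -> list[str]:
    """Return URL matches in a source line, skipping f-string templates.

    A match containing '{' is assumed to be built at runtime and is
    therefore left alone.
    """
    return [m.group(0) for m in URL_RE.finditer(line)
            if "{" not in m.group(0)]
```

With this sketch, a line such as `f"s3://{bucket_name}/data"` yields no candidates, while a plain string literal holding a full URL is still reported.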

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous release re-ran rewrite_imports.py without --license-internal/
--license-apache, so the license-swap and stamp-if-missing steps were no-ops
and every released file lost its header. Re-ran the pipeline with both license
files, restoring the canonical OpenMDW-1.1 header (and swapping the Apache-2.0
SPDX identifier on third-party-derived files).

Verified with pre-commit: addlicense and spdx-openmdw both pass.
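A stamp-if-missing step of the kind mentioned above might look like the sketch below. It is an assumption-laden illustration: `stamp_if_missing` is a hypothetical name, the header text is whatever the caller passes in, and the real `rewrite_imports.py` logic is not shown in this PR. It does show why omitting the license file makes the step a no-op.

```python
from typing import Optional


def stamp_if_missing(source: str, header: Optional[str]) -> str:
    """Prepend a license header unless an SPDX line is already present.

    Assumptions: `header` is the comment block to stamp; when it is None
    (no license file supplied) the step does nothing.
    """
    if not header:
        return source  # no license file given: the step is a no-op
    head = "\n".join(source.splitlines()[:10])
    if "SPDX-License-Identifier" in head:
        return source  # already stamped
    lines = source.splitlines(keepends=True)
    # Keep a shebang, if any, as the very first line.
    shebang = lines[:1] if lines and lines[0].startswith("#!") else []
    body = lines[len(shebang):]
    return "".join(shebang) + header.rstrip("\n") + "\n\n" + "".join(body)
```

Running the function twice on the same text leaves it unchanged after the first pass, which is the property a re-run of the pipeline relies on.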

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Picks up upstream change: build_processor now short-circuits to
Qwen3VLProcessor when tokenizer_type is a local directory (e.g. a bundled
Cosmos3-Nano/Super artifact), avoiding the redundant upstream Qwen3-VL fetch.
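The short-circuit can be pictured with a small sketch. The real `build_processor` signature is not shown in this PR, so the function below is hypothetical: it takes the two loading strategies as callables rather than calling `Qwen3VLProcessor` directly, and only the "local directory wins" rule is taken from the commit message.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def choose_processor(tokenizer_type: str,
                     load_local: Callable[[str], T],
                     load_upstream: Callable[[str], T]) -> T:
    """Pick a loader based on whether tokenizer_type is a local directory.

    Assumption: `load_local` stands in for building the processor from a
    bundled artifact, `load_upstream` for the remote fetch path.
    """
    if os.path.isdir(tokenizer_type):
        # Bundled artifact on disk: skip the upstream fetch entirely.
        return load_local(tokenizer_type)
    return load_upstream(tokenizer_type)
```

Passing the loaders in keeps the sketch testable without network access or any model files.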

Pipeline re-run clean: no dangling imports; addlicense and spdx-openmdw pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
INTERNAL gates nvidia-only resources (S3, etc.) and was inheriting TRAINING
(True) as its default. Added a # COSMOS-RELEASE-REPLACE-NEXT directive in
i4 source so the public release defaults COSMOS_INTERNAL to False.
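The change in default can be illustrated with a small sketch. Only `COSMOS_INTERNAL`, `INTERNAL`, and `TRAINING` appear in the commit message; the environment variable name `COSMOS_TRAINING`, the helper names, and the `public_release` switch are assumptions made for illustration (the actual change is a source-level replacement, not a runtime switch).

```python
from typing import Mapping


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag from an environment-like mapping."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_flags(env: Mapping[str, str], public_release: bool) -> dict[str, bool]:
    """Contrast the internal default (inherit TRAINING) with the public one."""
    training = env_flag(env, "COSMOS_TRAINING", True)  # hypothetical variable name
    # Internal source: INTERNAL inherits TRAINING. Public release: False.
    internal_default = False if public_release else training
    internal = env_flag(env, "COSMOS_INTERNAL", internal_default)
    return {"TRAINING": training, "INTERNAL": internal}
```

In this sketch an explicit `COSMOS_INTERNAL=1` still turns the flag on in a public build; only the value used when the variable is unset changes.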

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Refine the REPLACE-NEXT URL injector:
- Keep all storage URIs verbatim (s3://, gs://, az://, hf://); only http(s)
  web URLs are candidates for scrubbing.
- Whitelist public reference domains so their links survive: github.com,
  github.io, pytorch.org, docs.nvidia.com, docs.python.org, arxiv.org,
  huggingface.co, apache.org, reddit.com.
- Internal hosts (gitlab-master/confluence/urm.nvidia.com) and non-public
  endpoints still scrub to https://invalid_url.

Restores previously over-scrubbed pytorch/github doc links and s3:// paths.
addlicense and spdx-openmdw pass; no dangling imports.
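The three rules above (keep storage URIs, keep whitelisted public domains, scrub the rest) can be sketched as a single classifier. This is a hedged illustration, not the injector itself: `scrub_url` is a hypothetical name, and treating subdomains of a whitelisted domain as whitelisted is an assumption not stated in the commit.

```python
from urllib.parse import urlparse

STORAGE_SCHEMES = {"s3", "gs", "az", "hf"}
PUBLIC_DOMAINS = {
    "github.com", "github.io", "pytorch.org", "docs.nvidia.com",
    "docs.python.org", "arxiv.org", "huggingface.co", "apache.org",
    "reddit.com",
}
SCRUBBED = "https://invalid_url"


def scrub_url(url: str) -> str:
    """Return the URL unchanged, or the scrubbed placeholder."""
    parsed = urlparse(url)
    if parsed.scheme in STORAGE_SCHEMES:
        return url  # storage URIs are kept verbatim
    if parsed.scheme not in {"http", "https"}:
        return url  # assumption: other schemes are out of scope
    host = (parsed.hostname or "").lower()
    # Assumption: subdomains of a whitelisted domain also count as public.
    if any(host == d or host.endswith("." + d) for d in PUBLIC_DOMAINS):
        return url
    return SCRUBBED
```

Under these assumptions an internal host such as `gitlab-master.nvidia.com` is scrubbed, because only `docs.nvidia.com` is on the whitelist.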

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>