
Add green context support #1976

Open
leofang wants to merge 11 commits into NVIDIA:main from leofang:leof/green-ctx-v1


@leofang
Member

@leofang leofang commented Apr 25, 2026

Close #1563. Close #112.

Summary

Add green context support to cuda.core — the explicit-model API for querying device resources, splitting SMs, creating green contexts, and using them without touching the thread-local context stack.

Design

See the companion design doc for full rationale. Key decisions:

  • Unified Context type — no user-visible GreenContext subclass. A single Context wraps either a primary CUcontext or a CUgreenCtx + derived CUcontext. ctx.is_green distinguishes them. Inspired by the CUDA runtime's execution-context (EC) abstraction.
  • dev.resources namespace: DeviceResources groups hardware resource queries (dev.resources.sm, dev.resources.workqueue). Follows the existing "plural = namespace" pattern (dev.properties, kernel.attributes).
  • ctx.resources / stream.resources — same DeviceResources type, but queries the context's provisioned resources (cuCtxGetDevResource / cuGreenCtxGetDevResource) instead of the full device.
  • SMResourceOptions with SoA broadcasting — single dataclass for SMResource.split(). Scalar fields broadcast; count drives the group count. count=None means discovery mode (translated to smCount=0 internally).
  • Merged workqueue types: WorkqueueResource merges CU_DEV_RESOURCE_TYPE_WORKQUEUE_CONFIG and CU_DEV_RESOURCE_TYPE_WORKQUEUE under one user-facing class. Strings for option values (e.g. sharing_scope="green_ctx_balanced").
  • ContextOptions(resources=[...]) → dev.create_context(): resource descriptor generation and cuGreenCtxCreate are internal. The user passes pre-split resource objects.
  • Explicit model: ctx.create_stream() creates streams bound to a green context without calling dev.set_current(). The C++ handle layer auto-dispatches between cuGreenCtxStreamCreate and cuStreamCreateWithPriority based on the context type. Green context streams must be non-blocking.
  • ctx.close() does not manage the context stack — closing a current context raises RuntimeError. dev.set_current(green_ctx) still works for backward compatibility but is not the recommended path.
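
The SoA broadcasting rule for split options can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the cuda.core implementation: SplitOptionsSketch and expand_groups are hypothetical names, only two fields are modeled, and treating a scalar count as a single group is an assumption.

```python
from dataclasses import dataclass
from collections.abc import Sequence
from typing import Optional, Union

IntOrSeq = Union[int, Sequence[int]]


@dataclass
class SplitOptionsSketch:
    """Hypothetical stand-in for SMResourceOptions; not the cuda.core class."""
    count: Union[None, int, Sequence[Optional[int]]] = None
    coscheduled_sm_count: Optional[IntOrSeq] = None


def expand_groups(opts: SplitOptionsSketch) -> list:
    """Expand struct-of-arrays options into one dict per group."""
    # `count` drives the number of groups (assumption: a scalar or None is one group).
    if opts.count is None or isinstance(opts.count, int):
        counts = [opts.count]
    else:
        counts = list(opts.count)
    n_groups = len(counts)

    def broadcast(value, name):
        # Scalars (and None) are repeated for every group.
        if value is None or isinstance(value, int):
            return [value] * n_groups
        value = list(value)
        if len(value) != n_groups:
            raise ValueError(f"{name} has {len(value)} entries, expected {n_groups}")
        return value

    cosched = broadcast(opts.coscheduled_sm_count, "coscheduled_sm_count")
    groups = []
    for c, cs in zip(counts, cosched):
        if c is not None and c < 0:
            raise ValueError("count must be non-negative")
        # count=None is discovery mode, modeled here as smCount=0.
        groups.append({"smCount": 0 if c is None else c, "coscheduled": cs})
    return groups
```

For example, expand_groups(SplitOptionsSketch(count=[8, 16], coscheduled_sm_count=2)) yields two groups that share the same coscheduled value, while a sequence of the wrong length is rejected with ValueError.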

New public API

  • Device.resources → DeviceResources (namespace: .sm, .workqueue)
  • Context.resources → DeviceResources (context-level query of provisioned resources)
  • Stream.resources → DeviceResources (delegates to the stream's parent context)
  • Context.create_stream(options) → Stream (green contexts only; raises on primary)
  • Context.is_green → bool
  • SMResource — properties: sm_count, min_partition_size, coscheduled_alignment, flags, handle; method: split(options, *, dry_run=False)
  • SMResourceOptions: count, coscheduled_sm_count, preferred_coscheduled_sm_count
  • WorkqueueResource — method: configure(options)
  • WorkqueueResourceOptions: sharing_scope
  • ContextOptions.resources — accepts Sequence[SMResource | WorkqueueResource]

Implementation details

C++ handle layer (resource_handles.hpp/cpp):

  • GreenCtxHandle (shared_ptr<const CUgreenCtx>) — owning handle; destructor calls cuGreenCtxDestroy.
  • ContextBox gains a GreenCtxHandle field so the derived CUcontext keeps the green ctx alive. get_context_green_ctx() provides reverse lookup.
  • create_green_ctx_handle() combines cuDevResourceGenerateDesc + cuGreenCtxCreate in one call — the descriptor is transient (no DevResourceDescHandle needed since CUDA has no explicit destroy for it).
  • create_stream_handle() auto-dispatches: checks get_context_green_ctx() on the provided ContextHandle and calls cuGreenCtxStreamCreate for green contexts, cuStreamCreateWithPriority for primary. Returns CUDA_ERROR_NOT_SUPPORTED if the context is green but cuGreenCtxStreamCreate is unavailable (CUDA < 12.5).
  • context_registry / stream_registry (HandleRegistry) deduplicate handles by raw CUDA pointer, enabling identity-preserving set_current swaps.
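
The auto-dispatch rule in create_stream_handle() can be modeled in a few lines of Python. This is only an illustration of the decision logic described above; the real code is C++, and ContextBoxSketch, create_stream_handle_sketch, and the string status codes are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional

CUDA_SUCCESS = "CUDA_SUCCESS"
CUDA_ERROR_NOT_SUPPORTED = "CUDA_ERROR_NOT_SUPPORTED"


@dataclass
class ContextBoxSketch:
    """Hypothetical model of a context handle; green_ctx is None for a primary context."""
    raw_ctx: int
    green_ctx: Optional[int] = None


def create_stream_handle_sketch(ctx, green_stream_create=None):
    """Return (status, name of the driver entry point that would be called).

    `green_stream_create` stands in for the optional cuGreenCtxStreamCreate
    function pointer, which may be missing on older toolkits.
    """
    if ctx.green_ctx is not None:
        if green_stream_create is None:
            # Report the gap instead of falling through to the primary path.
            return CUDA_ERROR_NOT_SUPPORTED, None
        return CUDA_SUCCESS, "cuGreenCtxStreamCreate"
    return CUDA_SUCCESS, "cuStreamCreateWithPriority"
```

Keeping the branch in one function means callers never need to know which kind of context they hold, which is the point of the unified Context type.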

Bug fix — stream context tracking:

  • StreamBox now carries a ContextHandle dependency, populated at creation time.
  • get_stream_context() returns it without a driver call.
  • Stream._from_handle and Stream_ensure_ctx prefer the registry-backed handle before falling back to cuStreamGetCtx. This fixes a latent issue where streams created in a green context would lose their context association after a set_current swap.
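
A rough Python model of the tracking fix: handles are deduplicated by raw pointer, each stream records its context at creation, and the driver is queried only when no context was recorded. Handle, HandleRegistrySketch, and stream_context are hypothetical names, and the use of weak references is an assumption rather than something this PR states.

```python
import weakref


class Handle:
    """Hypothetical owning wrapper around a raw driver pointer (an int here)."""

    def __init__(self, raw, context=None):
        self.raw = raw
        self.context = context  # strong reference keeps the parent context alive


class HandleRegistrySketch:
    """Deduplicate wrappers by raw pointer value."""

    def __init__(self):
        # Weak values: an entry disappears once nothing else holds the handle.
        self.live = weakref.WeakValueDictionary()

    def get_or_create(self, raw, context=None):
        handle = self.live.get(raw)
        if handle is None:
            handle = Handle(raw, context)
            self.live[raw] = handle
        return handle


def stream_context(stream, query_driver):
    # Prefer the context recorded at creation; ask the driver only as a fallback.
    if stream.context is not None:
        return stream.context
    return query_driver(stream.raw)
```

Because the registry returns the same wrapper for the same raw pointer, a context handed back after a set_current swap compares identical to the one the stream already tracks.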

Error handling:

  • dev.create_context() without resources raises ValueError with a clear message.
  • Green context stream creation with nonblocking=False is caught by the driver (CUDA_ERROR_INVALID_VALUE) and re-raised as ValueError with a helpful message.
  • cuCtxGetStreamPriorityRange failure (CUDA_ERROR_INVALID_CONTEXT) raises "Call dev.set_current() before creating streams."
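
The "let the driver validate, then translate" pattern for blocking streams might look like the sketch below. DriverErrorSketch, create_stream_sketch, and the wording of the message are hypothetical; only the condition (green context, nonblocking=False, CUDA_ERROR_INVALID_VALUE) comes from the description above.

```python
class DriverErrorSketch(Exception):
    """Hypothetical stand-in for a driver error carrying a CUresult name."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def create_stream_sketch(driver_create, *, is_green, nonblocking):
    """Call the driver first; translate a known failure afterwards."""
    try:
        return driver_create(nonblocking)
    except DriverErrorSketch as exc:
        if exc.code == "CUDA_ERROR_INVALID_VALUE" and is_green and not nonblocking:
            # Re-raise with a message that names the actual cause.
            raise ValueError(
                "green context streams must be non-blocking; pass nonblocking=True"
            ) from exc
        # Anything else is propagated unchanged.
        raise
```

Checking the condition only after a failure keeps the success path free of extra validation and leaves the driver as the single source of truth for what is allowed.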

Version guards:

  • Compile-time: IF CUDA_CORE_BUILD_MAJOR >= 13 gates cuDevSmResourceSplit (the general/structured form).
  • Runtime: tri-state cached checks (supported / unsupported / unchecked). cy_driver_version() >= (12, 4, 0) is required for all green ctx APIs and >= (13, 1, 0) for structured splits. Raises ValueError when unsupported.
  • CUDA 12.x fallback: cuDevSmResourceSplitByCount for basic (homogeneous) splits. Per-group coscheduled_sm_count and heterogeneous counts require 13.1+ and raise NotImplementedError on 12.x.
  • Green ctx function pointers loaded via _get_optional_driver_fn — graceful NULL when bindings lack the symbol.
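
As an illustration of the tri-state cached runtime check, here is a minimal Python sketch. Support and CachedVersionCheck are hypothetical names, and the driver version is injected as a callable so the example runs without CUDA; the actual checks live in Cython and use cy_driver_version().

```python
from enum import Enum


class Support(Enum):
    UNCHECKED = 0
    UNSUPPORTED = 1
    SUPPORTED = 2


class CachedVersionCheck:
    """Tri-state cache around a driver version query (illustrative only)."""

    def __init__(self, feature, minimum, get_driver_version):
        self.feature = feature
        self.minimum = minimum
        self.get_driver_version = get_driver_version
        self.state = Support.UNCHECKED

    def require(self):
        # Query the driver only on the first call; later calls reuse the answer.
        if self.state is Support.UNCHECKED:
            ok = tuple(self.get_driver_version()) >= self.minimum
            self.state = Support.SUPPORTED if ok else Support.UNSUPPORTED
        if self.state is Support.UNSUPPORTED:
            needed = ".".join(str(part) for part in self.minimum)
            raise ValueError(f"{self.feature} requires CUDA driver >= {needed}")
```

A third "unchecked" state is what distinguishes "not yet asked" from "asked and unsupported", so a negative answer is cached just like a positive one.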

Test coverage

33 tests in test_green_context.py, organized with proper pytest fixtures and classes:

  • Fixtures: sm_resource, wq_resource, green_ctx (with CUDAError → skip), fill_kernel
  • _use_green_ctx context manager for safe push/pop in set_current regression tests
  • TestSMResourceQuery — properties, arch constraints (pre-Hopper vs Hopper+)
  • TestWorkqueueResource — query, configure valid/invalid
  • TestSMResourceSplitValidation — scalar/Sequence mismatch, negative count, dry-run blocked
  • TestSMResourceSplit — single/two-group splits with arch-aligned counts, discovery mode, alignment, dry-run parity
  • TestGreenContextLifecycle: is_green, create_stream on primary raises, blocking stream raises, explicit stream creation, stream/event context tracking, close-while-current guard, set_current regression
  • TestContextResources — green ctx SM resources are subset of device, two contexts have disjoint partitions, stream.resources matches ctx.resources (SM + workqueue)
  • TestGreenContextKernelLaunch — compile + launch + host-verify via ctx.create_stream(), two independent green contexts with different fill values, SM + workqueue combined
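
The push/pop helper used by the set_current regression tests follows a common pattern: swap the context in, run the body, and restore in a finally block. The sketch below shows that pattern against a fake device; FakeDevice and use_ctx are hypothetical, and the assumption that set_current returns the previously current context belongs to this fake only.

```python
from contextlib import contextmanager


class FakeDevice:
    """Hypothetical device that only tracks which context is current."""

    def __init__(self, primary):
        self.current = primary

    def set_current(self, ctx):
        # Swap in the new context and hand back the one that was current.
        previous, self.current = self.current, ctx
        return previous


@contextmanager
def use_ctx(dev, ctx):
    previous = dev.set_current(ctx)
    try:
        yield ctx
    finally:
        # Restore even if the test body raises, so the context stack never leaks.
        dev.set_current(previous)
```

A failing assertion inside the with block still leaves the original context current, which keeps one broken test from contaminating the ones that run after it.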

Validation

CUDA_HOME=... pip install -e . --no-build-isolation
python -m pytest tests/test_green_context.py -v                          # 32 passed, 1 skipped (arch)
python -m pytest tests/test_device.py tests/test_stream.py tests/test_event.py tests/test_context.py -v  # no regressions (257 total passed)

-- Leo's bot

@copy-pr-bot
Contributor

copy-pr-bot Bot commented Apr 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.core label (Everything related to the cuda.core module) Apr 25, 2026
@leofang leofang changed the title from "Add cuda.core green context v1 API" to "Add green context support" Apr 25, 2026
@leofang leofang added the P0 (High priority - Must do!) and feature (New feature or request) labels Apr 25, 2026
@leofang leofang self-assigned this Apr 25, 2026
@leofang leofang added this to the cuda.core v1.0.0 milestone Apr 25, 2026
Restructure tests into fixtures + classes with full resource cleanup:
- Fixtures: sm_resource, wq_resource, green_ctx (with CUDAError skip),
  green_ctx_active (with try/finally restore), fill_kernel
- _use_green_ctx context manager for safe push/pop in all tests
- TestSMResourceQuery: properties, arch constraints per CC
- TestSMResourceSplit: single/two-group splits, discovery, alignment,
  dry-run vs real parity
- TestGreenContextKernelLaunch: compile + launch + verify in green ctx,
  two independent green contexts, SM + workqueue combined

All set_current calls are paired with restore in finally blocks to
prevent context stack leaks on test failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang
Member Author

leofang commented Apr 25, 2026

/ok to test ac5c0fc


@leofang leofang force-pushed the leof/green-ctx-v1 branch 4 times, most recently from 17c2be2 to 08d52d1 April 27, 2026 02:56
- Convert ContextOptions and SMResourceOptions/WorkqueueResourceOptions
  to cdef dataclasses for check_or_create_options compatibility.
- Cache SM metadata in typed cdef fields; fall back to arch-based
  granularity on CUDA 12.x where CUdevSmResource lacks
  minSmPartitionSize/smCoscheduledAlignment.
- Simplify Context to hold only ContextHandle (remove _h_green_ctx
  and _is_green fields). Green ctx association lives in ContextBox;
  is_green queries get_context_green_ctx() on demand.
- ContextOptions.resources accepts Sequence only (no bare resource).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang leofang force-pushed the leof/green-ctx-v1 branch from 08d52d1 to 3013fe8 April 27, 2026 03:06
Switch from the push model (dev.set_current + dev.create_stream) to the
explicit model (ctx.create_stream + ctx.resources) as the primary way
to use green contexts.

Context.create_stream(options):
- Only supported on green contexts (raises on primary contexts).
- Delegates to Stream._init, which calls create_stream_handle in C++.
- C++ create_stream_handle auto-dispatches: checks get_context_green_ctx
  and calls cuGreenCtxStreamCreate for green contexts, or
  cuStreamCreateWithPriority for primary. Single function, no duplication.

Context.resources:
- Returns a DeviceResources namespace querying this context's resources
  (cuCtxGetDevResource / cuGreenCtxGetDevResource), not the full device.

dev.set_current(green_ctx) still works but is not the recommended path.

Tests rewritten to use the explicit model throughout. Push-model
set_current kept as regression tests with _use_green_ctx helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang leofang force-pushed the leof/green-ctx-v1 branch 6 times, most recently from 62e4883 to 3287204 April 27, 2026 04:21
- Let the driver validate the nonblocking flag for green context streams:
  cuGreenCtxStreamCreate rejects CU_STREAM_DEFAULT. On failure, check if
  the context is green + nonblocking is False and raise a clear ValueError.
- cuCtxGetStreamPriorityRange failure (CUDA_ERROR_INVALID_CONTEXT) now
  raises: "No current CUDA context. Call dev.set_current() before
  creating streams."
- C++ create_stream_handle returns CUDA_ERROR_NOT_SUPPORTED if the
  context is green but cuGreenCtxStreamCreate is unavailable (CUDA < 12.5),
  instead of falling through to cuStreamCreateWithPriority.
- ctx.resources.workqueue now dispatches to cuGreenCtxGetDevResource for
  green contexts, matching the SM query path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang leofang force-pushed the leof/green-ctx-v1 branch from 3287204 to 2812c5b April 27, 2026 04:23
@leofang
Member Author

leofang commented Apr 27, 2026

/ok to test 2812c5b

Stream.resources delegates to DeviceResources._init_from_ctx via the
stream's tracked context handle, returning the same resource view as
ctx.resources for the stream's parent context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang leofang force-pushed the leof/green-ctx-v1 branch 3 times, most recently from d5a7297 to 5b3c610 April 27, 2026 13:07
@leofang
Member Author

leofang commented Apr 27, 2026

/ok to test 5b3c610

@leofang leofang force-pushed the leof/green-ctx-v1 branch 2 times, most recently from eebd4cf to 5989fd1 April 27, 2026 13:53
- dev.create_context raises ValueError (not NotImplementedError) when
  options or resources are missing.
- Cache version checks (_check_green_ctx_support, _check_workqueue_support)
  at module level; raise ValueError instead of NotImplementedError.
- Simplify _device_resources.pyx: merge _as_uint and _count_to_sm_count
  into _to_sm_count; inline unsigned int casts for coscheduled params.
- Add green context classes to api.rst (Context, ContextOptions,
  DeviceResources, SMResource, SMResourceOptions, WorkqueueResource,
  WorkqueueResourceOptions).
- Update all docstrings to NumPy style with Attributes/Parameters/Returns
  sections matching the existing codebase convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang leofang force-pushed the leof/green-ctx-v1 branch from 5989fd1 to fa254a5 April 27, 2026 14:08
@leofang
Member Author

leofang commented Apr 27, 2026

/ok to test fa254a5

@leofang leofang requested a review from Andy-Jost April 27, 2026 16:48
@leofang leofang marked this pull request as ready for review April 27, 2026 18:24

Labels

cuda.core (Everything related to the cuda.core module), feature (New feature or request), P0 (High priority - Must do!)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GreenContext: Support allocating SMs
[EPIC] Support green contexts

1 participant