Add green context support #1976
Open
leofang wants to merge 11 commits into NVIDIA:main from
Restructure tests into fixtures + classes with full resource cleanup:

- Fixtures: sm_resource, wq_resource, green_ctx (with CUDAError skip), green_ctx_active (with try/finally restore), fill_kernel
- _use_green_ctx context manager for safe push/pop in all tests
- TestSMResourceQuery: properties, arch constraints per CC
- TestSMResourceSplit: single/two-group splits, discovery, alignment, dry-run vs real parity
- TestGreenContextKernelLaunch: compile + launch + verify in green ctx, two independent green contexts, SM + workqueue combined

All set_current calls are paired with restore in finally blocks to prevent context stack leaks on test failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
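The try/finally pairing this commit describes can be modeled without a GPU. In the sketch below, `FakeDevice` and `use_ctx` are hypothetical stand-ins, not the PR's `_use_green_ctx` helper or the `cuda.core` API; the fake `set_current` simply returns the previously current context so that it can be restored afterwards.

```python
from contextlib import contextmanager


class FakeDevice:
    """Hypothetical device with a single 'current context' slot (no GPU needed)."""

    def __init__(self, primary):
        self.current = primary

    def set_current(self, ctx):
        # Install ctx and hand back whatever was current before.
        previous, self.current = self.current, ctx
        return previous


@contextmanager
def use_ctx(dev, ctx):
    """Make ctx current inside the with-block, then always restore the old one."""
    previous = dev.set_current(ctx)
    try:
        yield ctx
    finally:
        # Runs on success and on failure, so a failing test cannot leak ctx.
        dev.set_current(previous)


dev = FakeDevice(primary="primary")
try:
    with use_ctx(dev, "green"):
        assert dev.current == "green"
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass
print(dev.current)  # primary
```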
/ok to test ac5c0fc
Force-pushed from 17c2be2 to 08d52d1.
- Convert ContextOptions and SMResourceOptions/WorkqueueResourceOptions to cdef dataclasses for check_or_create_options compatibility.
- Cache SM metadata in typed cdef fields; fall back to arch-based granularity on CUDA 12.x, where CUdevSmResource lacks minSmPartitionSize/smCoscheduledAlignment.
- Simplify Context to hold only ContextHandle (remove _h_green_ctx and _is_green fields). Green ctx association lives in ContextBox; is_green queries get_context_green_ctx() on demand.
- ContextOptions.resources accepts Sequence only (no bare resource).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
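The CUDA 12.x fallback amounts to "use the driver-reported granularity when the struct has it, otherwise look it up by architecture". This is a sketch under assumptions: `sm_granularity` and `ARCH_GRANULARITY` are hypothetical names, and the table values are our reading of the `cuDevSmResourceSplitByCount` documentation (for example, a minimum of 8 SMs in multiples of 8 on compute capability 9.0+), not values taken from this PR. Verify them against your toolkit's driver docs.

```python
from types import SimpleNamespace

# ASSUMED (min partition size, alignment) per compute-capability major version,
# paraphrasing the CUDA 12.x cuDevSmResourceSplitByCount docs; verify before use.
ARCH_GRANULARITY = {
    6: (1, 1),
    7: (2, 2),
    8: (4, 2),
    9: (8, 8),
}


def sm_granularity(raw_sm_resource, cc_major):
    """Return (min_partition_size, coscheduled_alignment) for an SM resource.

    Prefer the fields reported by the driver struct (newer CUDA); fall back to
    the architecture table when the struct lacks them (CUDA 12.x).
    """
    min_size = getattr(raw_sm_resource, "minSmPartitionSize", None)
    align = getattr(raw_sm_resource, "smCoscheduledAlignment", None)
    if min_size is not None and align is not None:
        return min_size, align
    # Assumption: architectures newer than the table reuse the largest entry.
    key = min(cc_major, max(ARCH_GRANULARITY))
    if key not in ARCH_GRANULARITY:
        raise ValueError(f"unsupported compute capability major {cc_major}")
    return ARCH_GRANULARITY[key]


# A 12.x-style struct has no granularity fields, so the table is used.
legacy = SimpleNamespace(smCount=132)
print(sm_granularity(legacy, 9))  # (8, 8)
```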
Force-pushed from 08d52d1 to 3013fe8.
Switch from the push model (dev.set_current + dev.create_stream) to the explicit model (ctx.create_stream + ctx.resources) as the primary way to use green contexts.

Context.create_stream(options):
- Only supported on green contexts (raises on primary contexts).
- Delegates to Stream._init, which calls create_stream_handle in C++.
- C++ create_stream_handle auto-dispatches: checks get_context_green_ctx and calls cuGreenCtxStreamCreate for green contexts, or cuStreamCreateWithPriority for primary. Single function, no duplication.

Context.resources:
- Returns a DeviceResources namespace querying this context's resources (cuCtxGetDevResource / cuGreenCtxGetDevResource), not the full device.

dev.set_current(green_ctx) still works but is not the recommended path.

Tests rewritten to use the explicit model throughout. Push-model set_current kept as regression tests with _use_green_ctx helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
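The single-function dispatch can be modeled in pure Python. `ContextHandle`, `create_stream_handle`, and `context_create_stream` below are hypothetical stand-ins for the C++ handle layer and `Context.create_stream`; the "driver" is a dict of stubs that only report which entry point ran, and the `RuntimeError` raised for primary contexts is an assumption, since the commit only says the call raises.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ContextHandle:
    """Hypothetical model: green_ctx is set only for contexts derived from a green ctx."""
    name: str
    green_ctx: Optional[str] = None


def create_stream_handle(ctx: ContextHandle, driver: Dict[str, Callable]):
    """Pick the stream-creation entry point from the context kind (one code path)."""
    if ctx.green_ctx is not None:
        return driver["cuGreenCtxStreamCreate"](ctx.green_ctx)
    return driver["cuStreamCreateWithPriority"](ctx.name)


def context_create_stream(ctx: ContextHandle, driver: Dict[str, Callable]):
    """Model of Context.create_stream: only green contexts are accepted."""
    if ctx.green_ctx is None:
        # Exception type is an assumption; the commit only says it "raises".
        raise RuntimeError("create_stream is only supported on green contexts")
    return create_stream_handle(ctx, driver)


# Stub "driver" that reports which entry point handled the request.
driver = {
    "cuGreenCtxStreamCreate": lambda g: ("cuGreenCtxStreamCreate", g),
    "cuStreamCreateWithPriority": lambda c: ("cuStreamCreateWithPriority", c),
}
green = ContextHandle("derived-ctx", green_ctx="green-0")
primary = ContextHandle("primary-ctx")
print(context_create_stream(green, driver))   # ('cuGreenCtxStreamCreate', 'green-0')
print(create_stream_handle(primary, driver))  # ('cuStreamCreateWithPriority', 'primary-ctx')
```

Keeping the green/primary check inside one function means callers such as `Stream._init` never need to know which kind of context they hold.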
Force-pushed from 62e4883 to 3287204.
- Let the driver validate the nonblocking flag for green context streams: cuGreenCtxStreamCreate rejects CU_STREAM_DEFAULT. On failure, check if the context is green + nonblocking is False and raise a clear ValueError.
- cuCtxGetStreamPriorityRange failure (CUDA_ERROR_INVALID_CONTEXT) now raises: "No current CUDA context. Call dev.set_current() before creating streams."
- C++ create_stream_handle returns CUDA_ERROR_NOT_SUPPORTED if the context is green but cuGreenCtxStreamCreate is unavailable (CUDA < 12.5), instead of falling through to cuStreamCreateWithPriority.
- ctx.resources.workqueue now dispatches to cuGreenCtxGetDevResource for green contexts, matching the SM query path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
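The "let the driver validate, then explain" approach from the first two bullets can be sketched as a wrapper around the stream-creation call. `DriverError` and `create_stream_checked` are hypothetical; the real code raises `cuda.core`'s own error types, and keeping a driver-error type for the `CUDA_ERROR_INVALID_CONTEXT` case (rather than some other exception) is an assumption.

```python
class DriverError(Exception):
    """Hypothetical driver error carrying a CUresult-style name."""

    def __init__(self, code, message=None):
        super().__init__(message or code)
        self.code = code


def create_stream_checked(create, *, is_green, nonblocking):
    """Call create(); on a driver failure, re-raise with an actionable message."""
    try:
        return create()
    except DriverError as err:
        if is_green and not nonblocking:
            # Green-context streams reject CU_STREAM_DEFAULT; say what to change.
            raise ValueError(
                "green context streams must be non-blocking; pass nonblocking=True"
            ) from err
        if err.code == "CUDA_ERROR_INVALID_CONTEXT":
            # Seen when querying the stream priority range with no current context.
            raise DriverError(
                err.code,
                "No current CUDA context. "
                "Call dev.set_current() before creating streams.",
            ) from err
        raise  # any other failure propagates unchanged


def reject_default():
    # Stub for a driver call that refuses a blocking stream on a green context.
    raise DriverError("CUDA_ERROR_INVALID_VALUE")


try:
    create_stream_checked(reject_default, is_green=True, nonblocking=False)
except ValueError as exc:
    print(exc)
```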
Force-pushed from 3287204 to 2812c5b.
/ok to test 2812c5b
Stream.resources delegates to DeviceResources._init_from_ctx via the stream's tracked context handle, returning the same resource view as ctx.resources for the stream's parent context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
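This delegation, together with the "tracked handle first, driver query second" rule the PR describes for `Stream._from_handle`, can be modeled as follows. `CtxModel`, `StreamModel`, and the `query_ctx` callable (standing in for `cuStreamGetCtx`) are hypothetical simplifications, and a plain dict stands in for `DeviceResources`.

```python
class CtxModel:
    """Hypothetical context: owns a resources view (a dict stands in for DeviceResources)."""

    def __init__(self, sm_count):
        self.resources = {"sm_count": sm_count}


class StreamModel:
    """Hypothetical stream that remembers its parent context at creation time."""

    def __init__(self, ctx=None, query_ctx=None):
        self._ctx = ctx              # tracked handle, set when the stream is created
        self._query_ctx = query_ctx  # fallback lookup, standing in for cuStreamGetCtx
        self.queries = 0             # how often the fallback was needed

    def _ensure_ctx(self):
        if self._ctx is None:
            self.queries += 1
            self._ctx = self._query_ctx()
        return self._ctx

    @property
    def resources(self):
        # Same view as the parent context's resources, not the whole device.
        return self._ensure_ctx().resources


green = CtxModel(sm_count=16)
tracked = StreamModel(ctx=green)
adopted = StreamModel(query_ctx=lambda: green)
print(tracked.resources is green.resources, tracked.queries)  # True 0
print(adopted.resources["sm_count"], adopted.queries)         # 16 1
```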
Force-pushed from d5a7297 to 5b3c610.
/ok to test 5b3c610
Force-pushed from eebd4cf to 5989fd1.
- dev.create_context raises ValueError (not NotImplementedError) when options or resources are missing.
- Cache version checks (_check_green_ctx_support, _check_workqueue_support) at module level; raise ValueError instead of NotImplementedError.
- Simplify _device_resources.pyx: merge _as_uint and _count_to_sm_count into _to_sm_count; inline unsigned int casts for coscheduled params.
- Add green context classes to api.rst (Context, ContextOptions, DeviceResources, SMResource, SMResourceOptions, WorkqueueResource, WorkqueueResourceOptions).
- Update all docstrings to NumPy style with Attributes/Parameters/Returns sections matching the existing codebase convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
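The `_to_sm_count` merge concerns how `SMResourceOptions` fields become per-group driver values. The PR description states the rules: scalar fields broadcast, `count` drives the number of groups, and `count=None` selects discovery mode (sent to the driver as `smCount=0`). The pure-Python model below uses hypothetical names (`to_sm_count`, `broadcast_groups`); treating a scalar `count` as a single group and leaving an unset `coscheduled_sm_count` as `None` are assumptions, and the real `.pyx` helpers may differ.

```python
from collections.abc import Sequence


def to_sm_count(value):
    """None selects discovery mode (sent as 0); otherwise a non-negative int."""
    if value is None:
        return 0
    value = int(value)
    if value < 0:
        raise ValueError(f"SM count must be non-negative, got {value}")
    return value


def _is_seq(x):
    return isinstance(x, Sequence) and not isinstance(x, (str, bytes))


def broadcast_groups(count, coscheduled_sm_count=None):
    """Expand struct-of-arrays options into one (sm_count, coscheduled) pair per group.

    `count` fixes the number of groups: a scalar or None means one group, a
    sequence means len(count) groups. A scalar coscheduled_sm_count is repeated
    for every group; a sequence must match len(count).
    """
    counts = list(count) if _is_seq(count) else [count]
    n = len(counts)
    if _is_seq(coscheduled_sm_count):
        cosched = list(coscheduled_sm_count)
        if len(cosched) != n:
            raise ValueError(
                f"coscheduled_sm_count has {len(cosched)} entries, expected {n}"
            )
    else:
        cosched = [coscheduled_sm_count] * n
    return [(to_sm_count(c), s) for c, s in zip(counts, cosched)]


print(broadcast_groups([8, 16], coscheduled_sm_count=8))  # [(8, 8), (16, 8)]
print(broadcast_groups(None))                             # [(0, None)]
```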
Force-pushed from 5989fd1 to fa254a5.
/ok to test fa254a5
Close #1563. Close #112.
Summary
Add green context support to cuda.core — the explicit-model API for querying device resources, splitting SMs, creating green contexts, and using them without touching the thread-local context stack.
Design
See the companion design doc for full rationale. Key decisions:
- `Context` type: no user-visible `GreenContext` subclass. A single `Context` wraps either a primary `CUcontext` or a `CUgreenCtx` plus its derived `CUcontext`; `ctx.is_green` distinguishes them. Inspired by the CUDA runtime's execution-context (EC) abstraction.
- `dev.resources` namespace: `DeviceResources` groups hardware resource queries (`dev.resources.sm`, `dev.resources.workqueue`). Follows the existing "plural = namespace" pattern (`dev.properties`, `kernel.attributes`).
- `ctx.resources` / `stream.resources`: same `DeviceResources` type, but queries the context's provisioned resources (`cuCtxGetDevResource` / `cuGreenCtxGetDevResource`) instead of the full device.
- `SMResourceOptions` with SoA broadcasting: a single dataclass for `SMResource.split()`. Scalar fields broadcast; `count` drives the group count. `count=None` means discovery mode (translated to `smCount=0` internally).
- `WorkqueueResource` merges `CU_DEV_RESOURCE_TYPE_WORKQUEUE_CONFIG` and `CU_DEV_RESOURCE_TYPE_WORKQUEUE` under one user-facing class. Option values are strings (e.g. `sharing_scope="green_ctx_balanced"`).
- `ContextOptions(resources=[...])` → `dev.create_context()`: resource descriptor generation and `cuGreenCtxCreate` are internal. The user passes pre-split resource objects.
- `ctx.create_stream()` creates streams bound to a green context without calling `dev.set_current()`. The C++ handle layer auto-dispatches between `cuGreenCtxStreamCreate` and `cuStreamCreateWithPriority` based on the context type. Green context streams must be non-blocking.
- `ctx.close()` does not manage the context stack: closing a current context raises `RuntimeError`.
- `dev.set_current(green_ctx)` still works for backward compatibility but is not the recommended path.

New public API
- `Device.resources` → `DeviceResources` (namespace: `.sm`, `.workqueue`)
- `Context.resources` → `DeviceResources` (context-level query of provisioned resources)
- `Stream.resources` → `DeviceResources` (delegates to the stream's parent context)
- `Context.create_stream(options)` → `Stream` (green contexts only; raises on primary)
- `Context.is_green` → `bool`
- `SMResource`: properties `sm_count`, `min_partition_size`, `coscheduled_alignment`, `flags`, `handle`; method `split(options, *, dry_run=False)`
- `SMResourceOptions`: `count`, `coscheduled_sm_count`, `preferred_coscheduled_sm_count`
- `WorkqueueResource`: method `configure(options)`
- `WorkqueueResourceOptions`: `sharing_scope`
- `ContextOptions.resources`: accepts `Sequence[SMResource | WorkqueueResource]`

Implementation details
C++ handle layer (`resource_handles.hpp/cpp`):

- `GreenCtxHandle` (`shared_ptr<const CUgreenCtx>`): owning handle; its destructor calls `cuGreenCtxDestroy`.
- `ContextBox` gains a `GreenCtxHandle` field so the derived `CUcontext` keeps the green ctx alive; `get_context_green_ctx()` provides the reverse lookup.
- `create_green_ctx_handle()` combines `cuDevResourceGenerateDesc` + `cuGreenCtxCreate` in one call. The descriptor is transient (no `DevResourceDescHandle` is needed, since CUDA has no explicit destroy for it).
- `create_stream_handle()` auto-dispatches: it checks `get_context_green_ctx()` on the provided `ContextHandle` and calls `cuGreenCtxStreamCreate` for green contexts or `cuStreamCreateWithPriority` for primary ones. It returns `CUDA_ERROR_NOT_SUPPORTED` if the context is green but `cuGreenCtxStreamCreate` is unavailable (CUDA < 12.5).
- `context_registry` / `stream_registry` (`HandleRegistry`) deduplicate handles by raw CUDA pointer, enabling identity-preserving `set_current` swaps.

Bug fix (stream context tracking):
- `StreamBox` now carries a `ContextHandle` dependency, populated at creation time. `get_stream_context()` returns it without a driver call.
- `Stream._from_handle` and `Stream_ensure_ctx` prefer the registry-backed handle before falling back to `cuStreamGetCtx`.

This fixes a latent issue where streams created in a green context would lose their context association after a `set_current` swap.

Error handling:
- `dev.create_context()` without resources raises `ValueError` with a clear message.
- Creating a green context stream with `nonblocking=False` is rejected by the driver (`CUDA_ERROR_INVALID_VALUE`) and re-raised as `ValueError` with a helpful message.
- A `cuCtxGetStreamPriorityRange` failure (`CUDA_ERROR_INVALID_CONTEXT`) raises "Call dev.set_current() before creating streams."

Version guards:
- `IF CUDA_CORE_BUILD_MAJOR >= 13` gates `cuDevSmResourceSplit` (the general/structured form).
- `cy_driver_version() >= (12, 4, 0)` for all green ctx APIs; `>= (13, 1, 0)` for structured splits. Raises `ValueError` when unsupported.
- `cuDevSmResourceSplitByCount` is used for basic (homogeneous) splits. Per-group `coscheduled_sm_count` and heterogeneous counts require 13.1+ and raise `NotImplementedError` on 12.x.
- `_get_optional_driver_fn`: returns `NULL` gracefully when the bindings lack the symbol.

Test coverage
33 tests in `test_green_context.py`, organized with pytest fixtures and classes:

- Fixtures: `sm_resource`, `wq_resource`, `green_ctx` (with `CUDAError` → skip), `fill_kernel`
- `_use_green_ctx` context manager for safe push/pop in the `set_current` regression tests
- `TestSMResourceQuery`: properties, arch constraints (pre-Hopper vs Hopper+)
- `TestWorkqueueResource`: query, configure valid/invalid
- `TestSMResourceSplitValidation`: scalar/Sequence mismatch, negative count, dry-run blocked
- `TestSMResourceSplit`: single/two-group splits with arch-aligned counts, discovery mode, alignment, dry-run parity
- `TestGreenContextLifecycle`: `is_green`, `create_stream` on primary raises, blocking stream raises, explicit stream creation, stream/event context tracking, close-while-current guard, `set_current` regression
- `TestContextResources`: green ctx SM resources are a subset of the device's, two contexts have disjoint partitions, `stream.resources` matches `ctx.resources` (SM + workqueue)
- `TestGreenContextKernelLaunch`: compile + launch + host-verify via `ctx.create_stream()`, two independent green contexts with different fill values, SM + workqueue combined

Validation
-- Leo's bot