Performance tuning for NVIDIA Grace-Hopper for the Gordon Bell runs#987
Performance tuning for NVIDIA Grace-Hopper for the Gordon Bell runs#987ntselepidis wants to merge 3 commits intoMFlowCode:masterfrom
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||||
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #987 +/- ##
=======================================
Coverage 40.91% 40.91%
=======================================
Files 70 70
Lines 20270 20270
Branches 2520 2520
=======================================
Hits 8293 8293
Misses 10439 10439
Partials 1538 1538 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
it seems the maxregisters flag should be guarded by the device type (i'm sure some devices don't have that many registers) |
|
@ntselepidis is this supposed to be merged or just for fun? |
|
My understanding is that the change to |
|
@ntselepidis what is the status of this? seems like we need a cmake guard for specific architectures or something |
|
As @wilfonba correctly noted, we found that setting
|
|
sounds about right |
Claude Code ReviewPR #987 — Performance tuning for NVIDIA Grace-Hopper (Gordon Bell runs) Two files changed: Critical Issues1. The flag is added unconditionally inside the NVHPC
Required fix before merge — either gate on if (MFC_Unified)
target_compile_options(simulation PRIVATE -gpu=maxregcount:128)
endif()Or scope to just set_source_files_properties(
"${CMAKE_BINARY_DIR}/fypp/simulation/m_igr.fpp.f90"
PROPERTIES COMPILE_OPTIONS "-gpu=maxregcount:128"
)Confidence: 95 Important Issues2. Verify The GitHub diff context reports the The insertion should be in the Please verify the diff against HEAD confirms placement is in the Confidence: 82 Suggestions
Strengths
|
Automated Code ReviewSummary: 1 critical issue, 1 important issue. Critical Issue
File: The new directive if (igr_iter_solver == 1) then
@:ALLOCATE(jac_old(...))
end if
Fix: Split the declare to exclude $:GPU_DECLARE(create='[jac, jac_rhs]')The existing Important Issue
The flag is correctly gated to nvfortran GPU builds, but it applies to every kernel in the simulation target — WENO, Riemann solvers, viscous flux, bubble dynamics, FFT — when it was tuned specifically for IGR Jacobian kernels. A single global register cap will pessimize register usage for every other kernel. The standard pattern for feature-specific NVHPC flags in MFC is Strengths
|
User description
This PR will add some changes to reduce registers and improve performance.
Tested with NVHPC nightly on single Santis node with 4 Grace-Hoppers on
3D_IGR_TaylorGreenVortex_nvidiacase.PR Type
Enhancement
Description
Add GPU register limit optimization for NVIDIA Grace-Hopper
Include GPU memory management directive for Jacobian arrays
Diagram Walkthrough
File Walkthrough
CMakeLists.txt
Set GPU register count limitCMakeLists.txt
-gpu=maxregcount:165compiler flag to limit GPU register usagem_igr.fpp
Add GPU memory management directivesrc/simulation/m_igr.fpp
$:GPU_DECLAREdirective for Jacobian arrays (jac,jac_rhs,jac_old)