Update to v2.0.0-alpha.1#944

Draft
xylar wants to merge 30 commits into MPAS-Dev:main from xylar:switch-to-mache-deploy

@xylar
Collaborator

@xylar xylar commented Mar 21, 2026

This pull request updates to mache.deploy, which uses the ./deploy.py script instead of ./conda/configure-compass-env.py.

It switches to using pixi in the background for creating environments with conda packages.

Updates:

  • esmf v8.9.1
  • mache v3.6.1 -- brings in mache.deploy, mache.jigsaw and mache.parallel as well as module updates on many machines and several bug fixes
  • moab v5.6.0
  • albany tag compass-2026-03-21
  • trilinos tag compass-2026-02-06

Testing

Only testing MALI, as MPAS-Ocean is no longer being tested regularly on Compass.

MALI with full_integration:

Deployed

MALI with full_integration:

  • Chrysalis (@xylar)
    • gnu and openmpi
  • Perlmutter (@xylar)
    • gnu and mpich
    • gnugpu and mpich

@xylar xylar force-pushed the switch-to-mache-deploy branch from 9c93e54 to 0303b90 Compare March 21, 2026 13:14
@xylar xylar added the documentation, enhancement, ci, MALI-Dev PR finished, dependencies and deployment, framework, and dependencies labels Mar 21, 2026
@xylar xylar force-pushed the switch-to-mache-deploy branch from 0303b90 to ba7c900 Compare March 21, 2026 13:41
@xylar xylar force-pushed the switch-to-mache-deploy branch from 74bb269 to cec99ef Compare March 21, 2026 16:07
@xylar xylar force-pushed the switch-to-mache-deploy branch 2 times, most recently from fd971e0 to 7f43434 Compare March 31, 2026 17:58
@xylar
Collaborator Author

xylar commented Mar 31, 2026

@matthewhoffman and @trhille, to test this for now, use:

./deploy.py --with-albany --deploy-spack --mache-fork xylar/mache --mache-branch update-to-3.3.0 ...

This branch is needed until I tag a 3.3.0rc2 for mache.

@matthewhoffman
Member

@xylar , can you walk me through a few more details about the transition to deploy.py?

First off, is the mache branch in your previous comment out of date? Mache branch update-to-3.3.0 doesn't exist on your mache fork. So I used fix-mache-deploy-with-mache-rc instead. I invoked it with:

./deploy.py --with-albany --deploy-spack --mache-fork xylar/mache --mache-branch fix-mache-deploy-with-mache-rc --compiler gnu --mpi mpich --machine pm-cpu

It ran great for a while and seemed much faster than the old ./conda/configure-compass-env.py script. But after finishing the jigsaw build, it died with this error:

 Running:
   env -i bash -l /global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/spack/build_compass_albany_gnu_mpich.bash

fatal: detected dubious ownership in repository at '/global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0'
To add an exception for this directory, call:

	git config --global --add safe.directory /global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0
Traceback (most recent call last):
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/bin/mache", line 6, in <module>
    sys.exit(main())
             ~~~~^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/__main__.py", line 21, in main
    args.func(args)
    ~~~~~~~~~^^^^^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/cli.py", line 91, in _dispatch_deploy
    run_deploy(args=args)
    ~~~~~~~~~~^^^^^^^^^^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/run.py", line 289, in run_deploy
    spack_results = deploy_spack_envs(
        ctx=ctx,
    ...<2 lines>...
        quiet=quiet,
    )
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/spack.py", line 374, in deploy_spack_envs
    _install_spack_env(
    ~~~~~~~~~~~~~~~~~~^
        ctx=ctx,
        ^^^^^^^^
    ...<10 lines>...
        quiet=quiet,
        ^^^^^^^^^^^^
    )
    ^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/spack.py", line 914, in _install_spack_env
    check_call(cmd, log_filename=log_filename, quiet=quiet)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/bootstrap.py", line 222, in check_call
    raise subprocess.CalledProcessError(
        process.returncode, commands, output=stdout_data
    )
subprocess.CalledProcessError: Command 'env -i bash -l /global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/spack/build_compass_albany_gnu_mpich.bash' returned non-zero exit status 128.

ERROR: Deployment step failed (exit code 1). See the error output above.

Am I doing this wrong? Is this trying to deploy for the entire project? I don't think you want me interacting with /global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0 do you?
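
For context, git reports "dubious ownership" when a repository directory is owned by a different user than the one running the command. A minimal sketch of the same kind of check in Python follows; the helper name `owned_by_current_user` is hypothetical and is not part of mache or Compass.

```python
import os
import tempfile


def owned_by_current_user(path):
    """Return True if `path` is owned by the user running this process.

    Hypothetical helper for illustration; not part of mache or Compass.
    """
    return os.stat(path).st_uid == os.getuid()


# A directory we just created belongs to us, so the check passes.
with tempfile.TemporaryDirectory() as tmp:
    print(owned_by_current_user(tmp))
```

A shared Spack clone in another project's space would typically fail this check for anyone other than the person who deployed it.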

@xylar
Collaborator Author

xylar commented Apr 3, 2026

@matthewhoffman, I'm sorry. I'm developing mache for 3 projects at once -- E3SM-Unified, Polaris and Compass. That's a situation I usually try to avoid for precisely this type of reason.

I needed to release mache 3.3.0 for Polaris yesterday. As a result, the update-to-3.3.0 branch is gone. But I neglected to update this Compass branch until just now. At this point, no --mache-fork and --mache-branch should be needed for testing.

You also don't want to deploy spack. That was a mistake in my command above.

./deploy.py --with-albany --compiler gnu --mpi mpich --machine pm-cpu

@xylar xylar force-pushed the switch-to-mache-deploy branch from d3cc399 to 21d2713 Compare April 3, 2026 07:54
@matthewhoffman
Member

Thanks, @xylar . I made a little more progress with the command you suggested. I had to make this change:

diff --git a/deploy/cli_spec.json b/deploy/cli_spec.json
index 56a951b1f..ebdea2c4c 100644
--- a/deploy/cli_spec.json
+++ b/deploy/cli_spec.json
@@ -1,7 +1,7 @@
 {
   "meta": {
     "software": "compass",
-    "mache_version": "3.3.0rc2",
+    "mache_version": "3.3.0",
     "description": "Deploy compass environment"
   },
   "arguments": [
diff --git a/deploy/pins.cfg b/deploy/pins.cfg
index bfe79a90e..6ca63db19 100644
--- a/deploy/pins.cfg
+++ b/deploy/pins.cfg
@@ -4,7 +4,7 @@ bootstrap_python = 3.14
 python = 3.14
 esmf = 8.9.1
 geometric_features = 1.6.1
-mache = 3.3.0rc2
+mache = 3.3.0
 mpas_tools = 1.4.0
 otps = 2021.10
 parallelio = 2.6.9

but then I still ran into an issue of it trying to touch the deployed spack env in the e3sm project space:

 Running:
   source /global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0/share/spack/setup-env.sh
   spack env activate compass_albany_gnu_mpich
   spack config add modules:prefix_inspections:lib:[LD_LIBRARY_PATH]
   spack config add modules:prefix_inspections:lib64:[LD_LIBRARY_PATH]

==> Error: cannot write to config file [Errno 13] Permission denied: '/global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0/var/spack/environments/compass_albany_gnu_mpich/.spack.yaml.tmp'
Traceback (most recent call last):
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/hooks.py", line 103, in run_hook
    result = func(context)
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy/hooks.py", line 62, in post_spack
    _set_ld_library_path_for_spack_env(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        ctx=ctx,
        ^^^^^^^^
        spack_path=spack_path,
        ^^^^^^^^^^^^^^^^^^^^^^
        env_name=env_name,
        ^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy/hooks.py", line 212, in _set_ld_library_path_for_spack_env
    check_call(
    ~~~~~~~~~~^
        commands,
        ^^^^^^^^^
        log_filename=_get_log_filename(ctx),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        quiet=bool(getattr(ctx.args, 'quiet', False)),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/bootstrap.py", line 222, in check_call
    raise subprocess.CalledProcessError(
        process.returncode, commands, output=stdout_data
    )
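
The error above shows spack trying to write `.spack.yaml.tmp` inside the shared environment directory, which this user cannot modify. A minimal sketch of a guard that checks write access before attempting the config change is shown below; `can_modify_spack_env` is a hypothetical helper, not something in mache or the Compass hooks, and the temporary directory stands in for the real environment path.

```python
import os
import tempfile


def can_modify_spack_env(env_dir):
    """Return True if the current user can create files in `env_dir`.

    Hypothetical guard for illustration; not part of mache or Compass.
    """
    return os.path.isdir(env_dir) and os.access(env_dir, os.W_OK | os.X_OK)


# Stand-in for the spack environment directory.
with tempfile.TemporaryDirectory() as env_dir:
    if can_modify_spack_env(env_dir):
        print("environment is writable: config update could proceed")
    else:
        print("skipping config update: environment is read-only for this user")
```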

@xylar
Collaborator Author

xylar commented Apr 3, 2026

> I had to make this change:

I think that's in 21d2713. Did you not have that commit or did I miss something?
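
The mache version appears in two files in the diff above, `deploy/cli_spec.json` and `deploy/pins.cfg`, so they can drift apart. A minimal sketch of a consistency check follows. It assumes only the layout visible in the diff (a `meta.mache_version` key and a `mache = ...` line); the section name `[pins]` and the function name `mache_versions` are placeholders, since the diff does not show them.

```python
import configparser
import json


def mache_versions(cli_spec_text, pins_text):
    """Return (version in cli_spec.json, version in pins.cfg).

    Assumes a "meta" object with a "mache_version" key in the JSON and a
    `mache = ...` option in some section of pins.cfg.
    """
    spec_version = json.loads(cli_spec_text)["meta"]["mache_version"]
    pins = configparser.ConfigParser()
    pins.read_string(pins_text)
    pins_version = None
    for section in pins.sections():
        if pins.has_option(section, "mache"):
            pins_version = pins.get(section, "mache")
            break
    return spec_version, pins_version


# Stand-in file contents; the real section name in pins.cfg is an assumption.
spec = '{"meta": {"software": "compass", "mache_version": "3.3.0"}}'
pins = "[pins]\nmache = 3.3.0rc2\n"
spec_version, pins_version = mache_versions(spec, pins)
if spec_version != pins_version:
    print(f"mismatch: cli_spec.json has {spec_version}, pins.cfg has {pins_version}")
```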

@xylar
Collaborator Author

xylar commented Apr 3, 2026

> but then I still ran into an issue of it trying to touch the deployed spack env in the e3sm project space:

Yep, that's something I need to fix. Sorry about that!

@xylar
Collaborator Author

xylar commented Apr 3, 2026

@matthewhoffman, the second issue should be fixed.

@matthewhoffman
Member

@xylar, thanks for addressing the second issue. The first must have been because I had failed to update my local branch this morning. After updating to 160d75d, ./deploy.py runs successfully and I'm able to load the compass env. I will move on to trying to build MALI next. One question - do you plan to add the version number back to the load_compass_pm-cpu_gnu_mpich.sh script that gets generated?

@xylar
Collaborator Author

xylar commented Apr 3, 2026

> One question - do you plan to add the version number back to the load_compass_pm-cpu_gnu_mpich.sh script that gets generated?

The Compass version is in there:

export MACHE_DEPLOY_TARGET_VERSION="2.0.0-alpha.1"

It's just called something different than before. We can copy that into another environment variable if you need it.

@xylar
Collaborator Author

xylar commented Apr 3, 2026

Oh, wait, it already is:

export COMPASS_VERSION="2.0.0-alpha.1"
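
A script that needs the version after sourcing the load script could read either variable. A minimal sketch, assuming only the two variable names quoted above (the function name `compass_version` is hypothetical):

```python
import os


def compass_version(env=None):
    """Look up the Compass version exported by the generated load script.

    Prefers COMPASS_VERSION and falls back to MACHE_DEPLOY_TARGET_VERSION.
    Returns None if neither variable is set.
    """
    if env is None:
        env = os.environ
    return env.get("COMPASS_VERSION") or env.get("MACHE_DEPLOY_TARGET_VERSION")


# Example with an explicit mapping instead of the real environment.
print(compass_version({"MACHE_DEPLOY_TARGET_VERSION": "2.0.0-alpha.1"}))
```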

@xylar
Collaborator Author

xylar commented Apr 3, 2026

Are you not seeing that in your load script?

@matthewhoffman
Member

matthewhoffman commented Apr 3, 2026

I just mean the name of the load script used to have the version in the filename, but I'm not seeing that. It's not a big deal, I was just wondering if that was intentional.

As for progress, when I compile MALI I am seeing the same PIO lib errors that you do in the issue you opened. I'm working on debugging them with help from ChatGPT and so far the obvious things are not working, but I'll keep at it while I have time.

@xylar
Collaborator Author

xylar commented Apr 3, 2026

I see. No, the load script won't include the compass version anymore. I didn't find that to be particularly useful.

xylar added 25 commits April 24, 2026 08:54
The biggest content changes are:

* removing the stale user quick-start conda/load-script workflow that referenced commands no longer in the repo
* updating CLI docs to match current `compass list/setup/suite/run` behavior and output fields
* fixing framework docs to use the current pickle/log names like `test_case.pickle` and `case_outputs`
* bringing landice docs in line with current registered test paths, especially the solver-specific Greenland, Dome, and Thwaites cases
* updating the landice suite docs to match current suite contents and clarifying that eigencalving exists in the test group even when it is not in the regression suite

This treats warnings as errors.

This means we need to opt out of the system CMake. It is too old.

Don't exclude it on Perlmutter, where it is new enough and can't be built with Spack.

These clash with MPAS framework's equivalent files.

This keeps ESMF from stepping on SCORPIO's toes by installing its own ParallelIO.

This is no longer needed now that ESMF is in the software environment.

@xylar xylar force-pushed the switch-to-mache-deploy branch from 069483c to d121b6b Compare April 24, 2026 07:01
@matthewhoffman
Member

@xylar and @mperego , I think we should go ahead and remove the exodus output from the tests. We had added it at some point because when runs fail it is sometimes useful to be able to look at the velocity solution on the exo mesh to see what's going on, and it's convenient to not have to rerun the tests. But I can't remember the last time I've actually had to look at them, so it's not a big inconvenience to disable them again, and that seems a much better use of time than trying to debug these libraries.

@xylar , are you ok if I push a commit to your branch that makes the changes to disable the exo output? That way we can test everything in this branch.

@matthewhoffman
Member

Also, is there anything I should be aware of in your push since the last discussion messages? Should I rebuild my env locally before testing again?

@xylar
Collaborator Author

xylar commented Apr 30, 2026

> @xylar , are you ok if I push a commit to your branch that makes the changes to disable the exo output? That way we can test everything in this branch.

Yes, go for it.

@xylar
Collaborator Author

xylar commented Apr 30, 2026

> Also, is there anything I should be aware of in your push since the last discussion messages? Should I rebuild my env locally before testing again?

Yes, you need to rebuild. I switched to a much newer version of mache.

I think I rebuilt the spack environments but it's hard to keep track of everything right now :-(

Labels

  • ci: Changes affect Azure Pipelines CI
  • dependencies and deployment: Changes relate to creating conda and Spack environments, and creating a load script
  • dependencies: Pull requests that update a dependency file
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • framework
  • MALI-Dev PR finished
