chore: switch Dockerfile to slim Python + uv lockfile#719
chore: switch Dockerfile to slim Python + uv lockfile#719cailmdaley wants to merge 5 commits intodevelopfrom
Conversation
The v2.0 merge moved the base image to skaha/astroml, which ships conda. That double-installs every Python dep (conda + pip) and bloats the image to 5–10 GB. Reverting to python:3.12-slim-bookworm and using uv with a lockfile gives: - pyproject.toml as the single source of truth for *what* shapepipe needs - uv.lock pinning *which versions* — upstream changes only land when we regenerate the lockfile deliberately - a lean apt layer using Debian's astromatic packages (psfex, source-extractor, weightwatcher) instead of building weightwatcher from source - ~10× faster rebuilds and a much smaller image Also adds a CI smoke test that invokes source-extractor, weightwatcher, and psfex inside the built image. The original failure mode that prompted this work (a binary that builds fine but can't run on canfar) will now surface at PR time, not deployment time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Dockerfile now carries an inline note about why we use Debian's weightwatcher package instead of the source build with sed-patched headers — short version: GCC 10 (Apr 2020) flipped its default to -fno-common, breaking WeightWatcher 1.12's (2014) use of common-symbol globals in headers. The old Dockerfile patched them inline; Debian's weightwatcher 1.12+dfsg-3 patches the same way upstream of the build. Pyproject minimums bumped to current major versions where the major itself moved (astropy 6→7, numpy 1→2, pandas 2→3) so the abstract constraint signals our actual target. Minor-version drift stays in uv.lock. ngmix line carries an explicit "do not modernize" note since it's pinned to Axel's stable_version branch until upstream absorbs the fixes (tracked in the ngmix-update fiber). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@martinkilbinger ready for your eyes when you have a moment. Two things worth flagging: Image works end-to-end on the CI runner. Build is green in 3m31s (uv installs 238 packages in 322ms — everything's a prebuilt wheel). The added Modernized package versions in the same PR (commit (Disclosure: PR drafted by Claude in collaboration with me; I haven't run a full pipeline through the new image yet — that's the canfar test I want to do.) |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@martinkilbinger one more thing worth flagging since it explains a Dockerfile shape change you'll notice — why we dropped the WeightWatcher source build: The previous Dockerfile (v1.x) built WeightWatcher from source with six `sed` patches to `prefs.h`, `globals.h`, `fitscat.h`, `prefs.c`, `main.c`, and `fitscat.c`. The new Dockerfile just `apt install`s `weightwatcher` instead. The patches were converting tentative definitions like `prefstruct prefs;` in headers into proper `extern` declarations + a single definition in the corresponding `.c` file. They were a workaround for GCC 10 (April 2020) flipping its default to `-fno-common`, which broke WeightWatcher 1.12's (2014) use of common-symbol globals — every translation unit started getting its own definition, the linker complained. Debian's `weightwatcher 1.12+dfsg-3` package on bookworm carries an equivalent patch upstream of the build (the `+dfsg-3` suffix indicates Debian-applied patches). So `apt install` gives us the same WeightWatcher 1.12 binary, just with Debian carrying the GCC-compatibility fix instead of us doing it inline. Tradeoff for completeness:
The apt route is probably the better default — Debian has stronger Astromatic build infrastructure than we do. If we ever needed a version bookworm doesn't ship, we'd swap back to source-build. |
|
When running the tests from within the docker of this branch, I get errors: And the slim bookworm image does not have some useful programs, e.g. vi, which should be added. |
|
When running the tests from within the docker of this branch, I get errors: And the slim bookworm image does not have some useful programs, e.g. vi, which should be added. And cloning the branch and using that version, I also get an error message for `uv run pytest: |
Rewrites the slim-bookworm Dockerfile as a two-target multi-stage build: - runtime — minimal image for canfar batch jobs and downstream FROM clauses. Ships astromatic binaries + jupyter + fitsio extras only. - dev — runtime plus everyday CLI tools (vim, less, tmux, htop, ripgrep, fd, jq, bat, git-lfs, …) and *all* extras pre-installed via the `dev` meta-extra (test, lint, doc, release, jupyter, fitsio). Default target when `--target` is omitted; published as `:<branch>` (no suffix). Both stages share a `base` stage carrying system deps + uv + the lockfile copy, so the heavy apt + wheel resolution is cached once. Addresses Martin's PR feedback on #719: - `uv run pytest: No such file or directory` — pytest now ships in the dev image (was previously only in the never-built `--extra test`). - `uv sync --extra X: Read-only file system` — the dev image has every extra pre-baked, so live `uv sync` is no longer needed at all. As a belt-and-braces also `chmod -R go+rwX /app` so non-root users on canfar/skaha can still mutate the venv when they want to. - "no vim in the slim image" — `vim`, `less`, `tmux`, `htop`, plus a curated subset of cailmdaley/containers (rg/fd/jq/bat/…) now ship in the dev target. deploy-image.yml builds, smoke-tests, and pushes both targets: - runtime: source-extractor / weightwatcher / psfex / shapepipe_run all invoked inside the built image. - dev: vim, ripgrep, pytest verified runnable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Some of the issues are fixed, but I think /app is still a read-only file system and I cannot run the pytests. I use |
Reproducing Martin's `apptainer pull ... && apptainer shell ... && cd
/app && uv run pytest` flow surfaced three layered read-only-fs failures
that all need their default writable target moved to /tmp:
1. `uv run` auto-syncs the venv (would mutate /app/.venv) → UV_NO_SYNC=1
so `uv run X` just runs X against the baked-in venv.
2. uv's package cache wants $HOME/.cache/uv (host-bind, not always
writable, and uv's mkdir logic gets confused if it pre-exists)
→ UV_CACHE_DIR=/tmp/uv-cache.
3. pytest-cov erases /app/.coverage at startup because
`addopts = --cov=shapepipe` in pyproject → COVERAGE_FILE=/tmp/.coverage.
Verified end-to-end on the existing PR image (without the env-var
defaults, reproducing the bug; and with them set in the shell, all 9
tier-0/1/2 tests pass in 10s on both a read-only SIF and a writable
sandbox). Once CI rebuilds with these defaults baked in, neither
Martin nor anyone else will need to set anything.
Also adds `docs/source/container.md` covering:
- The two image targets (`:<branch>` = dev default, `:<branch>-runtime`
= slim) and what each is for.
- Read-only SIF vs writable sandbox postures, with worked examples for
both. Includes the bind-mount-host-clone pattern Cail uses for
long-lived sandbox containers.
- The three configuration layers (`pyproject.toml` = abstract minimums
/ SSOT, `uv.lock` = concrete pins generated by uv, `Dockerfile` =
system deps + invocations) with a "what to edit when" table.
- A small symptom→cause→fix table for the three read-only failures
above, in case anyone bypasses /tmp with a custom apptainer profile.
`installation.md` points to the new doc; `toc.rst` registers it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@martinkilbinger thanks for testing — your symptom led us to a deeper read-only-fs problem than just pytest not being installed. Reproduced your flow (
Fix in ENV UV_NO_SYNC=1 UV_CACHE_DIR=/tmp/uv-cache COVERAGE_FILE=/tmp/.coverageVerified on the existing image with the env vars set in the shell — all 9 tests pass in 10s in both read-only SIF and writable sandbox modes. Once CI rebuilds, your exact flow should work without any setup: apptainer pull shapepipe.sif docker://ghcr.io/cosmostat/shapepipe:chore-uv-lockfile
apptainer shell shapepipe.sif
cd /app && uv run pytest # should now run cleanlyWhile I was in there, also wrote up (Disclosure: investigation, fix, and docs drafted by Claude in collaboration with me.) |
Summary
Replaces the skaha/astroml base (which brings conda → double-installs every Python dep, ~5–10 GB image) with
python:3.12-slim-bookworm+uvdriven by a committed lockfile.images.canfar.net/skaha/astroml:latest(conda Python)python:3.12-slim-bookwormDockerfile, drifts frompyproject.tomlpyproject.tomlis SSOT (what),uv.lockis the pinned manifest (which versions)weightwatcherpackageshapepipe_run -c …source-extractor --version,weightwatcher --version,psfex --versionWhy
Two problems with the v2.0 image:
pip installeverything on top. Two copies of every scientific package, multi-GB bloat, slow builds.shapepipe_run.Lockfile workflow
uv sync --frozenin the Dockerfile fails the build ifpyproject.tomlanduv.lockare out of sync — so a stale lockfile cannot ship.Test plan
Test — binariesstep passes (sextractor, weightwatcher, psfex)Test — shapepipe entry pointpassesOut of scope / follow-ups
Dockerfile.canfarbuilding on skaha/astroml if there's a deployment reason to keep that base image around (Martin mentioned this could be cleaner — currently conjectural).Dockerfile.jupyteris unchanged; it inherits from this base.🤖 Generated with Claude Code