Skip to content

fix(connect): close TOCTOU race in connect sync lock acquisition#113

Open
omergk28 wants to merge 2 commits into
ActiveMemory:mainfrom
omergk28:fix/93-connect-sync-lock-toctou
Open

fix(connect): close TOCTOU race in connect sync lock acquisition#113
omergk28 wants to merge 2 commits into
ActiveMemory:mainfrom
omergk28:fix/93-connect-sync-lock-toctou

Conversation

@omergk28

@omergk28 omergk28 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

loadState in internal/cli/connection/core/sync guarded ctx connect sync with a stat-then-write pair — racy by construction. Two processes racing past the os.Stat could both acquire, both load the same LastSequence, and both write duplicate entries into .context/hub/.

This PR replaces the pair with the atomic create-or-fail primitive the codebase already ships for exactly this purpose: io.SafeTryLock (O_CREATE|O_EXCL, one syscall) + io.SafeUnlock, with prior art at internal/cli/dream/core/pass/pass.go:62-70. !acquired maps to the pre-existing os.ErrExist contract, so caller-facing behavior is unchanged.

Closes #93.

Changes

  • internal/cli/connection/core/sync/state.go — atomic lock acquisition; release delegates to SafeUnlock (warn-on-failure logging kept)
  • internal/config/hub/{hub.go,doc.go}LockSentinel removed: the lock is the file's existence, not its content, and the racy write was the constant's only consumer
  • internal/cli/connection/core/sync/state_test.go — new; the package previously had zero tests
  • specs/fix-connect-sync-lock-toctou.md — spec per the project's spec-per-commit invariant

Tests (mapped to the issue's Tests Required)

Issue ask Test
N goroutines, exactly one succeeds, N−1 get os.ErrExist TestLoadState_RejectsConcurrentSyncs (16-way; winners held until all results are collected, so the count is deterministic)
Lock released after failure so the next attempt proceeds TestLoadState_ReleaseRemovesLock + TestLoadState_ReleasesLockOnCorruptState (post-acquisition failure must not leak the lock)
Lock lives at <ctxDir>/hub/.sync.lock TestLoadState_LockFileLocation

Verification

  • Mutation check: the new contention test was run against the old stat-then-write code — it fails with winners = 16, want exactly 1, proving both that the race is real and that the test catches it. Against the fix: exactly 1 winner, 15 × os.ErrExist.
  • go test -race -count=10 ./internal/cli/connection/core/sync/ — clean.
  • make audit — all checks passed (fmt, vet, golangci-lint, style, drift, docs, full test suite).

Notes

  • One deliberate correction vs. the issue text: ctx connect sync lock has a TOCTOU race; concurrent syncs can both pass the check #93 says duplicates land in .context/shared/ — that's the pre-rename path; the renderer joins cfgHub.DirHub (render.go:32), so the spec and this PR say .context/hub/.
  • Out of scope, per the issue: stale-lock detection (crashed process leaves a stale lock until a follow-up adds PID/age cleanup) and the flock alternative (Unix-only; SafeTryLock matches the existing sentinel model).

🤖 Generated with Claude Code

Second Commit: make audit on Stock macOS

While gating this PR, make audit aborted on a stock Mac before running a single drift check:

./hack/lint-drift.sh: line 39: exclude_args[@]: unbound variable

bash 3.2 (the newest Apple ships, GPLv2 freeze) treats "${arr[@]}" on an empty array as unbound under set -u; bash 4.4+ does not. Since make audit is the contributing guide's mandatory pre-PR gate, every Mac contributor hits this. Commit 2 guards the expansion with the canonical ${arr[@]+"${arr[@]}"} alternate form (no behavior change on bash 4+), with its own spec (specs/fix-lint-drift-bash32-empty-array.md) and a LEARNINGS.md entry. The other hack/ array expansions were checked empirically — all safe today (count-guarded or hardcoded non-empty; details in the spec).

Happy to split this into its own PR if you'd rather keep #93 surgical — it's an independent commit, so a cherry-pick is clean either way.

The sync lock was a stat-then-write: two processes racing past the
existence check could both acquire, both load the same LastSequence,
and both write duplicate entries into .context/hub/. Replace the
pair with the atomic io.SafeTryLock (O_CREATE|O_EXCL, single syscall)
and release via io.SafeUnlock — the same primitive the dream pass
already uses, preserving the os.ErrExist contract for callers.

- LockSentinel removed: the lock is the file's existence, not its
  content, and the racy write was the constant's only consumer.
- state_test.go regression-pins the contract: 16-way contention
  yields exactly one winner, release frees the next sync, a corrupt
  state file does not leak the lock, and the lock path stays at
  <ctxDir>/hub/.sync.lock.

Closes ActiveMemory#93.

Spec: specs/fix-connect-sync-lock-toctou.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Omer Kocaoglu <omergk28@gmail.com>
@omergk28 omergk28 requested a review from josealekhine as a code owner June 11, 2026 02:23
Stock macOS ships bash 3.2 (GPLv2 freeze), which treats
"${arr[@]}" on an empty array as an unbound variable under
set -u — bash 4.4+ does not. drift_grep expands its exclude
array unconditionally and several call sites pass no excludes,
so make lint-style (and therefore make audit, the mandatory
pre-PR gate) aborted on every stock Mac before running a single
drift check.

Guard with the parameter-expansion alternate form,
${arr[@]+"${arr[@]}"}: empty expands to zero words, populated
reproduces the original quoted expansion, bash 4+ behavior
unchanged. The other hack/ array expansions are safe today
(count-guarded or hardcoded non-empty); details in the spec.

Captures the gotcha in LEARNINGS.md so the next session does
not re-debug it.

Spec: specs/fix-lint-drift-bash32-empty-array.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Omer Kocaoglu <omergk28@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ctx connect sync lock has a TOCTOU race; concurrent syncs can both pass the check

1 participant