Skip test for specific Go version due to CGO issue.
This was referenced Apr 17, 2026
* remove skip condition for Go 1.23 in TestBindCgoPackage
* update Go version matrix in CI configuration
Adds two reproducers that exercise the go2py/C.CString-without-GIL crash:
1. A 5000-iteration stress loop in the cgo example (Hi/Hello string returns).
2. A new gilstring example covering struct string fields, slice elements, and map values under repeated calls.
- gilstring.go reduced to a single Hello() function (mirrors hi.Hello from the issue report)
- test.py imports both gilstring and simple as two separately-built extensions in the same Python process, interleaving Add/Hello calls over 5000 iterations
- TestGilString builds each package into its own subdir to prevent C symbol collisions, then runs test.py with a shared PYTHONPATH root
- ci.yml adds macos-15-intel (x86_64) to the matrix, the platform where "fatal error: bad sweepgen in refill" reliably reproduces
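The interleaved loop described for test.py could look roughly like the sketch below. This is not the actual test file: `stress`, `add`, and `hello` are hypothetical names, and the stand-in lambdas replace calls into the two separately built extensions (assumed here to be `simple.Add` and `gilstring.Hello`, with assumed signatures).

```python
def stress(add, hello, iterations=5000):
    """Alternate calls into two extensions and check every result."""
    for i in range(iterations):
        total = add(i, 1)
        if total != i + 1:
            raise AssertionError(f"add({i}, 1) returned {total!r}")
        greeting = hello(f"user{i}")
        if not isinstance(greeting, str):
            raise AssertionError(f"hello returned {greeting!r}, expected a str")
    return iterations


if __name__ == "__main__":
    # Stand-ins so the sketch runs without the compiled extensions.
    # With the real packages built, one would pass simple.Add and
    # gilstring.Hello instead (assumed: Add(int, int) -> int,
    # Hello(str) -> str).
    done = stress(lambda a, b: a + b, lambda s: "hello " + s)
    print(f"OK: {done} interleaved iterations")
```

Alternating the calls on every iteration keeps both Go runtimes active in the same process, which is the condition the crash report depends on.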
C.GoString (and other py2go converters) call runtime.gostring → mallocgc inside a CGo callback. If the GIL is released before those conversions, Go's GC can observe a corrupted sweep-generation counter, causing "fatal error: bad sweepgen in refill" on Go ≥1.24 / macOS x86_64 (issue #370).

In genFuncBody(), pre-convert each py2go argument into a local variable while the GIL is held via PyGILState_Ensure/Release, then release the GIL for the actual Go function call as before. The callArgs loop now references the pre-converted variable instead of inlining C.GoString() after SaveThread.

Also documents Idea 2 (unsafe.String zero-alloc approach) as a future defence-in-depth option in a code comment.
…ppers

C.GoString (and other py2go converters) call runtime.gostring → mallocgc inside a CGo callback. If those conversions run after PyEval_SaveThread releases the GIL, Go's GC can observe a corrupted sweep-generation counter, causing "fatal error: bad sweepgen in refill" on Go ≥1.24 / macOS x86_64 (issue #370).

In genFuncBody(), pre-convert each py2go argument into a local variable while the GIL is held via PyGILState_Ensure/Release, then release the GIL for the actual Go function call as before.

Interface-handle arguments (ifchandle && goname == "interface{}") are excluded from pre-conversion, matching the existing callArgs switch logic to avoid type mismatches in generated code for the iface example.

Also documents Idea 2 (unsafe.String zero-alloc approach) as a future defence-in-depth option in a code comment.
On macos-15-intel, two separately-built gopy extensions loaded in the
same Python process can crash with "fatal error: bad sweepgen in refill"
on certain Go versions. The root cause is not yet confirmed: candidate
mechanisms include PLT-based CGo symbol interposition (crosscall2,
_cgo_topofstack, x_cgo_inittls, etc.) and/or dyld global-namespace
deduplication of the ~150 runtime symbols exported by both .so files.
Add a diagnostic step that runs on every macos-15-intel job (pass or
fail) and reports three things:
1. how many dynamic symbols are shared between the two extensions
2. which of the critical CGo bridge symbols appear in the indirect
symbol table (otool -Iv) — the macOS equivalent of JUMP_SLOT/PLT
3. which library wins in the global namespace at runtime (ctypes)
Comparing the output across Go 1.21/1.22 (fail), 1.23/1.24 (pass), and
1.25 (fail) should confirm whether the crash correlates with PLT stub
generation changes between Go versions.
Loading two gopy extensions in the same Python process embeds two independent Go runtimes. On macOS x86_64 / Go ≥1.24 this causes "fatal error: bad sweepgen in refill" (issue #370) when both runtimes run Go code concurrently.

Add a process-wide pthread_mutex_t stored as a Python capsule in builtins._gopy_global_mu so every gopy extension in the same interpreter shares the same lock. The generated CGo wrappers:
1. Call gopy_ensure_mu() (lazy init, Python GIL must be held) before releasing the GIL.
2. Release the GIL via PyEval_SaveThread.
3. Acquire the mutex via gopy_lock(), blocking until any other extension's Go call finishes.
4. Release the mutex (gopy_unlock()) before restoring the GIL (PyEval_RestoreThread), avoiding the GIL/mutex deadlock.

On Windows the lock/unlock are compiled as no-ops.

Fixes #370 / #385.
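The lock ordering in the wrapper steps can be modelled in pure Python. The sketch below is only an analogy: two `threading.Lock` objects stand in for the GIL and the shared pthread mutex, and `call_go` and `run_demo` are hypothetical names, not part of the generated wrappers. Creating `go_mu` at module level stands in for the lazy init in step 1.

```python
import threading
import time

gil = threading.Lock()    # stands in for the Python GIL
go_mu = threading.Lock()  # stands in for the shared pthread mutex (step 1)


def call_go(fn):
    """Run fn in the order the wrapper steps describe; caller holds `gil`."""
    gil.release()          # step 2: like PyEval_SaveThread
    go_mu.acquire()        # step 3: like gopy_lock()
    try:
        return fn()
    finally:
        go_mu.release()    # step 4: unlock the mutex first...
        gil.acquire()      # ...then restore the GIL


def run_demo(workers=4, calls=25):
    """Return the highest number of simulated Go calls seen running at once."""
    state = {"active": 0, "max_active": 0}

    def go_work():
        state["active"] += 1
        state["max_active"] = max(state["max_active"], state["active"])
        time.sleep(0.0005)  # give other threads a chance to overlap
        state["active"] -= 1

    def worker():
        for _ in range(calls):
            with gil:
                call_go(go_work)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_active"]


if __name__ == "__main__":
    print("max concurrent simulated Go calls:", run_demo())
```

In this model no thread ever waits for one lock while holding the other, which is the property that step 4 relies on to avoid the GIL/mutex deadlock.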
…erposition

When two gopy extensions are loaded in the same Python process via RTLD_GLOBAL, Go runtime data globals (mcache0, allm, mheap_, etc.) from the first-loaded library win in the dynamic-linker global namespace. The second runtime's references to those globals are silently redirected, so both runtimes share the same heap metadata. This corrupts sweep-generation counters and causes:

fatal error: bad sweepgen in refill

on macOS x86_64 / Go ≥1.24 (the earlier pthread mutex fix serialised user code but could not stop background GC goroutines that also hit the shared globals).

Fix: pass a symbol-visibility restriction to the final go build step so that only PyInit__<name> is exported into the global namespace:
- macOS: -extldflags=-Wl,-exported_symbols_list,<file>
- Linux: -extldflags=-Wl,--version-script,<file>

All CGo bridge symbols (crosscall2, _cgo_topofstack, …) remain in the .so and are called directly at link time; they no longer pollute the global namespace and cannot be interposed by a second extension.

Fixes #370 / #385.
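As a rough illustration of the two visibility files, the sketch below builds the file content and the flag string for each platform. It is not gopy's implementation: `visibility_spec` and the example path are hypothetical, and the file layouts assume the usual linker conventions (one underscore-prefixed symbol per line for the macOS exported-symbols list, a `global:`/`local:` block for a GNU ld version script).

```python
def visibility_spec(pkg_name, platform, list_path):
    """Return (file_content, flag) exporting only the module init symbol."""
    init_sym = f"PyInit__{pkg_name}"
    if platform == "darwin":
        # Mach-O C symbols carry a leading underscore; one symbol per line.
        content = f"_{init_sym}\n"
        flag = f"-extldflags=-Wl,-exported_symbols_list,{list_path}"
    elif platform == "linux":
        # Version script: export the init symbol, hide everything else.
        content = f"{{\n  global: {init_sym};\n  local: *;\n}};\n"
        flag = f"-extldflags=-Wl,--version-script,{list_path}"
    else:
        raise ValueError(f"no visibility restriction sketched for {platform!r}")
    return content, flag


if __name__ == "__main__":
    # "hi" and the path are placeholders for illustration.
    for plat in ("darwin", "linux"):
        content, flag = visibility_spec("hi", plat, "/tmp/hi_exports.txt")
        print(plat, flag)
        print(content)
```

Keeping only the init symbol visible means Python can still find the module entry point while every runtime and CGo symbol stays private to its own .so.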
…position

On macOS, Python's default dlopen flags are RTLD_NOW|RTLD_GLOBAL (Py_RTLD_DEFAULT in configure.ac for Darwin). Every .so extension imported via the normal Python import machinery is therefore loaded into the process-wide flat namespace.

When two gopy extensions are loaded in the same process, the second extension's Go runtime symbols (TLS keys, mheap_, cgo init pointers) get interposed by the first extension's definitions, causing the two independent Go runtimes to share GC state and triggering 'fatal error: bad sweepgen in refill' (issue #385).

The generated Python wrapper now temporarily clears RTLD_GLOBAL before importing the underlying _<pkg>.so, so each extension's Go runtime keeps its own isolated copy of these globals. The original flags are restored immediately after import so the rest of the program is unaffected.
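A minimal sketch of the flag handling described above, using the standard `sys.getdlopenflags` / `sys.setdlopenflags` calls. The context manager name `without_rtld_global` is hypothetical, and the code gopy actually generates may be structured differently.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def without_rtld_global():
    """Temporarily clear RTLD_GLOBAL from the interpreter's dlopen flags."""
    if not hasattr(sys, "getdlopenflags"):
        # Platforms without dlopen flags (e.g. Windows): nothing to change.
        yield
        return
    saved = sys.getdlopenflags()
    sys.setdlopenflags(saved & ~os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the original flags even if the import fails.
        sys.setdlopenflags(saved)


# Usage inside a wrapper module (module name is a placeholder):
#
#     with without_rtld_global():
#         import _mypkg
```

Restoring the flags in a `finally` block keeps the change scoped to the single import, so other extensions that rely on RTLD_GLOBAL are unaffected.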
Closing this PR in favor of #391.
Fixes #385
Relates-to: #370