Skip to content

[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593

Open
cyx-6 wants to merge 6 commits into
apache:mainfrom
cyx-6:pyobject
Open

[FEAT][Python] Tie Python wrapper lifetime to underlying C++ FFI object#593
cyx-6 wants to merge 6 commits into
apache:mainfrom
cyx-6:pyobject

Conversation

@cyx-6

@cyx-6 cyx-6 commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Make a.x is a.x, id(a.x) stable across drop+refetch, and f(x) is x for FFI returns, by attaching a 16-byte custom-allocator header to every Object allocation. Implements Tianqi Chen's PyObjectTying doc.

Before:

a = MyClass(Inner(...))
assert a.x is a.x          # False — fresh wrapper per attribute access
assert id(a.x) == id(a.x)  # flaky

After: both hold, and identity is preserved across a wrapper finalize-and-revive cycle whenever the C++ object outlives the wrapper.

Design

A two-layer allocator hook lives in core libtvm_ffi:

  • TVMFFIObjectAllocHeader { delete_space } — 8-byte base header preceding every Object body.
  • TVMFFICustomAllocator { allocate, context } — process-wide registry; libtvm_ffi installs a builtin default at registry init so TVMFFIGetCustomAllocator never returns NULL.
  • TVMFFIGetCustomAllocator / TVMFFISetCustomAllocator — frontends override the default at module load.

The Python Cython module overrides the global default with TVMFFIPyAllocate, which prepends a 16-byte PyCustomAllocHeader { PyObject* cached_pyobj; base; } encoding the wrapper binding. The Rust crate (ObjectArc::new[_with_extra_items]) and the Python-defined types in extra/dataclass.cc route through the same registry, so layout and lifetime semantics are uniform across frontends.

The Active vs Inactive distinction lives on the wrapper's own chandle field rather than on a tag bit, so the header layout stays pointer-equality clean:

State header.cached_pyobj wrapper.chandle Meaning
Detached NULL No wrapper bound
Active wrapper chandle Live canonical wrapper; wrapper owns +1
Inactive wrapper NULL Wrapper bytes preserved (phantom +1); chandle kept alive only by other C++ holders

The state machine and four transition helpers (TVMFFIPyTryGetAttachedPyObject / TVMFFIPyAttachPyObject / TVMFFIPyDetachPyObject / TVMFFIPyTPFinalize) are concentrated in python/tvm_ffi/cython/tvm_ffi_python_helpers.h. Three invariants pin the contract:

  • I1. When a PyObject goes out of scope, its +1 on chandle is always released.
  • I2. When a chandle is destroyed, its cached PyObject (if any) is reclaimed.
  • I3. wrapper.chandle != NULL implies the wrapper owns +1 on that chandle.

Implementation highlights

  • Universal cache-on for make_ret — every FFI return funnels through make_ret_object, which returns the canonical wrapper for a chandle when one exists. Combined with Inactive -> Active revival, this delivers a.x is a.x, stable id(a.x) across drop+refetch, and f(x) is x for matching returns.
  • Active -> Inactive via tp_finalize — Cython def __del__ maps to tp_finalize (PEP 442). When other C++ holders keep the chandle alive past Python refcount 0, TVMFFIPyTPFinalize nulls self.chandle before DECREFing the chandle (preserving I3), then Py_IncRef(self) to resurrect the PyObject. cached_pyobj is left pointing at the surviving wrapper for future revival. Cython auto-generates tp_dealloc, which now enters with the self.chandle == NULL invariant established by tp_finalize.
  • No SABI — dropped Python Limited API and the CYTHON_USE_TP_FINALIZE shim.
  • _move() semantics preserved — the rvalue setter eager-detaches the canonical-wrapper binding before the C++ AnyViewToOwnedAny transfer nulls the source chandle, so a downstream cache lookup never sees a stale back-pointer.
  • Frontend detectionTVMFFIPyIsCanonical compares delete_space against TVMFFIPyDeleteSpace to recognize Python-allocated chandles, avoiding a flag bit on TVMFFIObject. Pre-Python-init chandles (statically-initialized global functions in libtvm_ffi.so) carry only the base header and skip the binding install.
  • Shutdown guardTVMFFIPyMarkPythonFinalizing is wired to atexit and flips a std::atomic<bool> flag; TVMFFIPyDeleteSpace checks it (via TVMFFIPyIsPythonAlive) before PyGILState_Ensure to avoid touching a teardown interpreter.

Test plan

  • Full Python suite passes (2332 passed, 19 skipped, 2 xfailed)
  • Full Rust suite passes
  • New tests/python/test_pyobject_tying.py covers Active/Inactive transitions, cache-on aliasing, _move() under cache-on, pickle stress, threading stress, GC integration with the Inactive state, multi-chandle isolation, and the weakref limitation
  • test_function.py::test_rvalue_ref refactored for cache-on aliasing semantics
  • CI: lint, clang-tidy, doc, C++/Python/Rust on Linux x86_64 + aarch64, macOS arm64, Windows AMD64

Out-of-scope follow-ups

  • Name-keyed dict cache for _get_global_func to deliver id-stability for static-init Functions (whose chandles predate the Python allocator and so carry only the base header). Tracked as a TODO in function.pxi::_get_global_func.

@cyx-6 cyx-6 marked this pull request as draft May 19, 2026 10:32

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces 'PyObjectTying,' a mechanism to link the lifetime of Python wrappers to their underlying C++ FFI objects. By prepending a PyCustomizeAllocHeader to object allocations, the implementation enables stable object identity and address preservation across wrapper finalization cycles. The changes include modifications to the C++ SimpleObjAllocator, Cython bindings to utilize tp_finalize, and updates to the reflection system. Feedback from the review highlights a critical thread-safety issue in the callback used to free cached wrapper memory, which lacks GIL protection, and a potential memory leak when installing handles if a previous phantom wrapper is not correctly released.

Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/object.pxi Outdated
@cyx-6 cyx-6 force-pushed the pyobject branch 2 times, most recently from 435caeb to 1b751b2 Compare May 24, 2026 13:03
Comment thread include/tvm/ffi/c_api.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread include/tvm/ffi/c_api.h Outdated
Comment thread include/tvm/ffi/c_api.h
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread src/ffi/custom_allocator.cc Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Comment thread python/tvm_ffi/cython/tvm_ffi_python_helpers.h Outdated
Make `a.x is a.x`, `id(a.x)` stable across drop+refetch, and `f(x) is x`
for FFI returns by attaching a 16-byte custom-allocator header to every
Object allocation. Implements the PyObjectTying design.

Design:

- Two-layer custom-allocator hook in core libtvm_ffi.so:
  `TVMFFIObjectAllocHeader { delete_space }`,
  `TVMFFICustomAllocator { allocate, context }`, plus
  `TVMFFIGetCustomAllocator` / `TVMFFISetCustomAllocator`. libtvm_ffi
  installs a builtin default at registry init, so every Object always
  carries the 8-byte base header and `TVMFFIGetCustomAllocator` never
  returns NULL. The Python Cython module overrides the global default
  at module load with `TVMFFIPyAllocate`, which prepends a 16-byte
  `PyCustomAllocHeader` encoding the Python wrapper binding.
  `make_object` / `make_inplace_array_object` / `PyClassDeleter` and
  the Rust `ObjectArc::new[_with_extra_items]` paths all funnel through
  the registry, so Python-defined types and Rust-allocated objects
  share the layout and lifetime semantics.

- State machine concentrated in `tvm_ffi_python_helpers.h`.
  `PyCustomAllocHeader` is `{ PyObject* cached_pyobj; base; }` (16 B).
  The Active vs Inactive distinction lives on the wrapper's own
  `chandle` field, not on a tag bit, so the header layout stays
  pointer-equality clean:
    Detached:  cached_pyobj == NULL                                   (no wrapper)
    Active:    cached_pyobj == self  AND  self.chandle == chandle     (wrapper owns +1)
    Inactive:  cached_pyobj == self  AND  self.chandle == NULL        (preserved bytes)
  Four helpers expose the transitions:
  `TVMFFIPyTryGetAttachedPyObject`, `TVMFFIPyAttachPyObject`,
  `TVMFFIPyDetachPyObject`, `TVMFFIPyTPFinalize`.

- Universal cache-on for `make_ret`. Every FFI return funnels through
  `make_ret_object`, which returns the canonical wrapper for a chandle
  when one exists. Combined with Inactive -> Active revival on the
  cached path, this delivers `a.x is a.x`, stable `id(a.x)` across
  drop+refetch, and `f(x) is x` when a function returns the same
  chandle it received.

- Inactive state via Cython `def __del__` on `CObject`, mapping to
  `tp_finalize` (PEP 442). When other C++ holders keep the chandle
  alive past the wrapper's Python refcount hitting 0,
  `TVMFFIPyTPFinalize` nulls `self.chandle` BEFORE DECREFing the
  chandle, then `Py_IncRef(self)` to resurrect the PyObject.
  `cached_pyobj` is unchanged so a future re-fetch can rebind. Cython
  auto-generates `tp_dealloc`, which now enters with the
  `self.chandle == NULL` invariant established by `tp_finalize`.
  Dropped the Py_LIMITED_API / SABI shim and the
  `CYTHON_USE_TP_FINALIZE` define.

- Shutdown guard. `TVMFFIPyMarkPythonFinalizing` is wired to atexit
  from Cython module init, flipping a `std::atomic<bool>` flag read
  by `TVMFFIPyDeleteSpace` (via `TVMFFIPyIsPythonAlive`) before
  `PyGILState_Ensure` to avoid GIL acquire after Python finalization
  has begun. Inactive wrapper bytes on chandles still alive at that
  point are intentionally leaked -- process is exiting.

- Frontend-allocation detection by `delete_space` pointer comparison
  (`TVMFFIPyIsCanonical`) avoids a flag bit on `TVMFFIObject`.
  Pre-Python-init chandles (statically-initialized global functions in
  libtvm_ffi.so) carry only the base header; the Python side detects
  this and skips the binding install.

`_move()` semantics: callback args alias the caller's wrapper under
universal cache-on (one wrapper, one chandle ref) and FFI function
returns of the same chandle alias the caller's wrapper. `_move()` is
kept as an API: the rvalue setter on either caller or callback side
eager-detaches the canonical-wrapper binding before the C++
AnyViewToOwnedAny transfer nulls the source chandle.
`test_function.py::test_rvalue_ref` is refactored for the new
aliasing-aware use_count expectations.

Tests: full Python suite passes (2332 passed, 19 skipped, 2 xfailed);
Rust suite passes. New `tests/python/test_pyobject_tying.py` covers
Active/Inactive transitions, cache-on aliasing, `_move()` under
cache-on, pickle stress, threading stress, GC integration with the
Inactive state, multi-chandle isolation, and the weakref limitation.

TODO carried in `function.pxi::_get_global_func`: name-keyed dict
cache for static-init Function id-stability. Most registry-resident
Functions are allocated at C++ static init (before the Python
allocator is registered) so their chandle has only the base header --
make_ret_object's cache can't reach them. Deferred to keep this change
small.
@cyx-6 cyx-6 marked this pull request as ready for review May 28, 2026 18:10
Comment thread python/tvm_ffi/cython/object.pxi Outdated
if self.chandle != NULL:
CHECK_CALL(TVMFFIObjectDecRef(self.chandle))
self.chandle = NULL
def __del__(self):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's double check and test it out, making sure this __del__ method will be properly invoked. I had some really negative experience with overloading __del__ because it could possibly prevent Python GC from recycling memory in certain cases. Not 100% sure if it applies to Cython's cdef class's __del__ method though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a tp_finalize-free version is updated. we are now using tp_alloc + tp_free as alternative.

cls = make_fallback_cls_for_type_index(type_index)
obj = cls.__new__(cls)
(<CObject>obj).chandle = result.v_obj
TVMFFIPyAttachPyObject(result.v_obj, <PyObject*>obj)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment, this is a new class, so it must be in detached state, attach it

dlpack.deleter(dlpack)
pass
# default return the tensor
# default return the tensor.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: we don't try to attach py object for tensor types as it may appear in callback and there maybe translation depending on the context

// ----------
// I1. When a PyObject goes out of scope (no Python var refers to it),
// its +1 on chandle is always released.
// I2. When a chandle is destroyed, its cached PyObject (if any) is

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a chandle is destroyed, we can only be in Detached or Inactive state, if it is in Inactive state, the PyObject is reclaimed

// ``self.chandle`` and DecRef.
// Detached -> Detached : ``self`` is not the canonical wrapper for
// this chandle (eager-detach via move already
// cleared the binding). Just DecRef.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need clarification here, since Detached menas chandle == NULL

*/
TVM_FFI_INLINE void TVMFFIPyTPFinalize(void** ptr_to_chandle, PyObject* wrapper) {
void* chandle = *ptr_to_chandle;
if (chandle == nullptr) return;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment Detached

*
* Detached -> Detached: ``cached_pyobj`` was already cleared by an
* eager move, or chandle was not allocated through the Python custom
* allocator. Just null ``wrapper.chandle`` and DecRef.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to DecRef here? just return and pass to dealloc

Refine the Python-wrapper-to-C++-object tying mechanism introduced in the
prior commit. Move wrapper cleanup from tp_finalize (__del__) to tp_dealloc
(__dealloc__), and revive an Inactive (dead-but-cached) allocation in place
so a re-fetched wrapper keeps a stable id() at the same address.

- object.pxi: drive cleanup from __dealloc__ via TVMFFIPyTpDealloc; rework
  make_ret_object into the cache-&-revive dispatcher (LIVE HIT drops the
  caller's redundant +1; REVIVAL reuses the inactive allocation via the
  thread-local revive slot); install the custom tp_alloc / tp_free slots for
  every registered FFI type at the _update_registry choke point.
- helpers.h: two-output TVMFFIPyTryGetAttachedPyObject(chandle, &out) plus
  TVMFFIPyIsDetached; document the tagged_pyobj state machine (Detached /
  Active / Inactive), its invariants, and where each transition happens.
- base.pxi / function.pxi: track the renamed externs.
- tests: add TestNoUnboundedLeak (one pinned wrapper per distinct chandle
  must not accumulate) and trim heavy iteration counts on the tying tests.
@junrushao

Copy link
Copy Markdown
Member

Btw - does this approach also work for String and Bytes? I’m asking because I wasn’t sure if small string optimization is compatible with this design

@cyx-6

cyx-6 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

@junrushao Not working for String and Bytes. For small ones, they are not heap-allocated. For large ones, they are not sub-classes of CObject. Both are out of the scope of this PR.

TVMFFIObjectAllocHeader base;
};

constexpr size_t kPyHeaderOffset = sizeof(PyCustomAllocHeader); // 16

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try inline

* need ``PyObject_InitVar(.., nitems)`` here and a matching basicsize check.
*/
inline PyObject* TVMFFIPyTpAlloc(PyTypeObject* type, Py_ssize_t nitems) {
void* blk = TVMFFIPyTlsReviveSlot();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::swap

* count). Per-thread -> free-threading safe; strict read-and-clear.
*/
inline void*& TVMFFIPyTlsReviveSlot() {
static thread_local void* slot = nullptr;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyObject*

* ``TVMFFIPyTpAlloc`` (which is handed only ``type`` and an item
* count). Per-thread -> free-threading safe; strict read-and-clear.
*/
inline void*& TVMFFIPyTlsReviveSlot() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLS

// stable wrapper class for the life of the process, and ``make_ret_object``
// derives both the cached allocation (from the chandle) and ``type`` (= cls for that
// same type_index) from the very same chandle on the revival path.
assert(type->tp_itemsize == 0 &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static PyObject* custom_tp_alloc(PyTypeObject *type, Py_ssize_t nitems) {
    PyCustomizeAllocHeader *header = get_current_header();

    // Hot Path: Reuse
    if (header && header->cached_mem) {
        PyObject *obj = header->cached_mem;
        
        // Inline-friendly initialization
        PyObject_VarInit((PyVarObject *)obj, type, nitems);
        
        // Only memset the delta between header and body
        // This is where your real time is spent
        size_t head_sz = (nitems > 0 || type->tp_itemsize > 0) ? sizeof(PyVarObject) : sizeof(PyObject);
        memset((char*)obj + head_sz, 0, type->tp_basicsize - head_sz);

        if (PyType_IS_GC(type)) {
            PyObject_GC_Track(obj);
        }
        return obj;
    }

    // Cold Path: Slow allocation
    return PyType_GenericAlloc(type, nitems);
}

* on, GC or not.
*/
inline void TVMFFIPyTpFree(void* self) {
if (reinterpret_cast<TVMFFIPyCObjectHead*>(self)->chandle == TVM_FFI_PY_INACTIVE_SENTINEL) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a runtime offset check test case

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void** TVMFFICyObjectGetCHandlePtr(PyObject* ptr)


cdef void** TVMFFICyObjectGetCHandlePtr(PyObject* ptr):
   return &((<CObject>ptr).chandle)

TVM_FFI_INLINE void TVMFFIPyTpDealloc(void** ptr_to_chandle, PyObject* wrapper) {
void* chandle = *ptr_to_chandle;
// NULL: already detached/moved. INACTIVE_SENTINEL: an inactive cached
// allocation must never re-enter __dealloc__ while inactive (defensive).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move TVMFFIPyTpDealloc after TVMFFIPyTpAlloc, TVM_FFI_PY_INACTIVE_SENTINEL/TVMFFIPyTpDealloc/ TVMFFIPyTpFree/ sit together, document the overall logic in a section, TVM_FFI_PY_INACTIVE_SENTINEL=> TVM_FFI_PY_CHANDLE_INACTIVE_SENTINEL

* header. Just null ``wrapper.chandle`` and DecRef.
*/
TVM_FFI_INLINE void TVMFFIPyTpDealloc(void** ptr_to_chandle, PyObject* wrapper) {
void* chandle = *ptr_to_chandle;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document, after dealloc, there are two possibilities:

  • chandle is set to CHANDLE_INACTIVE_SENTINEL, tpfree will not be triggered
  • other cases, normal free

* time); the only cost is no stable-id-across-drop.
*/
TVM_FFI_INLINE bool TVMFFIPyIsInactiveEligible(PyObject* wrapper) {
PyTypeObject* tp = Py_TYPE(wrapper);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cross check

*
* The (return, ``*out``) pair has four combinations; one cannot occur:
* (true, non-NULL) : Active -- the live canonical wrapper.
* (false, NULL) : Detached / non-Python -- no wrapper bound.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document

\return whether returned PyObject state is active

cyx-6 added 3 commits June 4, 2026 09:58
…ers.h

Rework the PyObject-tying dealloc documentation to describe the current
race-free logic concisely rather than re-deriving the resolved TOCTOU:

- Trim the handshake overview to core flow; keep the Case 0/1/2 anchors.
- Unify terminology on Inactive / cached allocation (drop husk / park).
- Fold the DEALLOC_TRANSIT transience note into invariant I3; simplify the
  macro doc and the TVMFFIPyCObjectHead mirror comment.
- Deduplicate the three callback docstrings against the overview and the
  transition map; restore the dropped non-canonical sub-case in TpDealloc.
- Remove the dead defensive 'chandle == DEALLOC_TRANSIT' guard branch in
  TVMFFIPyTpDealloc (the marker never outlives its own dealloc chain) and
  document the relied-upon invariant.

Test: tighten live_inner_count to 'type(ob) is InnerLeak' (exact-type match).
Apply Tianqi's PR apache#593 review feedback to the pyobject-tying layer:

- Drop the kPyHeaderOffset constant; use sizeof(PyCustomAllocHeader)
  directly at every site (and in the static_asserts).
- Rename TVMFFIPyTlsReviveSlot -> TVMFFIPyTLSReviveSlot and make its sole
  access primitive a swap-exchange (store next, return prior). The slot is
  typed PyObject*, and taking the block is TVMFFIPyTLSReviveSlot(nullptr) --
  read-and-clear in one step, no separate clear to forget.
- Document \param/\return on TVMFFIPyTryGetAttachedPyObject.
- Reorder the slot functions into execution order (TpAlloc -> TpDealloc ->
  TpFree) with the DEALLOC_TRANSIT marker declared between them.
- Replace the hand-rolled TVMFFIPyCObjectHead layout mirror with a Cython
  cdef public accessor TVMFFICyObjectGetCHandlePtr, so the chandle field
  offset comes from the real cdef class layout instead of a hardcoded one.

Build clean; full Python suite passes (2336 passed, 18 skipped, 2 xfailed).
Resolve the lint job failures on PR apache#593:

- ruff-check: add one-line docstrings to test methods (D102), hoist
  threading/weakref imports to top-level (PLC0415).
- ty: annotate the rvalue-move callback params as `Any` instead of
  `object` so `x._move()` resolves (matches test_function.py).
- ruff-format / clang-format / cmake-format: apply formatter reflow.
- taplo-format: expand the cibuildwheel `build` array to multi-line in
  pyproject.toml (applied from CI log; taplo has no wheel for this
  platform).

Reproduced locally via pre-commit (taplo-format skipped).
@junrushao

Copy link
Copy Markdown
Member

Any updates on this issue?

…gn>8 double-free

The pyobject-tying state machine stores a wrapper back-pointer in each
object's alloc header and mutates it with no synchronization -- correct
only under the GIL. On free-threaded CPython (3.14t) concurrent attribute
access race-frees the wrapper (segfault in TestThreadingStress).

Disable tying wholesale on free-threaded builds at its two one-time setup
gates, under #ifdef Py_GIL_DISABLED: TVMFFIPyRegisterDefaultAllocator does
not install the custom allocator and TVMFFIPyInstallTypeSlots no-ops. With
the builtin allocator left in place every chandle is non-canonical, so all
state-machine helpers take their existing no-op tails -- no Cython change,
GIL build byte-identical. The cost on free-threaded is losing identity
stickiness (a.x is a.x / stable id() / f(x) is x), not correctness.

This uncovered a latent double-free in BuiltinDefaultAllocate /
BuiltinDefaultDeleteSpace: allocate offset the body by round_up(header,
alignment) but free subtracted a fixed sizeof(header), symmetric only for
alignment <= 8. Reflection dataclasses request alignof(max_align_t)=16, so
a 16-aligned object placed at base+16 was freed at base+8 -> free(): invalid
pointer. It was masked on the GIL build because the symmetric custom
allocator shadows the builtin one for all Python objects. Use a fixed body
offset = alignof(max_align_t), symmetric for every alignment.

Mark identity-stickiness tests skipif(free-threaded); value / no-crash /
no-leak / distinctness tests stay active.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants