Fix mixed-allocator heap corruption in C tokenizer entity path#356
Open
gistrec wants to merge 1 commit into
Open
Fix mixed-allocator heap corruption in C tokenizer entity path#356gistrec wants to merge 1 commit into
gistrec wants to merge 1 commit into
Conversation
The C tokenizer's common.h redefines malloc/realloc/free to the PyObject_* family but left calloc pointing at libc. Entity parsing in Tokenizer_really_parse_entity allocates with calloc and later releases the buffer with free (= PyObject_Free), which is undefined behavior: under PYTHONMALLOC=debug it aborts in _PyMem_DebugRawFree, and otherwise it can silently corrupt the heap. Route calloc through PyObject_Calloc alongside the other allocator macros so both calloc sites match the surrounding free. Fixes earwig#352.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #352.
The C tokenizer was mixing allocators in the entity parsing path.
common.hmacro-redefinesmalloc,realloc, andfreeto Python's object allocator functions, butcallocwas not redefined. As a result,Tokenizer_really_parse_entity()allocated buffers with libccalloc()and later released them through the macro-expandedfree()→PyObject_Free().That allocator mismatch is undefined behavior. With
PYTHONMALLOC=debug, it aborts immediately with_PyMem_DebugRawFree: bad ID; without debug malloc, it can silently corrupt the heap.Root cause
src/mwparserfromhell/parser/ctokenizer/common.hcurrently redirects only:But
Tokenizer_really_parse_entity()usescalloc(...)for entity parsing buffers. Those buffers are later released withfree(...), which expands toPyObject_Free(...). This passes a libc-allocated pointer to Python's object allocator.Reproduction
Before this change:
PYTHONMALLOC=debug python -c "import mwparserfromhell; mwparserfromhell.parse('{{T|p=a & b}}')"Actual result:
Expected result: parsing succeeds and returns a
Wikicodeobject without corrupting the heap.The issue is triggered by inputs that go through the entity parsing path, for example:
Fix
Add
callocto the same allocator macro block:This makes the existing
callocsites use Python's allocator consistently with the surroundingfree()calls.Tests
PYTHONMALLOC=debug, so future allocator mismatches fail deterministically instead of becoming silent heap corruption.SIGABRTand_PyMem_DebugRawFree: bad ID.python -m pytest tests/— 2006 passed, 1 skipped.PYTHONMALLOC=debug python -m pytest tests/— 2006 passed, 1 skipped.