tracemalloc.start() and tracemalloc.stop() race condition #131566

Closed
colesbury opened this issue Mar 21, 2025 · 3 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Contributor

colesbury commented Mar 21, 2025

This occurs in both the GIL-enabled build and the free threaded build. It only happens in release builds, because test_tracemalloc.test_tracemalloc_track_race is skipped in debug builds.

Tracemalloc modifies the global "raw" memory allocator. The modification happens under a lock, but other calls to PyMem_RawMalloc and PyMem_RawFree can occur without any locking and without holding the GIL.

I think the "fix" is to just skip the test when running under TSAN:

  • I don't think there's any better fix -- we definitely don't want PyMem_RawMalloc() to require locks
  • The race is relatively "benign" -- I don't think it'll cause any crashes in practice
  • Tracemalloc is primarily a debugging tool

Here's an example stack trace:

WARNING: ThreadSanitizer: data race (pid=3203004)
  Write of size 8 at 0x555555cee008 by main thread:
    #0 __tsan_memcpy <null> (python+0xdff2e) (BuildId: 3f2abce6d83666bdd9fffb9ab44aef8621970a54)
    #1 set_allocator_unlocked /raid/sgross/cpython/Objects/obmalloc.c (python+0x29d9aa) (BuildId: 3f2abce6d83666bdd9fffb9ab44aef8621970a54)
    #2 PyMem_SetAllocator /raid/sgross/cpython/Objects/obmalloc.c:899:5 (python+0x29d9aa)
    #3 _PyTraceMalloc_Start /raid/sgross/cpython/Python/tracemalloc.c:825:5 (python+0x49bdb4) (BuildId: 3f2abce6d83666bdd9fffb9ab44aef8621970a54)
    #4 _tracemalloc_start_impl /raid/sgross/cpython/./Modules/_tracemalloc.c:99:9 (python+0x4d5db8) (BuildId: 3f2abce6d83666bdd9fffb9ab44aef8621970a54)
    #5 _tracemalloc_start /raid/sgross/cpython/./Modules/clinic/_tracemalloc.c.h:111:20 (python+0x4d5db8)
...

  Previous read of size 8 at 0x555555cee008 by thread T1:
    #0 PyMem_RawFree /raid/sgross/cpython/Objects/obmalloc.c:989:32 (python+0x29dda7) (BuildId: 3f2abce6d83666bdd9fffb9ab44aef8621970a54)
    #1 pythread_wrapper /raid/sgross/cpython/Python/thread_pthread.h:240:5 (python+0x498861) (BuildId: 3f2abce6d83666bdd9fffb9ab44aef8621970a54)

PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);

static void *
pythread_wrapper(void *arg)
{
    /* copy func and func_arg and free the temporary structure */
    pythread_callback *callback = arg;
    void (*func)(void *) = callback->func;
    void *func_arg = callback->arg;
    PyMem_RawFree(arg);
    func(func_arg);
    return NULL;
}
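To make the pattern concrete, here is a minimal standalone model of it, not CPython code. All names (`allocator_t`, `raw_allocator`, `set_allocator`, `raw_malloc`, `raw_free`, `demo_traced_calls`) are hypothetical stand-ins. The writer copies a new table into a global under a mutex, while the readers load the function pointer with no synchronization at all. The demo is deliberately single-threaded so that it stays well-defined; the comments mark the accesses that would conflict if another thread were running.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a table of raw allocator hooks. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void (*free)(void *ctx, void *ptr);
} allocator_t;

static void *default_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void default_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

/* Global table consulted by every raw_malloc()/raw_free() call. */
static allocator_t raw_allocator = {NULL, default_malloc, default_free};
static pthread_mutex_t allocator_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Writer: the struct copy happens under a mutex. */
static void set_allocator(const allocator_t *alloc)
{
    pthread_mutex_lock(&allocator_mutex);
    raw_allocator = *alloc;          /* the write side of the reported race */
    pthread_mutex_unlock(&allocator_mutex);
}

/* Readers: no lock is taken, so the mutex above does not order these loads. */
static void *raw_malloc(size_t size)
{
    return raw_allocator.malloc(raw_allocator.ctx, size);
}

static void raw_free(void *ptr)
{
    raw_allocator.free(raw_allocator.ctx, ptr);   /* the read side */
}

/* A tracing hook that only counts calls. */
typedef struct { int mallocs; int frees; } trace_stats_t;

static void *tracing_malloc(void *ctx, size_t size)
{
    trace_stats_t *stats = ctx;
    stats->mallocs++;
    return malloc(size);
}

static void tracing_free(void *ctx, void *ptr)
{
    trace_stats_t *stats = ctx;
    stats->frees++;
    free(ptr);
}

/* Single-threaded walk-through; returns the number of traced calls. */
int demo_traced_calls(void)
{
    static trace_stats_t stats;
    stats.mallocs = 0;
    stats.frees = 0;
    allocator_t tracing = {&stats, tracing_malloc, tracing_free};
    allocator_t original = {NULL, default_malloc, default_free};

    void *p = raw_malloc(16);        /* before start: not traced */
    set_allocator(&tracing);         /* models tracemalloc.start() */
    void *q = raw_malloc(16);        /* traced */
    assert(p != NULL && q != NULL);
    raw_free(q);                     /* traced */
    set_allocator(&original);        /* models tracemalloc.stop() */
    raw_free(p);                     /* not traced */
    return stats.mallocs + stats.frees;
}
```

Calling `demo_traced_calls()` from a `main` returns 2: only the allocation and free made between the two `set_allocator` calls go through the tracing hooks. If a second thread called `raw_free` while `set_allocator` ran, the unlocked load and the locked write would touch the same memory, which is the shape of the report above.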

See also:

Linked PRs

colesbury added a commit to colesbury/cpython that referenced this issue Mar 21, 2025
The test has data race when setting the global "raw" memory allocator.
@picnixz picnixz added type-bug An unexpected behavior, bug, or error interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 22, 2025
colesbury added a commit that referenced this issue Mar 22, 2025
The test has data race when setting the global "raw" memory allocator.
@picnixz
Member

picnixz commented Mar 22, 2025

Do you want to keep this one open even if you're skipping the test or do you want to close it?

Dumb idea but can tracemalloc change the implementation of the raw allocator to switch to a locked raw allocator?

@colesbury
Contributor Author

Thanks - I wanted to close the issue.

@colesbury
Contributor Author

colesbury commented Mar 22, 2025

> Dumb idea but can tracemalloc change the implementation of the raw allocator to switch to a locked raw allocator?

The race is on the load of the allocator function pointer _PyMem_Raw.malloc, whose type is void *(*malloc)(void *ctx, size_t size):

cpython/Objects/obmalloc.c

Lines 989 to 1001 in 18249d9

void *
PyMem_RawMalloc(size_t size)
{
    /*
     * Limit ourselves to PY_SSIZE_T_MAX bytes to prevent security holes.
     * Most python internals blindly use a signed Py_ssize_t to track
     * things without checking for overflows or negatives.
     * As size_t is unsigned, checking for size < 0 is not required.
     */
    if (size > (size_t)PY_SSIZE_T_MAX)
        return NULL;
    return _PyMem_Raw.malloc(_PyMem_Raw.ctx, size);
}

This races with tracemalloc swapping the allocator pointer. Tracemalloc already performs locking inside of tracemalloc_raw_malloc (which is its implementation of _PyMem_Raw.malloc), but that's too late -- that happens after the function pointer is loaded.
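For illustration only, this is roughly what a synchronized load would look like in a standalone model using C11 atomics. It is not proposed in this thread and is not how CPython works: the names (`raw_alloc_t`, `current_table`, `install_table`, `model_raw_malloc`, `demo_hooked_calls`) are hypothetical, the model swaps a pointer to an immutable table rather than copying a struct in place, and the cost of an acquire load on every allocation is not evaluated here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical immutable hook table; never modified after creation. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
} raw_alloc_t;

static atomic_int hook_calls;

static void *plain_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }

static void *counting_malloc(void *ctx, size_t size)
{
    /* A real hook could take its own lock here, but by this point the
       caller has already loaded the function pointer. */
    atomic_fetch_add((atomic_int *)ctx, 1);
    return malloc(size);
}

static const raw_alloc_t plain_table = {NULL, plain_malloc};
static const raw_alloc_t counting_table = {&hook_calls, counting_malloc};

/* The only mutable shared state: one atomic pointer to the active table. */
static _Atomic(const raw_alloc_t *) current_table = &plain_table;

/* Writer: publish a whole new table with a single atomic store. */
static void install_table(const raw_alloc_t *table)
{
    atomic_store_explicit(&current_table, table, memory_order_release);
}

/* Reader: one atomic load yields a consistent (ctx, malloc) pair. */
static void *model_raw_malloc(size_t size)
{
    const raw_alloc_t *table =
        atomic_load_explicit(&current_table, memory_order_acquire);
    return table->malloc(table->ctx, size);
}

/* Single-threaded walk-through; returns how many calls hit the hook. */
int demo_hooked_calls(void)
{
    atomic_store(&hook_calls, 0);
    void *a = model_raw_malloc(8);       /* plain table */
    install_table(&counting_table);
    void *b = model_raw_malloc(8);       /* counted */
    install_table(&plain_table);
    void *c = model_raw_malloc(8);       /* plain table again */
    assert(a != NULL && b != NULL && c != NULL);
    free(a);
    free(b);
    free(c);
    return atomic_load(&hook_calls);
}
```

`demo_hooked_calls()` returns 1. The point of the sketch is only to show where synchronization would have to sit to remove the report: at the load in the wrapper, before any code owned by the hook gets a chance to run.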

I don't think the race will cause any problems in practice.
