
gh-73069: fix race conditions in os.scandir and associated direntry #131642

Open · wants to merge 8 commits into main
Conversation

@duaneg commented Mar 24, 2025

This fixes the first two (c1 & c2) crashes from gh-73069. Let me know if I should add a news entry.

@bedevere-app bot commented Mar 24, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@ZeroIntensity (Member) left a comment


  • A blurb/news entry makes sense here.
  • Mutexes are prone to re-entrancy deadlocks in cases where we call back into Python, e.g., in an object's finalizer. We should be using critical sections, probably alongside Argument Clinic (see the sketch below).
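For illustration, a minimal sketch of the suggested shape, assuming the public critical-section C API added in CPython 3.13 (the locked helper name is hypothetical, and the iterator type is per posixmodule.c); Argument Clinic can generate the equivalent wrapper via its @critical_section directive:

```c
/* Sketch only. On free-threaded builds the critical section is
 * implicitly released whenever the thread blocks or calls back into
 * Python (e.g. a finalizer), avoiding the re-entrancy deadlocks a
 * plain mutex can hit; on GIL builds it is a no-op. */
static PyObject *
ScandirIterator_iternext(ScandirIterator *iterator)
{
    PyObject *result;
    Py_BEGIN_CRITICAL_SECTION(iterator);
    result = ScandirIterator_iternext_locked(iterator);  /* hypothetical */
    Py_END_CRITICAL_SECTION();
    return result;
}
```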

@duaneg (Author) commented Mar 24, 2025

  • A blurb/news entry makes sense here.

Righto, will add.

  • Mutexes are prone to re-entrancy deadlocks in cases where we call back into Python, e.g., in an object's finalizer. We should be using critical sections, probably alongside argument clinic.

Hmm, it seems re-entrancy is implicit in a lot of innocuous-looking calls: when I first looked through the code it didn't seem like it could be re-entrant, but looking more closely I guess even something like PyType_GetModule could theoretically call into Python code, right?

Regardless, I don't think a critical section works for this bug: these methods make blocking IO calls, which would release that lock. Note that these crashes occur on a GIL build, where AIUI a critical section cannot be an appropriate fix (since it is a no-op there). FWIW I tried it quickly and confirmed it still crashes with closedir and iternext protected by a critical section (on both GIL and free-threading builds).

Would using a _PyRecursiveMutex instead of a PyMutex be an acceptable alternative? I'll push out a new version using that shortly.
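For reference, a minimal sketch of what that would look like, assuming the private _PyRecursiveMutex API from pycore_lock.h (the iterator layout is illustrative):

```c
#include <dirent.h>
#include "pycore_lock.h"        /* _PyRecursiveMutex (private runtime API) */

typedef struct {
    PyObject_HEAD
    DIR *dirp;                  /* illustrative layout */
    _PyRecursiveMutex mutex;    /* zero-initialized, like PyMutex */
} ScandirIterator;

static void
ScandirIterator_close(ScandirIterator *it)
{
    _PyRecursiveMutex_Lock(&it->mutex);
    /* If a finalizer re-enters this function on the same thread, the
     * lock level is bumped instead of deadlocking. */
    if (it->dirp != NULL) {
        closedir(it->dirp);
        it->dirp = NULL;
    }
    _PyRecursiveMutex_Unlock(&it->mutex);
}
```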

duaneg added 4 commits March 25, 2025 12:31
It seems we do need to worry about re-entrancy, so switch to using a recursive
mutex. Note we can't use a critical section since we need to perform blocking
IO while holding the lock, which a critical section would implicitly release.
@ZeroIntensity (Member)

  • Would using a _PyRecursiveMutex instead of a PyMutex be an acceptable alternative? I'll push out a new version using that shortly.

Maybe, but we should be very cautious with it. _PyRecursiveMutex is safe in the re-entrant case, but it's still prone to lock-ordering deadlocks. Imagine a case where:

  • Thread A holds the recursive lock, and calls into Python via a finalizer.
  • Thread A begins waiting on a semaphore. The recursive lock is not released!
  • Thread B is waiting on the recursive lock, but it's also the same thread that unblocks the semaphore.
  • Deadlock :(

We might have to resort to some atomic compare-exchange shenanigans if mutexes prove to be too difficult.
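A stand-alone illustration of that ordering, sketched with a pthreads recursive mutex and a POSIX semaphore standing in for the CPython primitives (the program intentionally hangs):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static pthread_mutex_t lock;    /* recursive, like _PyRecursiveMutex */
static sem_t sem;

static void *thread_b(void *arg)
{
    pthread_mutex_lock(&lock);  /* blocks: thread A holds the lock... */
    sem_post(&sem);             /* ...so this post never happens */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);
    sem_init(&sem, 0, 0);

    pthread_t b;
    pthread_mutex_lock(&lock);   /* thread A takes the recursive lock */
    pthread_create(&b, NULL, thread_b, NULL);
    sem_wait(&sem);              /* deadlock: B is stuck on the lock */
    puts("never reached");
    return 0;
}
```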

@duaneg (Author) commented Mar 25, 2025

  • Would using a _PyRecursiveMutex instead of a PyMutex be an acceptable alternative? I'll push out a new version using that shortly.

  • Maybe, but we should be very cautious with it. _PyRecursiveMutex is safe in the re-entrant case, but it's still prone to lock-ordering deadlocks.

Ah, true! Unfortunately I don't think using atomics will help here, though: we ultimately must prevent concurrent calls to readdir and closedir, and the semantics of the API require blocking behaviour, even if it is implemented by spinning on atomics. I think for this to be safe the code holding the lock must not be re-entrant.

Looking again at the code, I think it can be refactored so the section holding the lock does not call anything that could possibly call back into Python. Then we can use a standard non-recursive mutex. I'll work on that now.

@duaneg (Author) commented Mar 25, 2025

After playing with various options I think the c1 and c2 fixes require different approaches.

The c1 fix is more-or-less as described above: refactor to hold the lock only while retrieving the next entry, ensuring that no re-entrancy is possible. The only wrinkle is that the dirent result from readdir needs to be copied into a buffer while holding the lock, since closedir on the directory being read frees it.
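A sketch of that shape on the POSIX side (names illustrative; the naive sizeof copy here is refined for unsized d_name fields in the commit message further down):

```c
/* The mutex is held only across readdir() and a copy of its result;
 * nothing in that window can call back into Python, so a plain
 * PyMutex suffices. Returns 1 with *out filled in, 0 on exhaustion
 * or a closed iterator, -1 with errno set on error. */
static int
scandir_next_entry(ScandirIterator *it, struct dirent *out)
{
    int result = 0;
    PyMutex_Lock(&it->mutex);
    if (it->dirp != NULL) {
        struct dirent *direntp;
        Py_BEGIN_ALLOW_THREADS
        errno = 0;
        direntp = readdir(it->dirp);
        Py_END_ALLOW_THREADS
        if (direntp != NULL) {
            /* Copy before unlocking: a concurrent closedir() frees
             * the memory readdir() returned. */
            memcpy(out, direntp, sizeof(*out));
            result = 1;
        }
        else if (errno != 0) {
            result = -1;
        }
    }
    PyMutex_Unlock(&it->mutex);
    return result;
}
```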

Note the Windows and POSIX code is looking very similar after all that. I think with a bit of additional refactoring it should be possible for them to share most of the same code and eliminate some duplication between #ifdef blocks. If this sounds useful I will carry on with that.

A similar approach doesn't really work for the c2 fix, however. In this case I think atomic compare/exchange is the better option: in a race there may be duplicated stat calls, but that isn't really avoidable, and the losers will be correctly cleaned up instead of leaking.
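A sketch of that pattern for the cached stat result, using the private compare-exchange helpers from pyatomic.h (DirEntry_fetch_lstat is a hypothetical helper that performs the actual system call and builds the stat_result):

```c
static PyObject *
DirEntry_get_lstat(DirEntry *self)
{
    PyObject *lstat = _Py_atomic_load_ptr(&self->lstat);
    if (lstat == NULL) {
        lstat = DirEntry_fetch_lstat(self);  /* may block; no lock held */
        if (lstat == NULL) {
            return NULL;                     /* exception set */
        }
        PyObject *expected = NULL;
        if (!_Py_atomic_compare_exchange_ptr(&self->lstat, &expected, lstat)) {
            /* Another thread won the race: drop our copy, use theirs. */
            Py_DECREF(lstat);
            lstat = expected;
        }
    }
    return Py_NewRef(lstat);
}
```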

Should I rebase these commits to get rid of the introduced-and-then-removed failed attempts?

@ZeroIntensity (Member)

  • Should I rebase these commits to get rid of the introduced-and-then-removed failed attempts?

Nope, we squash everything at the end anyway. Force-pushing and getting rid of commits just makes reviewing more difficult :)

@duaneg (Author) commented Mar 26, 2025

The new commit is just a little bit of additional refactoring to unify some of the Windows/POSIX code and minimise #ifdef code blocks. I'm not sure if this sort of semi-related cleanup-in-passing is considered helpful or unnecessary churn/risk here: if it is unhelpful (or should be done in a separate PR) just let me know and I'll drop/move it.

…(the final filename field being an unsized array) and in actual practice on the WASI platform. Thus just copying one into a stack-allocated structure is not good enough.

The obvious way to address this is to allocate memory on the heap and return that. However, it seems wasteful to dynamically allocate a temporary chunk of data just to copy some data in and then out again.

Instead, if the name field is potentially not long enough, allocate some extra space *on the stack* following the structure such that it must be sufficient to hold the full record length (and copy all of that, not just the potentially smaller fixed structure size).
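A sketch of the trick in plain POSIX C (NAME_MAX availability is an assumption; real code would need a platform-appropriate bound):

```c
#include <dirent.h>
#include <limits.h>   /* NAME_MAX */
#include <stddef.h>   /* offsetof */
#include <string.h>

/* Room for the fixed part of struct dirent plus a maximum-length
 * name, even when d_name is declared as an unsized/flexible array. */
typedef union {
    struct dirent entry;
    char storage[offsetof(struct dirent, d_name) + NAME_MAX + 1];
} dirent_buffer;

static int
copy_next_entry(DIR *dirp, dirent_buffer *buf)
{
    struct dirent *direntp = readdir(dirp);
    if (direntp == NULL) {
        return 0;
    }
    /* Copy the full record, not just sizeof(struct dirent). */
    size_t len = offsetof(struct dirent, d_name)
                 + strlen(direntp->d_name) + 1;
    memcpy(&buf->entry, direntp, len);
    return 1;
}
```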