Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::process::exit is not thread-safe in combination with C code calling exit #126600

Open
teskje opened this issue Jun 17, 2024 · 156 comments
Open
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@teskje
Copy link

teskje commented Jun 17, 2024

Calling exit concurrently from Rust and C (or from 2 different copies of the Rust runtime in the same binary) is UB. This is permitted by the C standard, and can actually cause trouble in practice since not all versions of all libc implementations do proper locking when traversing the list of atexit functions. For programs only using Rust, we mitigated this by adding a lock, but this only helps when all calls to exit come from Rust (and they must come from the same Rust runtime). Note that returning from main is equivalent to calling exit so it can also cause this issue.

Current status

Mitigation on our end (only covers pure Rust programs): #126606

The current status is that the specification language stems from a time when C did not specify multithreading and the intent was to forbid reentrancy. External discussion indicates that there's intent to fix it both in libc implementations and the specs:

Original bugreport (partially outdated)

The current implementation of std::process::exit is unsound on Linux and possibly related platforms where it defers to libc's exit() function. exit() is documented to be not thread-safe, and hence std::process::exit is not thread-safe either, despite being a non-unsafe function.

To show that this isn't just a theoretical problem, here is a minimal example that segfaults on my machine (Ubuntu with glibc 2.37):

use std::thread;

fn main() {
    for _ in 0..32 {
        unsafe { libc::atexit(exit_handler) };
    }
    for _ in 0..2 {
        thread::spawn(|| std::process::exit(0));
    }
}

extern "C" fn exit_handler() {
    thread::sleep_ms(1000);
}

The example contains unsafe code, but only to install exit handlers. AFAICT nothing about the libc::atexit call is unsafe. The UB is introduced by calling std::process::exit concurrently afterwards.

If you are curious, https://github.com/MaterializeInc/database-issues/issues/6528 lays out what causes the segfault (it's a use-after-free). That's not terribly relevant though, given that glibc has no obligation to ensure exit() is thread safe when it's clearly documented not to be. Instead, Rust should either mark std::process::exit as unsafe (which is something it could do for the 2024 edition), or introduce locking to ensure only a single thread gets to call exit() at a time.

@teskje teskje added the C-bug Category: This is a bug. label Jun 17, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 17, 2024
@teskje
Copy link
Author

teskje commented Jun 17, 2024

This has been discussed previously in #83994. There the decision was to not do anything about the concerns with std::process::exit, mostly based on the reasoning that exit() is thread safe. The above example shows it's clearly not, neither according to the documentation nor in the implementation.

Given that I just spend the better part of a weekend debugging a segfault that was caused by concurrent std::process::exit invocations I feel quite strongly that this issue deserves reconsideration.

@ChrisDenton
Copy link
Member

On that thread joshtriplett notes:

The same arguments apply here as they do with environment handling: some types of unsynchronized changes to the environment from C code could race with an otherwise-synchronized one from Rust on some platforms, but we don't mark the corresponding functions in std::env as unsafe.

We do now mark std::env::set_var as unsafe so maybe thinking on this has changed. Though I'm not sure how you would mark returning from main as being unsafe?

@conradludgate
Copy link
Contributor

conradludgate commented Jun 17, 2024

Linux libc documentation does claim that exit() is thread-unsafe: "MT-Unsafe race:exit", although the original thread suggests that all libc implementations introduce appropriate locks. Perhaps introducing a Once before calling libc::exit is sufficient and not a performance blocker?

@the8472
Copy link
Member

the8472 commented Jun 17, 2024

Doesn't solve the problem with exit getting called from non-rust code. And no, an atexit handler won't help since it leaves a race-window and if you're in a situation where threads concurrently call exit you're already racing.

https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.

@ChrisDenton
Copy link
Member

I'd guess the safe thing to do is kill all other threads then call exit.

@the8472
Copy link
Member

the8472 commented Jun 17, 2024

Not sure if serious...

Killing threads would make anything that accesses their stacks UB. Freezing them would work but that's difficult to implement reliably. Libc would be in a position to do it since they control pthreads. But then they might as well fix their locking.

@tbu-
Copy link
Contributor

tbu- commented Jun 17, 2024

Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.

@tbu-
Copy link
Contributor

tbu- commented Jun 17, 2024

I'd say it makes sense to introduce a lock on the Rust side, since pure-Rust code shouldn't be able to cause UB by calling libc's exit on different threads.

This also means that Rust cdylibs mustn't call std::process::exit since they might not be the only language running, and applications must ensure that no foreign functions call exit. The latter should be a bug anyway due to the C standard saying that multiple calls to exit are UB.

@teskje
Copy link
Author

teskje commented Jun 17, 2024

https://github.com/MaterializeInc/database-issues/issues/6528 sounds like it's either a libc bug or the assessment from the previous thread that libc implementations provide the desired behavior - even if the standard language doesn't - needs to be revised.

Based on how the glibc code is set up it very much looks like exit isn't intended to be thread safe. It uses a lock, but releases it every time it calls one of the exit handlers. That'd make no sense if it wanted to prevent other threads from modifying the list of exit handlers concurrently. The lock is useful for preventing corruption in the face of other threads calling atexit in parallel. There is a comment that says that this is its intended purpose (though admittedly it could be clearer).

I think the previous assumption that glibc's exit is thread-safe was a misconception based on the existence of this lock alone.

@jieyouxu jieyouxu added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 17, 2024
@teskje
Copy link
Author

teskje commented Jun 17, 2024

Adding 32 atexit handlers seems to be essential to make this segfault on my glibc 2.39+r52+gf8e4623421-1 on Arch Linux. With 31 atexit handlers, it doesn't segfault.

That's explained by the fact that one block in the list of exit handlers contains 32 entries, and the last block is never freed. So to trigger the use-after-free you need at least two blocks in the list. glibc installs one exit handler on its own, so 32 more need to be installed to add a free-able block to the list.

@jieyouxu jieyouxu added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Jun 17, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 17, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Jun 17, 2024

Triage: marking this as I-unsound, but please readjust the label accordingly.

@ChrisDenton ChrisDenton added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 18, 2024
@ChrisDenton
Copy link
Member

This comment from the main author of musl was pointed out in the libs-api meeting:

POSIX requires exit be thread-safe; it is not one of the functions listed in the exceptions, and all functions are required to be thread-safe by default. My understanding is that C11 also requires this.

I looked up the spec to confirm and it seems to be correct. https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01

All functions defined by this volume of POSIX.1-2017 shall be thread-safe, except that the following functions1 need not be thread-safe.

exit is not on the list.

@tbu-
Copy link
Contributor

tbu- commented Jun 18, 2024

According to the C11 standard, calling exit more than once is undefined behavior.

C11 7.22.4.4p2

If a program calls the exit function more than once, or calls the quick_exit function in addition to the exit function, the behavior is undefined.

Calling it from two threads at the same time is calling it twice. It is thus explicitly thread-unsafe in C11.

@tbu-
Copy link
Contributor

tbu- commented Jun 18, 2024

The POSIX description also contains that sentence.

https://pubs.opengroup.org/onlinepubs/009695299/functions/exit.html

If exit() is called more than once, the behavior is undefined.

So the POSIX spec is kind of in conflict with itself?

@ericlagergren
Copy link

@tbu- See the latter half of #83994 (comment)

@tbu-
Copy link
Contributor

tbu- commented Jun 18, 2024

If there's any proper action to be had here, it's on the C side. The text about calling exit more than once being UB was written in the absence of threads and was obviously intended to make recursive exit (via atexit handlers) undefined, not to be a constraint on MT programs outside the scope of C. This should just be fixed (at least in POSIX and other more specific implementation specs) to make it so only the recursive case is undefined.

It makes sense to fix this on the C side, too.

Currently, we have process::exit causing practical UB on a tier 1 platform, it's not just a hypothetical issue, so I think something like #126606 also makes sense. Basically making sure exit() is only called once, and other threads calling exit simply going to sleep forever (until they're cleaned up by the OS).

@Amanieu
Copy link
Member

Amanieu commented Jun 19, 2024

This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.

@zachs18
Copy link
Contributor

zachs18 commented Jun 21, 2024

Here is a mostly-one-to-one C11 translation of the program in the OP that reproduces a segfault on my machine (Ubuntu 24.04, gcc 13.2.0-23ubuntu4, glibc 2.39-0ubuntu8.2), if someone wants to use it in a bug report.

code
#include <stdlib.h>
#include <threads.h>

void exit_handler(void) {
    thrd_sleep(&(struct timespec){.tv_sec = 1}, NULL);
}

int thread_function(void *arg) {
    (void)arg;
    exit(0);
}

int main(void) {
    for (int i = 0; i < 32; ++i) {
        atexit(exit_handler);
    }

    for (int i = 0; i < 2; ++i) {
        thrd_t thread;
        thrd_create(&thread, thread_function, NULL);
    }
}

@carbotaniuman

This comment was marked as outdated.

@tbu-

This comment was marked as outdated.

@benesch
Copy link
Contributor

benesch commented Jul 1, 2024

This should probably be reported as a bug to glibc, along with a C program that reproduces the issue.

I don't think it's exactly a bug in glibc, though, and reporting it that way might get the issue bounced. The Linux manual page for exit(3) says clearly:

The exit() function uses a global variable that is not protected, so it is not thread-safe.

So glibc is at least conforming to its own documentation.

As @richfelker said in #83994 (comment), I do think it's worth trying to get this fixed over in the POSIX/C world, but I think that looks like engaging the folks who work on the POSIX and C standards to get this updated in the next version of those standards, and looping the glibc maintainers into those conversations for feedback.

@richfelker
Copy link

That's great if you can manage it. C libraries frequently have history which precludes it, or are written to an API specification which precludes it.

Then they can be marked non-unloadable so that dlopen/dlclose N times does not leak O(N) resources but leaves O(1) resources live for the entire program lifetime. Perfectly acceptable behavior.

@greghudson
Copy link

Then they can be marked non-unloadable

If I can mark my libraries as non-unloadable in an even vaguely portable fashion, then I would like to sign up immediately. I don't think that's a common platform facility.

@richfelker
Copy link

If I can mark my libraries as non-unloadable in an even vaguely portable fashion, then I would like to sign up immediately. I don't think that's a common platform facility.

For ELF-based platforms, -Wl,-z,nodelete does it. If the library knows its own name, dlopen(myname,RTLD_NOW|RTLD_LOCAL) and throwing away the result does it at runtime on any platform with dlopen. If you want to write code that's compatible with static linking into an environment that might not have dlopen, declaring a weak reference to dlopen and testing it before calling should work.

#include <dlfcn.h>
__typeof__(dlopen) dlopen __attribute__((__weak__));
...
if (dlopen) ...

Libtool may have its own abstraction for how to mark libraries non-unloadable; I'm not sure.

None of this is perfectly portable but I think it meets your requirement for "even vaguely portable". At worst you may need to add special cases for Windows and Mac.

@DemiMarie
Copy link
Contributor

On Windows I’m not sure if making a library non-unloadable is an always an option, as it leaks the DLL’s handles. Fortunately, DllMain on Windows takes an argument indicating whether it is called due to process termination (in which case cleaning up resources is not safe or needed) or due to DLL unloading (in which case cleaning up is both safe and needed).

@bjorn3
Copy link
Member

bjorn3 commented Aug 25, 2024

On macOS using thread local storage effectively blocks dlclose. (There are several issues on the rust issue tracker about rust dylibs not getting unloaded when using dlclose. This due to rust's standard library using TLS internally and it by default getting statically linked into dylibs.)

@comex
Copy link
Contributor

comex commented Aug 25, 2024

On macOS using thread local storage effectively blocks dlclose. (There are several issues on the rust issue tracker about rust dylibs not getting unloaded when using dlclose. This due to rust's standard library using TLS internally and it by default getting statically linked into dylibs.)

Indeed. But on the other hand there doesn't seem to be any explicit "never unload" flag for macOS. No nice solution.

@RalfJung
Copy link
Member

As a C library author, I understand the arguments why Rust cannot reasonably prevent this situation, but I also don't think there's anything I can do, given the state of the ecosystem.

So it seems what you need is a function that is called on dlclose but not on exit? And then for dlclose, it is the caller's requirement to ensure that no thread runs library code any more (that's anyway required) so there can't be any conflict with concurrently running code?

Or alternatively, if supporting dlclose is not even your intent -- some way to tell libc about that so that it doesn't end up being incorrectly half-unloaded or so?

I am not sure what is the best platform to request features like that -- it'd have to come from libcs, right?

@workingjubilee
Copy link
Member

We #126600 (comment) the slow process towards having libc's commit to, effectively, carry such a lock themselves. According to information here, glibc 2.41 will have such a lock. FreeBSD libc received a similar patch.

Yes. My main concern is that the best use of this issue is, I think, continuing to coordinate these efforts. A concern that goes far enough afield, like advocacy for Rust adding an explicit API for exit handlers, is better taken to the actual decision-makers.

For solving the issue of dynamic libraries having problems with dlclose, it sounds like we would want either something like "if this magic symbol is included in your dylib, it shouldn't be actually-unloaded at dlclose time" going on? The appropriate venue for that is probably POSIX, as dlopen is not a Standard C function.

@bjorn3
Copy link
Member

bjorn3 commented Aug 26, 2024

For solving the issue of dynamic libraries having problems with dlclose, it sounds like we would want either something like "if this magic symbol is included in your dylib, it shouldn't be actually-unloaded at dlclose time" going on? The appropriate venue for that is probably POSIX, as dlopen is not a Standard C function.

For ELF based platforms there is already -Wl,-z,nodelete as noted a couple of comments back.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 26, 2024

Doesn't that require the person driving the linker to do it? It's not something the author can assure. And ELF is not under the purview of the Austin Group, I think.

@RalfJung
Copy link
Member

Yes. My main concern is that the best use of this issue is, I think, continuing to coordinate these efforts.

Fair. I feel like coordinating those efforts is out of scope of Rust, but it'd make sense for us to track progress here, and maybe remove our own exit lock once all libc guarantee thread-safe exit (but with the rate at which we increment our min glibc version, that will take many years).

Discussing how to write C libraries that can tolerate their exit handlers to be invoked at any time while also not leaking resources on dlclose (or somehow declaring they don't support being unloaded on dlclose) is also out of scope of Rust, but people are inevitably going to discuss this anyway, so I've created a forum thread for this.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 27, 2024
exit: explain our expectations for the exit handlers registered in a Rust program

This documents the position of `@Amanieu` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established `@rust-lang/libs-api`  consensus so I presume this will have to go through FCP.

Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible.

The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 27, 2024
exit: explain our expectations for the exit handlers registered in a Rust program

This documents the position of ``@Amanieu`` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established ``@rust-lang/libs-api``  consensus so I presume this will have to go through FCP.

Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible.

The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Aug 27, 2024
Rollup merge of rust-lang#129581 - RalfJung:exit, r=joshtriplett

exit: explain our expectations for the exit handlers registered in a Rust program

This documents the position of ``@Amanieu`` and others in rust-lang#126600: a library with an atexit handler that destroys state that other threads could still be working on is buggy. We do not consider it acceptable for a library to say "you must call the following cleanup function before exiting from `main` or calling `exit`". I don't know if this is established ``@rust-lang/libs-api``  consensus so I presume this will have to go through FCP.

Given that Rust supports concurrency, I don't think there is any way to write a sound Rust wrapper around a library that has such a required cleanup function: even if we made `exit` unsafe, and the Rust wrapper used the scope-with-callback approach to ensure it can run cleanup code before returning from the wrapper (like `thread::scope`), one could still call this wrapper in a second thread and then return from `main` while the wrapper runs. Making this sound would require `std` to provide a way to "block" returning from `main`, so that while the wrapper runs returning from `main` waits until the wrapper is done... that just doesn't seem feasible.

The `exit` docs do not seem like the best place to document this, but I also couldn't think of a better one.
chiichen pushed a commit to DragonOS-Community/glibc-mirror that referenced this issue Oct 18, 2024
Even if C/POSIX standard states that exit is not formally thread-unsafe,
calling it more than once is UB.  The glibc already supports
it for the single-thread, and both elf/nodelete2.c and tst-rseq-disable.c
call exit from a DSO destructor (which is called by _dl_fini, registered
at program startup with __cxa_atexit).

However, there are still race issues when it is called more than once
concurrently by multiple threads.  A recent Rust PR triggered this
issue [1], which resulted in an Austin Group ask for clarification [2].
Besides it, there is a discussion to make concurrent calling not UB [3],
wtih a defined semantic where any remaining callers block until the first
call to exit has finished (reentrant calls, leaving through longjmp, and
exceptions are still undefined).

For glibc, at least reentrant calls are required to be supported to avoid
changing the current behaviour.  This requires locking using a recursive
lock, where any exit called by atexit() handlers resumes at the point of
the current handler (thus avoiding calling the current handle multiple
times).

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] rust-lang/rust#126600
[2] https://austingroupbugs.net/view.php?id=1845
[3] https://www.openwall.com/lists/libc-coord/2024/07/24/4
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
@bstrie
Copy link
Contributor

bstrie commented Oct 23, 2024

This is a rather long thread, can we get a summary of the current status added to the original issue body? What are actionable steps that can be taken to resolve this unsound issue? AFAICT the status appears to be "wait until the rest of the world is fixed", which IMO is tantamount to closing this issue as WONTFIX, given the prospects of that ever happening. And I'm not entirely satisfied by the dismissal of ameliorations on the Rust side (either adding a lock or marking as unsafe). If we can do better, then let's do better, even if preventing people from linking to broken C code is out of our power.

@ChrisDenton
Copy link
Member

The pure Rust fix would be to have some Rust API for adding on exit functions. This could side-step the issue entirely so long as you ignore C.

Note that we already have a lock to mitigate this issue. However it's by no means a fix.

@the8472
Copy link
Member

the8472 commented Oct 23, 2024

AFAICT the status appears to be "wait until the rest of the world is fixed", which IMO is tantamount to closing this issue as WONTFIX, given the prospects of that ever happening.

Things are looking rosier than that. Posix, musl and glibc agreed to improve things and if I read the comments on the posix issue correctly they'll also bring it to the appropriate C committee.

@the8472
Copy link
Member

the8472 commented Oct 23, 2024

I have gathered a short summary and a few links from other comments into the top comment. Edit away if there's anything to correct.

@RalfJung RalfJung changed the title std::process::exit is not thread-safe std::process::exit is not thread-safe in combination with C code calling exit Oct 24, 2024
@safinaskar
Copy link
Contributor

Please, document in https://doc.rust-lang.org/nightly/std/process/fn.exit.html , on which platforms exit is fully thread-safe (even if some C code calls it), and on which platforms it is not. As well as I understand, exit is fully safe on musl. This should be documented on that page.

Also, as well as I understand, musl is shipped with Rust itself. This means nothing prevents us from patching it! So, I suggest patching all remaining similarly unsafe C functions (setenv?) in musl implementation we ship. This will allow us fully solve all these remaining Rust unsoundness problems caused by system libc once and for all (at least for system libraries we ship).

Also, we can create alternative target with static glibc. We will patch that glibc however we want, we will fix all exit, setenv, etc problems there. (Unfortunately, if we really want static glibc, we should build it with --enable-static-nss, but this is blocked by https://sourceware.org/bugzilla/show_bug.cgi?id=27959 )

@RalfJung
Copy link
Member

RalfJung commented Jan 29, 2025

Please, document in https://doc.rust-lang.org/nightly/std/process/fn.exit.html , on which platforms exit is fully thread-safe (even if some C code calls it), and on which platforms it is not. As well as I understand, exit is fully safe on musl. This should be documented on that page.

That is really something the respective platforms need to document. We shouldn't be in the business of providing a central repository of all the ways in which various C implementations promise more than the C standard. We track this for our own needs here, but that is not the same as making it part of the official docs.

And we are also not interested in maintaining a patched fork of musl or glibc.

@RalfJung
Copy link
Member

RalfJung commented Jan 29, 2025

What we should do, however, is document that it is UB to call this exit concurrently with C code calling exit.

jhpratt added a commit to jhpratt/rust that referenced this issue Mar 18, 2025
jhpratt added a commit to jhpratt/rust that referenced this issue Mar 18, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 18, 2025
Sakib25800 added a commit to Sakib25800/rust that referenced this issue Mar 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests