Segmentation fault when thread using dynamically loaded Rust library exits #91979
Comments
Upon further research, I'm pretty sure the reason it "works" with newer glibc versions is that the library is actually never fully unloaded if there are TLS destructors registered via And in dlclose, this is checked: https://github.com/bminor/glibc/blob/90b37cac8b5a3e1548c29d91e3e0bff1014d2e5c/elf/dl-close.c#L186 This can be verified by running my reproduction program above with the |
Brooooooklyn/canvas#377 may be related to this.
@rustbot label T-libs A-thread-locals |
It may be useful/informative to read some of the prior discussion of dynamic libraries & thread local storage in this & linked issues: #28794 FWIW based on my experience the only "reliable" approach has been to simply never allow dylibs to be unloaded: follower/foreigner@3845586 Edit: Especially nagisa/rust_libloading#41 & also https://sourceware.org/glibc/wiki/Destructor%20support%20for%20thread_local%20variables (I first encountered this issue when using |
See also this upstream glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21032 |
A bit more context about how this affects real world code. This currently happens to all native Node.js modules that include thread locals, when loaded within Node's worker threads. Node controls when One potential solution is to register a destructor function in the |
Hmm, is this possibly caused by #88737 ? (edit: no, but it is related). |
Apparently, this was made significantly worse in v1.83.0, which has started to segfault for me when using native addons written in Rust in Node.js.
FYI, @Brooooooklyn found a workaround. Compiling with the following linker args prevents the dylib from being unloaded:

    [target.'cfg(target_env = "gnu")']
    rustflags = ["-C", "link-args=-Wl,-z,nodelete"]

Would be nice to get a real fix though. This has indeed gotten much worse with recent Rust versions.
…read (#17276)

When Tailwind is loaded in a Node Worker thread, it currently causes a segmentation fault on Linux when the thread exits. This is due to a longstanding issue in Rust that affects all native modules: rust-lang/rust#91979. I reported this years ago but unfortunately it is still not fixed, and seems to have gotten worse in Rust 1.83.0 and later. Looks like Tailwind recently updated Rust versions and this issue started appearing when run in tools like Parcel that use worker threads.

The workaround is to prevent the native module from ever being unloaded. One way to do that is to always load the native module in the main thread in addition to workers, but this is hard to enforce. @Brooooooklyn found another method, which is to use a linker option for this. I tested this on an Ubuntu system and verified it fixed the issue.

You can test with the following script.

```js
// test.js
const {Worker} = require('worker_threads');
new Worker('./worker.js');

// worker.js
require('@tailwindcss/oxide');
```

Without this change, a segmentation fault will occur.

Co-authored-by: Jordan Pittman <jordan@cryptica.me>
Now that CentOS 7 is EOL, it is very hard to imagine that this would be best fixed by a method other than raising the requirements to a version of glibc that we actually can work with. Every other hack I have seen for running destructors on dlclose has tended to result in other, novel kinds of misery that show up a few months later. That said, that would only work if glibc 2.17 is the only case that reaches this, and besides that, I'm not sure that the |
As I understand it, the issue is still present with the latest glibc, but it shows up as a leak instead of a crash.
This bug is observed when closing Vulkan 3D applications with the Mesa NVK driver, which uses Rust code. It is important to properly support dlopen/dlclose for OpenGL/Vulkan drivers because it is needed for compositor GPU hotplug, driver updates, and other use cases involving long-lived system 3D applications.
And so is It is not clear that we can in fact support this "properly" if we define it as "without leaking". |
The problem is not safety, but logic breakage. Missing |
Scenario: I have a Rust cdylib, which is loaded by a C program via `dlopen`. The C program creates a thread, and loads the Rust module inside it. It proceeds to call one of the Rust functions, and closes the library via `dlclose`. Then the thread exits. The Rust program has a thread local variable with a struct that implements `Drop`, which it modifies in the function called from C.

Full reproduction here: https://github.com/devongovett/rust-threadlocal-bug
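To make the Rust side of that scenario concrete, here is a minimal sketch of what such a library could look like. It is not the code from the linked reproduction: the `Tracker` struct, the `rust_touch_tls` entry point, and the `DROPS` counter are hypothetical names chosen for illustration. The helper at the end runs the same sequence inside a single process (no `dlopen`/`dlclose`), which only shows that the thread-local's `Drop` runs when the thread exits. That is the step that crashes once the library's code has already been unloaded.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the thread-local value has been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the struct in the reproduction.
struct Tracker {
    calls: u32,
}

impl Drop for Tracker {
    fn drop(&mut self) {
        // In a dlopen'ed library, this code lives in the library's own
        // mapping, so it must still be loaded when the thread exits.
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Initialized lazily on first access; that access is also what
    // registers the destructor for the current thread.
    static TRACKER: RefCell<Tracker> = RefCell::new(Tracker { calls: 0 });
}

// Hypothetical entry point for the C host. A real cdylib would also
// export it under an unmangled name so `dlsym` can find it.
pub extern "C" fn rust_touch_tls() -> u32 {
    TRACKER.with(|t| {
        let mut t = t.borrow_mut();
        t.calls += 1;
        t.calls
    })
}

// Touches the thread-local on a fresh thread, waits for that thread to
// exit, and returns how many destructor runs were observed.
pub fn drops_after_thread_exit() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    thread::spawn(|| {
        rust_touch_tls();
    })
    .join()
    .unwrap();
    DROPS.load(Ordering::SeqCst) - before
}
```

Calling `drops_after_thread_exit()` from a test or from `main` should report one destructor run for the spawned thread. In the scenario described here, the host calls `dlclose` between the call to the Rust function and the thread exit, so the destructor's address may no longer be mapped by the time it is invoked.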
On CentOS 7, which uses glibc 2.17, it segfaults at `__nptl_deallocate_tsd()` inside pthread_create.c. With later versions of glibc, there is no crash. I believe the crash occurs because Rust creates a thread local key with `pthread_key_create` but never calls `pthread_key_delete` (the call in the destructor is commented out):

rust/library/std/src/sys_common/thread_local_key.rs
Lines 231 to 237 in 673d0db
When the thread exits, glibc tries to call the destructor for the key, but because the dynamic library has already been unloaded via `dlclose` at this point, the function no longer exists and we get a crash.

My theory is that this only occurs with glibc 2.17, and not later versions, because `__cxa_thread_atexit_impl` does not exist in these older versions. This function is used when available to register destructors; otherwise a fallback implementation is used:

rust/library/std/src/sys/unix/thread_local_dtor.rs
Lines 30 to 42 in 71965ab
I have not tested this, but I think the bug could potentially be fixed if the commented-out destructor linked above were actually called. The comment says something about Windows not supporting this, so maybe it could be called conditionally?
glibc 2.17 is indeed pretty old. However, it is the version used by CentOS 7, which is not EOL until 2024, so I do think this bug should be fixed.
Meta
`rustc --version --verbose`: