Segmentation fault when thread using dynamically loaded Rust library exits #91979

devongovett · 2021-12-15T21:51:30Z

Scenario: I have a Rust cdylib, which is loaded by a C program via dlopen. The C program creates a thread and loads the Rust library inside it. It then calls one of the Rust functions and closes the library via dlclose. Then the thread exits. The Rust library has a thread-local variable holding a struct that implements Drop, which it modifies in the function called from C.
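For readers who do not want to open the reproduction, the Rust side of such a setup might look roughly like the sketch below. It is not copied from the linked repository: `State`, `DROPS`, and `rust_touch_state` are hypothetical names, and the `#[unsafe(no_mangle)]` spelling assumes Rust 1.82 or later. The counter only exists to make the destructor call observable.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts destructor runs so the effect is observable (illustration only).
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical per-thread state; any type with a `Drop` impl behaves the same.
struct State {
    calls: u32,
}

impl Drop for State {
    fn drop(&mut self) {
        // Runs at thread exit, through the TLS destructor that std registered
        // when the thread first touched `STATE`.
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    static STATE: RefCell<State> = RefCell::new(State { calls: 0 });
}

/// Entry point that a C host would look up with `dlsym` after `dlopen`.
/// The first call on a thread initializes `STATE` and registers its destructor.
#[unsafe(no_mangle)]
pub extern "C" fn rust_touch_state() -> u32 {
    STATE.with(|s| {
        let mut s = s.borrow_mut();
        s.calls += 1;
        s.calls
    })
}
```

The important detail is that the destructor for `STATE` is code inside the library, and it only runs when the calling thread exits, which in the scenario above is after dlclose.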

Full reproduction here: https://github.com/devongovett/rust-threadlocal-bug

On CentOS 7, which uses glibc 2.17, it segfaults at __nptl_deallocate_tsd() inside pthread_create.c. With later versions of glibc, there is no crash. I believe the crash occurs because Rust creates a thread local key with pthread_key_create but never calls pthread_key_delete (the call in the destructor is commented out):

rust/library/std/src/sys_common/thread_local_key.rs

Lines 231 to 237 in 673d0db

    
    impl Drop for Key {
        fn drop(&mut self) {
            // Right now Windows doesn't support TLS key destruction, but this also
            // isn't used anywhere other than tests, so just leak the TLS key.
            // unsafe { imp::destroy(self.key) }
        }
    }

When the thread exits, glibc tries to call the destructor for the key, but because the dynamic library has already been unloaded via dlclose at this point, the function no longer exists and we get a crash.
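The mechanism can be seen in isolation with a small sketch that talks to the pthread API directly. This is not how std is implemented; the `extern` declarations are written by hand instead of using the `libc` crate, and they assume Linux, where `pthread_key_t` is a 32-bit unsigned integer. `DTOR_RUNS`, `create_key`, `set_marker`, and `delete_key` are hypothetical helper names, and the `unsafe extern` block assumes Rust 1.82 or later.

```rust
use std::ffi::{c_int, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumption: Linux, where `pthread_key_t` is an unsigned 32-bit integer.
type PthreadKey = u32;

unsafe extern "C" {
    fn pthread_key_create(
        key: *mut PthreadKey,
        dtor: Option<unsafe extern "C" fn(*mut c_void)>,
    ) -> c_int;
    fn pthread_setspecific(key: PthreadKey, value: *const c_void) -> c_int;
    fn pthread_key_delete(key: PthreadKey) -> c_int;
}

/// Counts how often the key destructor was invoked (illustration only).
pub static DTOR_RUNS: AtomicUsize = AtomicUsize::new(0);

/// libc stores the address of this function and calls it at thread exit.
/// If the code containing it has been unmapped by then, the call crashes.
unsafe extern "C" fn on_thread_exit(_value: *mut c_void) {
    DTOR_RUNS.fetch_add(1, Ordering::SeqCst);
}

pub fn create_key() -> PthreadKey {
    let mut key: PthreadKey = 0;
    let rc = unsafe { pthread_key_create(&mut key, Some(on_thread_exit)) };
    assert_eq!(rc, 0, "pthread_key_create failed");
    key
}

pub fn set_marker(key: PthreadKey) {
    // Any non-null value makes libc run the destructor when this thread exits.
    let rc = unsafe { pthread_setspecific(key, 1usize as *const c_void) };
    assert_eq!(rc, 0, "pthread_setspecific failed");
}

pub fn delete_key(key: PthreadKey) {
    // After deletion, libc no longer calls the destructor for this key.
    let rc = unsafe { pthread_key_delete(key) };
    assert_eq!(rc, 0, "pthread_key_delete failed");
}
```

A thread that calls `set_marker` and then exits triggers `on_thread_exit`; a thread that calls `delete_key` before exiting does not. That second case is why deleting the key before the library is unloaded would avoid the dangling call.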

My theory is that this only occurs with glibc 2.17, and not with later versions, because __cxa_thread_atexit_impl does not exist in these older versions. This function is used when available to register destructors; otherwise a fallback implementation is used:

rust/library/std/src/sys/unix/thread_local_dtor.rs

Lines 30 to 42 in 71965ab

    
    if !__cxa_thread_atexit_impl.is_null() {
        type F = unsafe extern "C" fn(
            dtor: unsafe extern "C" fn(*mut u8),
            arg: *mut u8,
            dso_handle: *mut u8,
        ) -> libc::c_int;
        mem::transmute::<*const libc::c_void, F>(__cxa_thread_atexit_impl)(
            dtor,
            t,
            &__dso_handle as *const _ as *mut _,
        );
        return;
    }

However, I'm not sure about that. It could be some other change in glibc.

I have not tested this, but I think the bug could potentially be fixed if the commented-out destructor linked above were actually called. The comment says that Windows does not support this, so maybe it could be called conditionally?

glibc 2.17 is indeed pretty old. However, it is the version used by CentOS 7, which is not EOL until 2024, so I do think this bug should be fixed.

Meta

rustc --version --verbose:

rustc 1.57.0 (f1edd0429 2021-11-29)
binary: rustc
commit-hash: f1edd0429582dd29cccacaf50fd134b05593bd9c
commit-date: 2021-11-29
host: x86_64-unknown-linux-gnu
release: 1.57.0
LLVM version: 13.0.0


devongovett · 2021-12-16T00:39:07Z

Upon further research, I'm pretty sure the reason it "works" with newer glibc versions is that the library is actually never fully unloaded if there are TLS destructors registered via __cxa_thread_atexit_impl. In the source code, l_tls_dtor_count is incremented when a destructor is registered: https://github.com/bminor/glibc/blob/91cc803d27bda34919717b496b53cf279e44a922/stdlib/cxa_thread_atexit_impl.c#L137

And in dlclose, this is checked: https://github.com/bminor/glibc/blob/90b37cac8b5a3e1548c29d91e3e0bff1014d2e5c/elf/dl-close.c#L186

This can be verified by running my reproduction program above with the LD_DEBUG=files environment variable. In glibc 2.17, it calls the .fini section immediately on dlclose (just before segfaulting), but in newer versions this is deferred until process exit (essentially a leak?!).

Brooooooklyn · 2021-12-16T04:56:05Z

Brooooooklyn/canvas#377 may be related to this.

hkratz · 2021-12-16T06:04:42Z

@rustbot label T-libs A-thread-locals

follower · 2021-12-16T07:32:11Z

It may be useful/informative to read some of the prior discussion of dynamic libraries & thread local storage in this & linked issues: #28794

FWIW based on my experience the only "reliable" approach has been to simply never allow dylibs to be unloaded: follower/foreigner@3845586

Edit: Especially nagisa/rust_libloading#41 & also https://sourceware.org/glibc/wiki/Destructor%20support%20for%20thread_local%20variables

(I first encountered this issue when using wasmtime from C & C++.)

Aaron1011 · 2021-12-16T16:02:25Z

See also this upstream glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21032

devongovett · 2021-12-16T16:38:33Z

A bit more context about how this affects real world code. This currently happens to all native Node.js modules that include thread locals, when loaded within Node's worker threads. Node controls when dlopen and dlclose are called, so there isn't really a great way for module authors to simply prevent unloading the dylib. One way is for users to also load the module from the main thread in addition to workers, which in effect delays unloading until process exit, but this is not very obvious or ergonomic. And this is far from the only case where this could occur.

One potential solution is to register a destructor function in the .fini_array section of the ELF binary, which will be called when the library is unloaded by dlclose. In C this is possible via the __attribute__((destructor)) syntax, which was used to solve this related bug: https://bugzilla.redhat.com/show_bug.cgi?id=1065695. Perhaps Rust could do something similar to ensure the thread local key is dropped when the dylib is unloaded?
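In Rust, the closest equivalent to `__attribute__((destructor))` that I know of is placing a function pointer in the `.fini_array` section by hand. The sketch below only shows the registration mechanism on Linux/ELF; it is not something std does today, `on_unload` and `CLEANED_UP` are hypothetical names, and the `unsafe(link_section = ...)` spelling assumes Rust 1.82 or later.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Records that the hook ran; a real library would release its TLS key here.
pub static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// Hypothetical unload hook. The dynamic loader calls every function listed
/// in `.fini_array` when the shared object is unloaded by `dlclose`, or at
/// process exit if it is still loaded.
pub extern "C" fn on_unload() {
    CLEANED_UP.store(true, Ordering::SeqCst);
}

/// `#[used]` keeps the otherwise unreferenced static in the output, and the
/// section attribute places the pointer where the loader looks for it.
/// The attribute is limited to Linux because the section name is ELF-specific.
#[used]
#[cfg_attr(target_os = "linux", unsafe(link_section = ".fini_array"))]
static ON_UNLOAD: extern "C" fn() = on_unload;
```

Note that such a hook runs on whichever thread calls dlclose. It could delete a pthread key so that no destructor is invoked later, but pthread_key_delete does not run the pending destructors of other threads, so their thread-local values would be leaked instead of dropped.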

thomcc · 2022-03-01T05:55:20Z

Hmm, is this possibly caused by #88737 ? (edit: no, but it is related). ~~That bug shouldn't be that hard to fix, if so.~~ (edit: ☹️)

mcollina · 2024-12-16T14:01:38Z

Apparently, this was made significantly worse in v1.83.0, which has started to segfault for me when using native addons written in Rust in Node.js.

devongovett · 2025-03-18T19:31:45Z

FYI, @Brooooooklyn found a workaround. Compiling with the following linker args prevents the dylib from being unloaded.

[target.'cfg(target_env = "gnu")']
rustflags = ["-C", "link-args=-Wl,-z,nodelete"]

Would be nice to get a real fix though. This has indeed gotten much worse with recent Rust versions.

@Brooooooklyn

…read (#17276)

When Tailwind is loaded in a Node Worker thread, it currently causes a segmentation fault on Linux when the thread exits. This is due to a longstanding issue in Rust that affects all native modules: rust-lang/rust#91979. I reported this years ago but unfortunately it is still not fixed, and seems to have gotten worse in Rust 1.83.0 and later. Looks like Tailwind recently updated Rust versions and this issue started appearing when run in tools like Parcel that use worker threads.

The workaround is to prevent the native module from ever being unloaded. One way to do that is to always load the native module in the main thread in addition to workers, but this is hard to enforce. @Brooooooklyn found another method, which is to use a linker option for this. I tested this on an Ubuntu system and verified it fixed the issue.

You can test with the following script.

```js
// test.js
const {Worker} = require('worker_threads');
new Worker('./worker.js');

// worker.js
require('@tailwindcss/oxide');
```

Without this change, a segmentation fault will occur.

Co-authored-by: Jordan Pittman <jordan@cryptica.me>


workingjubilee · 2025-03-26T03:25:15Z

Now that CentOS 7 is EOL, it is very hard to imagine that this would be best fixed by a method other than raising the requirements to a version of glibc that we actually can work with. Every other hack I have seen for running destructors on dlclose has tended to result in other, novel kinds of misery that show up a few months later. That said, that would only work if glibc 2.17 is the only case that reaches this, and besides that, I'm not sure that the .fini_array approach wouldn't work.

X547 · 2025-03-26T03:46:07Z

> raising the requirements to a version of glibc that we actually can work with

As I understand it, the issue is still present with the latest glibc, but it is a leak instead of a crash.

X547 · 2025-03-26T03:47:17Z

This bug is observed when closing Vulkan 3D applications with Mesa NVK driver that use Rust code.

It is important to properly support dlopen/dlclose for OpenGL/Vulkan drivers because it is needed for compositor GPU hotplug, driver updates, and other use cases of long-lived system 3D applications.

workingjubilee · 2025-03-26T04:40:23Z

> As I understand it, the issue is still present with the latest glibc, but it is a leak instead of a crash.

And so is Box::leak or mem::forget, which is a safe function, whereas it is slightly harder to induce a segfault with safe code.

It is not clear that we can in fact support this "properly" if we define it as "without leaking".

X547 · 2025-03-26T06:11:46Z

> And so is Box::leak or mem::forget, which is a safe function, whereas it is slightly harder to induce a segfault with safe code.

The problem is not safety, but broken logic. A missing pthread_key_delete call leaks a reference to the shared library and makes live reloading of the shared library impossible.

devongovett added the C-bug label Dec 15, 2021

This was referenced Dec 16, 2021

Segmentation file on parcel build when png file is included parcel-bundler/parcel#7408

Closed

Segmentation fault (core dumped) parcel-bundler/parcel#5961

Closed

rustbot added A-thread-locals T-libs labels Dec 16, 2021

This was referenced Dec 16, 2021

Workaround segfault with old glibc versions parcel-bundler/parcel#7457

Merged

Apply segfault workaround to image optimizer as well parcel-bundler/parcel#7461

Merged

Brooooooklyn mentioned this issue Dec 17, 2021

Segmentation fault on require("@swc/core") in a Node.js worker thread swc-project/swc#2276

Closed

joshtriplett added the E-medium label Jan 26, 2022

joshtriplett removed the E-medium label Mar 1, 2022

Brooooooklyn mentioned this issue May 27, 2022

qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault swc-project/swc#4752

Closed

thomcc mentioned this issue Jul 8, 2022

thread_local! dtor registration can pass wrong __dso_handle if dynamically linked #88737

Open

Brooooooklyn mentioned this issue Aug 16, 2022

Segmentation fault when workers exit Brooooooklyn/snappy#83

Open

pfdgithub mentioned this issue Feb 3, 2023

Error RpcIpcMessagePortClosedError TypeStrong/fork-ts-checker-webpack-plugin#623

Closed

benStre mentioned this issue Mar 5, 2024

Segmentation fault when transpiling "if" statement with @swc/core version > 1.3.100 in deno inside Alpine Linux docker container swc-project/swc#8695

Closed

kdy1 mentioned this issue Apr 11, 2024

SWC has a segfault condition swc-project/swc#8840

Closed

mcollina mentioned this issue Dec 16, 2024

Segmentation Fault rollup/rollup#5761

Closed

mcollina mentioned this issue Dec 20, 2024

Add a config option option to load a module in the main thread platformatic/platformatic#3713

Closed

gatieme mentioned this issue Dec 30, 2024

c++ code frequency dlclose/dlopen *.so compiled by rust cause crash #134820

Open

devongovett mentioned this issue Jan 12, 2025

[1.29.x][regression] segfault when require()-ed from a worker thread in Node 22/Linux parcel-bundler/lightningcss#892

Open

TheLarkInn mentioned this issue Mar 10, 2025

Introduce a Heft SWC plugin. microsoft/rushstack#5149

Merged

snowystinger mentioned this issue Mar 18, 2025

fix: verdaccio main adobe/react-spectrum#7948

Merged

5 tasks

This was referenced Mar 18, 2025

"Segmentation fault (core dumped)" from GitHub Actions parcel-bundler/parcel#10081

Open

Fix segmentation fault when loading @tailwindcss/oxide in a Worker thread tailwindlabs/tailwindcss#17276

Merged
