Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove use of atomic_load_unordered and undefined behaviour from arm_linux.rs #790

Merged
merged 1 commit into from
Mar 20, 2025

Conversation

beetrees
Copy link
Contributor

atomic_load_unordered is in the process of being removed (#788), so replace it with a relaxed atomic load. Additionally, avoid doing an out-of-bounds Rust read (#559) on 1 and 2 byte atomics by using inline ASM.

The only changes to the resulting optimised assembly on the latest Rust nightly (rustc 1.87.0-nightly (665025243 2025-03-11)) are a few minor changes to instruction scheduling in the 1 and 2 byte atomic RMW functions.

cc @RalfJung - I've tried to summarise the approach of using inline ASM in the SAFETY comments, but would definitely appreciate someone checking they make sense.

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on the inline asm attributes, but the rest LGTM.

__kuser_cmpxchg is a bit questionable since oldval and newval seem like they must be able to hold provenance, but u32 is the wrong type for that. It would be more correct to use MaybeUninit<u32>.

src/arm_linux.rs Outdated
/// # Safety
///
/// `ptr` must be aligned and point to memory within a page that allows read access.
/// If `T` has a size of 4, then `ptr` must be valid for an atomic read.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If `T` has a size of 4, then `ptr` must be valid for an atomic read.
/// If `T` has a size of 4, then `ptr` must be valid for a relaxed atomic read.

However, I am a bit confused about what extra requirements you have in mind here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When T has a size of 4, the method loads the value using AtomicU32::from_ptr(ptr).load(Ordering::Relaxed) instead of inline ASM. AtomicU32::from_ptr(ptr).load(Ordering::Relaxed), has greater safety requirements than "ptr must be aligned and point to memory within a page that allows read access" (i.e. the read must be valid under the Rust memory model, as opposed to just not segfaulting), so I was trying to state that the caller must uphold those stricter requirements when size_of::<T>() == 4. I've added more documentation to try and clarify this; let me know if you think it could be clarified further.

Copy link
Member

@RalfJung RalfJung Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like an implementation detail of the function.

I'd expect the safety contract to be something like

  • ptr must be 4-aligned (currently it just says "aligned" but it is ambiguous how aligned -- is 4 even correct?)
    • this implies that a 4-byte range cannot cross a page boundary, so that part need not be mentioned in the docs, it becomes part of the SAFETY comment for the asm block
  • T must have a size of at most 4
  • ptr must be valid for a relaxed atomic read of size size_of::<T>()

Whether and when it uses atomics vs inline asm does not have to be exposed in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the safety section and comment.

@tgross35 tgross35 requested a review from Amanieu March 12, 2025 19:30
@beetrees beetrees force-pushed the arm-linux-asm branch 2 times, most recently from 9da124b to 0031ae4 Compare March 13, 2025 16:58
@beetrees
Copy link
Contributor Author

__kuser_cmpxchg is a bit questionable since oldval and newval seem like they must be able to hold provenance, but u32 is the wrong type for that. It would be more correct to use MaybeUninit<u32>.

The 4 byte functions like __sync_fetch_and_add_4 are used to implement the strict_provenance_atomic_ptr functions like AtomicPtr::fetch_byte_add, however they (the __sync_*_4 functions) are currently implemented using operations on u32s. AFAICT they also need to be changed to use <*mut _>::map_addr or equivalent in order to preserve provenance?

@beetrees
Copy link
Contributor Author

Also I think it would be OK to use *mut () instead of MaybeUninit<u32> to preserve provenance, as Rust doesn't support atomics on uninitialised values?

@RalfJung
Copy link
Member

The 4 byte functions like __sync_fetch_and_add_4 are used to implement the strict_provenance_atomic_ptr functions like AtomicPtr::fetch_byte_add, however they (the _sync*_4 functions) are currently implemented using operations on u32s. AFAICT they also need to be changed to use <*mut _>::map_addr or equivalent in order to preserve provenance?

Ah, I had not thought of that part yet, I was more concerned of the data that is outside the value we conceptually operate on -- there might be pointers there we have to preserve.

Copy link
Contributor

@tgross35 tgross35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here LGTM with some of the minor docs updates Ralf mentioned at #790 (comment), and possibly a mention somewhere that the read is atomic without ordering. I double checked that ldr is all LLVM emits anyway in this case https://rust.godbolt.org/z/cerYKhKd8.

@RalfJung
Copy link
Member

RalfJung commented Mar 18, 2025 via email

@tgross35
Copy link
Contributor

"Without ordering" sounds like a non-atomic read. Is that enough?

At least on this target, probably, but isn't this an AM vs. hardware difference? E.g. as I understand it LLVM could decide to turn a non-atomic u32 read as four u8 reads, which would be problematic.

@RalfJung
Copy link
Member

This has to be an atomic load since otherwise there could be UB due to a data race. After all, this is allowed to race with concurrent (atomic) stores.

So the comment should say that this acts like a relaxed atomic load. (There is no such thing as an "atomic load without ordering".)

@beetrees
Copy link
Contributor Author

Rebased to fix merge conflict.

@tgross35 tgross35 enabled auto-merge (rebase) March 20, 2025 02:01
@tgross35 tgross35 merged commit 1f67000 into rust-lang:master Mar 20, 2025
26 checks passed
@beetrees beetrees deleted the arm-linux-asm branch March 20, 2025 02:05
tgross35 added a commit to tgross35/rust that referenced this pull request Mar 20, 2025
Includes the following changes related to unordered atomics:

* Remove element_unordered_atomic intrinsics [1]
* Remove use of `atomic_load_unordered` and undefined behaviour [2]

There are a handful of other small changes, but nothing else
user-visible.

[1]: rust-lang/compiler-builtins#789
[2]: rust-lang/compiler-builtins#790
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 21, 2025
Update `compiler-builtins` to 0.1.152

Includes the following changes related to unordered atomics:

* Remove element_unordered_atomic intrinsics [1]
* Remove use of `atomic_load_unordered` and undefined behaviour [2]

There are a handful of other small changes, but nothing else user-visible.

[1]: rust-lang/compiler-builtins#789
[2]: rust-lang/compiler-builtins#790
@tgross35
Copy link
Contributor

Unfortunately it looks like this is hitting the monomorphization error for Android rust-lang/rust#138771 (comment). I guess this file may not be covered in CI at all, given it is only used for very old arm

// Only emit the ARM Linux atomic emulation on pre-ARMv6 architectures. This
// includes the old androideabi. It is deprecated but it is available as a
// rustc target (arm-linux-androideabi).
println!("cargo::rustc-check-cfg=cfg(kernel_user_helpers)");
if llvm_target[0] == "armv4t"
|| llvm_target[0] == "armv5te"
|| target.triple == "arm-linux-androideabi"
{
println!("cargo:rustc-cfg=kernel_user_helpers")
}

@RalfJung
Copy link
Member

I thought calling generic functions is fine since they get inlined anyway? What does "cannot call functions through upstream monomorphizations" mean?

@tgross35
Copy link
Contributor

I don't really understand what the exact restrictions are, @saethlin would know better. But in this case AtomicX::from_ptr surprisingly isn't marked #[inline] and isn't actually generic (despite the error message), I think inlining should fix it (patch WIP).

Manually inlining the .cast() here also seems to fix things.

@RalfJung
Copy link
Member

Ah ofc the atomic types are all monomorphic.

So I understand now why it errors, but still can't make sense of the error message.

tgross35 added a commit to tgross35/rust that referenced this pull request Mar 21, 2025
Currently this cannot be inlined, which among other things means it
can't be used in `compiler-builtins` [1]. These are trivial functions
that should be inlineable, so add `#[inline]`.

[1]: rust-lang/compiler-builtins#790 (comment)
@tgross35
Copy link
Contributor

It comes from an internal detail https://github.com/rust-lang/rust/blob/a4a11aca5ecf24dfff3c00715641026809951305/compiler/rustc_codegen_ssa/src/base.rs#L851. But yeah, I'm not sure what the significance of "upstream monomorphization" is instead of "non-inlinable call" or whatever.

tgross35 added a commit to tgross35/rust that referenced this pull request Mar 21, 2025
Includes the following changes related to unordered atomics:

* Remove element_unordered_atomic intrinsics [1]
* Remove use of `atomic_load_unordered` and undefined behaviour [2]

There are a handful of other small changes, but nothing else
user-visible.

[1]: rust-lang/compiler-builtins#789
[2]: rust-lang/compiler-builtins#790
tgross35 added a commit to tgross35/rust that referenced this pull request Mar 21, 2025
Currently this cannot be inlined, which among other things means it
can't be used in `compiler-builtins` [1]. These are trivial functions
that should be inlineable, so add `#[inline]`.

[1]: rust-lang/compiler-builtins#790 (comment)

(cherry picked from commit eb2a2f8)
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 21, 2025
Update `compiler-builtins` to 0.1.152

Includes the following changes related to unordered atomics:

* Remove element_unordered_atomic intrinsics [1]
* Remove use of `atomic_load_unordered` and undefined behaviour [2]

There are a handful of other small changes, but nothing else user-visible.

[1]: rust-lang/compiler-builtins#789
[2]: rust-lang/compiler-builtins#790

try-job: arm-android
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 22, 2025
…=RalfJung

Allow inlining for `Atomic*::from_ptr`

Currently this cannot be inlined, which among other things means it can't be used in `compiler-builtins` [1]. These are trivial functions that should be inlineable, so add `#[inline]`.

[1]: rust-lang/compiler-builtins#790 (comment)
@saethlin
Copy link
Member

saethlin commented Mar 22, 2025

But yeah, I'm not sure what the significance of "upstream monomorphization" is instead of "non-inlinable call" or whatever.

Because whether a call is non-inlinable (note that "non-inlinable call" isn't even a concept we have in the compiler) is not strictly the same as whether it is to an upstream monomorphization. Calls to upstream monomorphizations can still be inlined or otherwise optimized away by the codegen backend, and the point of this diagnostic is that we don't take that into consideration.

So we could change the error to say "compiler-builtins can only make calls to functions defined in compiler-builtins or functions which are defined in other crates but lowered from MIR as part of compiling compiler-builtins" but that's just including the definition of "upstream monomorphization" in the diagnostic 🤷

@RalfJung
Copy link
Member

So we could change the error to say "compiler-builtins can only make calls to functions defined in compiler-builtins or functions which are defined in other crates but lowered from MIR as part of compiling compiler-builtins" but that's just including the definition of "upstream monomorphization" in the diagnostic 🤷

That would still be useful, given that hardly anyone knows what an "upstream monomorphization" is.^^ I did non-trivial work on the monomorphization logic, and I never encountered that term, and did not understand it when encountering it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants