Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constructive/Destructive Interference Size Padding #298

Open
mj10021 opened this issue Nov 21, 2023 · 6 comments
Open

Constructive/Destructive Interference Size Padding #298

mj10021 opened this issue Nov 21, 2023 · 6 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@mj10021
Copy link

mj10021 commented Nov 21, 2023

Proposal

Problem statement

Currently, the standard library does not provide a built-in mechanism to add hardware constructive/destructive interference size padding to a struct to avoid false-sharing or promote true-sharing of instances of the same struct. These can be important tools in performance optimizations and are currently implemented independently in multiple crates.

Motivating examples or use cases

As documented in rust issue #117470, aligning a struct to the cache line size can provide significant performance benefits. Multiple crates, including mpmc::utils in rustc, crossbeam-utils, and regex implement their own version of cache padding, which would most likely be improved by std offering a built-in implementation of constructive/destructive interference size padding.

Solution sketch

A solution could look like a hard-coded value table like the implementation in crossbeam-utils, offered in rust pr #117519, although this approach is likely to be inaccurate in some cases and would require continuous updating of the values.

#[cfg_attr(
    any(target_arch = "x86_64", target_arch = "aarch64", target_arch = "powerpc64",),
    repr(align(128))
)]
#[cfg_attr(
    any(
        target_arch = "arm",
        target_arch = "mips",
        target_arch = "mips64",
        target_arch = "riscv32",
        target_arch = "riscv64",
        target_arch = "sparc",
        target_arch = "hexagon",
    ),
    repr(align(32))
)]
#[cfg_attr(target_arch = "m68k", repr(align(16)))]
#[cfg_attr(target_arch = "s390x", repr(align(256)))]
#[cfg_attr(
    not(any(
        target_arch = "x86_64",
        target_arch = "aarch64",
        target_arch = "powerpc64",
        target_arch = "arm",
        target_arch = "mips",
        target_arch = "mips64",
        target_arch = "riscv32",
        target_arch = "riscv64",
        target_arch = "sparc",
        target_arch = "hexagon",
        target_arch = "m68k",
        target_arch = "s390x",
    )),
    repr(align(64))
)]

Other approaches are offered by the C++ implementation, linked below. Additionally, LLVM offers built-in methods for getting target arch cache line sizes.

Alternatives

Links and related work

Zulip discussion:
https://rust-lang.zulipchat.com/#narrow/stream/327149-t-libs-api.2Fapi-changes/topic/adding.20CachePadded.20to.20std

Existing implementations:
crossbeam-utils:
https://docs.rs/crossbeam/latest/crossbeam/utils/struct.CachePadded.html
mpmc-utils:
https://doc.rust-lang.org/beta/src/std/sync/mpmc/utils.rs.html

This feature exists in C++ as std::hardware_destructive_interference_size and std::hardware_constructive_interference_size, https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size . The original paper describing the feature offers a few possible implementations: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html .

LLVM GetCacheLineSize:
https://llvm.org/doxygen/classllvm_1_1MCSubtargetInfo.html#ac4be4ef1a969f0da1aa2da9aa5ccfe45

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@mj10021 mj10021 added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Nov 21, 2023
@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/libs-api meeting. We felt that this should not be exclusively a library value, because that wouldn't allow using it in repr(align(...)). This should have lang support. Given that lang support, we might also want to expose it as a constant (e.g. for use in dynamic memory allocations), but we felt that the work here should start with lang exposing this in a fashion that works with repr(align(...)).

@hanna-kruppe
Copy link

hanna-kruppe commented Feb 12, 2025

This has been mentioned in several of the linked discussions/resources but for completeness I also want to spell it out here: these numbers can't be defined precisely at compile time, since CPU models that are otherwise functionally identical can have rather different cache systems. They can only be provided on a best-effort basis, using knowledge of currently existing microarchitectures.

Of course to be useful for alignment/padding, there has to be some compile-time answer and an overly conservative choice is fine. But to remain useful, the value has to be updated when new CPU models with larger "interference size" are released. The C++ equivalent of this feature can't be updated easily because changing the value is generally ABI-breaking. In particular, the value can't vary based on flags like -mcpu because linking object files compiled with different target CPUs is supported. Rust generally has less problems with ABI ossification but this one is shared: any compiler flags that affect this padding will have to be considered a "target modifier" (rust-lang/rfcs#3716) that can cause ABI mismatches among code compiled by the same rustc version for the same target. In addition, any use of this attribute with non-repr(Rust) types and across FFI boundaries must also be considered (documented as, possibly linted against) ABI-unstable.

@the8472
Copy link
Member

the8472 commented Feb 12, 2025

any compiler flags that affect this padding will have to be considered a "target modifier" (rust-lang/rfcs#3716) that can cause ABI mismatches among code compiled by the same rustc version for the same target

Only if types use the repr(align(CACHELINE)) and have definitions in multiple separate compilations and passed unsafely across boundaries between those compilations, no? With a single definition they'd get the determined-at-compiletime alignment from the crate it was defined at in the crate metadata.

Wouldn't it then be sufficient to mark such types as not FFI-safe?

@hanna-kruppe
Copy link

hanna-kruppe commented Feb 12, 2025

I'm not quite sure. It's definitely not the case that full type layout is always stored in the defining crate's metadata because that depends on generic parameters which are only determined (sometimes repeatedly) in downstream crates. For example, today one can define something like this:

#[repr(align(64))]
pub struct Align64<T>(pub T);

... and the actual layout of Align64<u32>, including its alignment, will be computed independently in several crates that don't know anything about each other. I don't know if the crate metadata currently stores only the attributes of Align64 and re-parses the repr(align(64)) on the fly, or if it's parsed into a rustc-level number up-front and that's also stored in the metadata. For repr(align(CACHELINE)), it would be terribly unsound if it was translated to a concrete number several times in different rustc sessions. Not just because of ABI mismatches but also const eval (align_of etc.) becoming non-deterministic. Storing the actual number it translates to in crate metadata probably avoids that problem, but I don't know if that's how the compiler works today. If not, it'll have to be done to support this feature. I assume it wouldn't be too hard but I might be missing something.

If every “cache line padded” type's alignment is determined by the flags the defining crate was compiled with, that would avoid the worst problems. But it still has some surprising consequences:

  • Two distinct repr(C, align(CACHELINE)) types with identical fields can have different layout if they come from different crates. This may break unsafe code that doesn't otherwise depend on "FFI" per se, just on the documented determinism of the repr(C) layout algorithm.
  • Similarly, repr(Rust) types with this alignment annotation may not actually have the same alignment (and not merely because their fields require even higher alignment), which can cause surprising compilation or runtime failures for safe functions like bytemuck::cast_slice.
  • Without -Zbuild-std, any usage of this feature in the standard library (or other sysroot crates) won't reflect otherwise-global compilation flags (e.g., passed to Cargo via RUSTFLAGS). In particular, the standard library exporting something like core::mem::CachePadded<T> may enforce a lower alignment than expected if there's flags that increase it.
  • If the standard library or any other crate exports a constant that's supposed to be equal to the alignment implied by the repr attribute, this won't actually be true when mixing crates compiled with different flags. That is, assert_eq!(align_of::<MyCacheLinePadded>(), core::mem::DESTRUCTIVE_INTERFERENCE_SIZE); could fail.

It may still be simpler to tell people "if you want to increase this constant, that's a target modifier" and direct them to -Zbuild-std or its eventual stable equivalent.

@the8472
Copy link
Member

the8472 commented Feb 12, 2025

This may break unsafe code that doesn't otherwise depend on "FFI" per se,

The unsafe code would somehow have to know that those separate definitions in separate crates are identical, including all their attributes, but somehow be unaware of the weird align value.

In particular, the standard library exporting something like core::mem::CachePadded may enforce a lower alignment than expected if there's flags that increase it.

We have those kinds of downsides with all other optimization-things in std too. But yeah, build-std would be great for this.

Similarly, repr(Rust) types with this alignment annotation may not actually have the same alignment (and not merely because their fields require even higher alignment), which can cause surprising compilation or runtime failures for safe functions like bytemuck::cast_slice.

Cache-padded structures are usually some internal runtime thing, they aren't portable, so they shouldn't get serialized or anything in the bytemuck territory.

@hanna-kruppe
Copy link

The unsafe code would somehow have to know that those separate definitions in separate crates are identical, including all their attributes, but somehow be unaware of the weird align value.

It has to be aware that the align(CACHELINE) can mean different things across crates. Which is not at all obvious, especially given that similar features in other systems languages are fixed globally (as part of the ABI). It's also not the case for any other repr(...) attribute today.

In any case, I don't want to litigate every possible side effect this may have. My main point is that a thing that's mostly thought of as a compile time constant, but is occasionally different across crates that are linked together, would be really weird. Especially if there's also an accompanying constant in the standard library for it, which necessarily has to pick one value that every crate sees. None of that is a fatal problem for this proposal, but either we say "this is actually a constant and not modified by compiler flags" (with the downsides already discussed in the C++ context) or we'll add a new entry to the "I did not know Rust was weird like that" trivia rotation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

4 participants