
Make Rc<T>::deref and Arc<T>::deref zero-cost #132553

Open · EFanZh wants to merge 3 commits into master from zero-cost-rc-arc-deref
Conversation

@EFanZh EFanZh commented Nov 3, 2024

Currently, Rc<T> and Arc<T> store pointers to RcInner<T> and ArcInner<T>. This PR changes those pointers so that they point to T directly instead.

This is based on the assumption that the T value is accessed more frequently than the reference counts. With this change, the data can be accessed without offsetting a pointer from RcInner<T> or ArcInner<T> to the contained value. This change might also enable some possibly useful future optimizations, such as:

  • Converting &[Rc<T>] into &[&T] in O(1) time.
  • Converting &[Rc<T>] into Vec<&T> using memcpy.
  • Converting &Option<Rc<T>> into Option<&T> without branching.
  • Making Rc<T> and Arc<T> FFI-compatible types where T: Sized.
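The core idea can be sketched roughly as follows (a simplified, hypothetical model with made-up names, not the PR's actual code): the reference counts live immediately before the value in the allocation, so the handle stores a pointer to the value itself, deref is a plain pointer read, and the counts are recovered at a fixed negative offset.

```rust
use std::cell::Cell;
use std::ptr::{addr_of_mut, NonNull};

// Hypothetical sketch of the layout: the counts precede the value, so a
// handle can point at the value directly (names are illustrative).
struct RefCounts {
    strong: Cell<usize>,
    weak: Cell<usize>,
}

#[repr(C)]
struct Allocation<T> {
    counts: RefCounts,
    value: T,
}

pub fn demo() -> (i32, usize) {
    let alloc = Box::new(Allocation {
        counts: RefCounts { strong: Cell::new(1), weak: Cell::new(1) },
        value: 42i32,
    });
    let alloc_ptr = Box::into_raw(alloc);

    // The handle would store only this value pointer; `deref` is then a
    // direct read with no offsetting.
    let value_ptr: NonNull<i32> =
        unsafe { NonNull::new_unchecked(addr_of_mut!((*alloc_ptr).value)) };

    // The counts are recovered by stepping back one `RefCounts`. This works
    // here because the value's offset equals `size_of::<RefCounts>()` (the
    // value's alignment does not exceed the counts' alignment).
    let counts_ptr: NonNull<RefCounts> = unsafe { value_ptr.cast::<RefCounts>().sub(1) };

    let result = unsafe { (*value_ptr.as_ptr(), counts_ptr.as_ref().strong.get()) };
    drop(unsafe { Box::from_raw(alloc_ptr) });
    result
}

fn main() {
    assert_eq!(demo(), (42, 1));
}
```

Since the handle already holds a `NonNull<T>`, it has the same representation as a raw pointer, which is what makes the O(1) slice and `Option` conversions listed above plausible.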

rustbot commented Nov 3, 2024

r? @jhpratt

rustbot has assigned @jhpratt.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 3, 2024
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from b283c44 to ae36f44 Compare November 3, 2024 09:14
@rust-log-analyzer

This comment has been minimized.

@marmeladema (Contributor)

Would it potentially enable those types to have an FFI-compatible ABI, so that they could be returned and passed directly from/to FFI functions, like Box?


EFanZh commented Nov 3, 2024

Would it potentially enable those types to have an FFI-compatible ABI, so that they could be returned and passed directly from/to FFI functions, like Box?

I think in theory it is possible, at least for sized types, but I am not familiar with how to formally make it so.

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from ae36f44 to 0d6165f Compare November 3, 2024 11:21

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 0d6165f to 98edd5b Compare November 3, 2024 13:06

jhpratt commented Nov 3, 2024

r? libs

@rustbot rustbot assigned joboet and unassigned jhpratt Nov 3, 2024
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 98edd5b to 8beb51d Compare November 4, 2024 16:29

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 8beb51d to d7879fa Compare November 4, 2024 17:26

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from d7879fa to 317aa0e Compare November 4, 2024 18:40
joboet commented Nov 7, 2024

@EFanZh Is this ready for review? If so, please un-draft the PR.

EFanZh commented Nov 7, 2024

@joboet: The source code part is mostly done, but I haven’t finished updating the LLDB and CDB pretty printers. The CI doesn’t seem to run those tests.

joboet commented Nov 8, 2024

No worries! I just didn't want to keep you waiting in case you had forgotten to change the state.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 8, 2024
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch 3 times, most recently from f243654 to 1308bf6 Compare November 11, 2024 18:35
@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch 2 times, most recently from 384ea40 to 0bdb018 Compare March 9, 2025 07:18
@scottmcm (Member)

Neutral-ish on icounts, improved on cycles, and even shrinks optimized binaries? Nice.

//!
//! - Making reference-counting pointers have ABI-compatible representation as raw pointers so we
//!   can use them directly in FFI interfaces.
//! - Converting `Option<Rc<T>>` to `Option<&T>` with a memory copy operation.
Member:

Hmm, this one should optimize to that already with what you've already written here, right?

You could consider adding a codegen test, like

use std::sync::Arc;

#[no_mangle]
pub fn option_arc_as_deref_is_nop(x: &Option<Arc<i32>>) -> Option<&i32> {
    // CHECK-LABEL: @option_arc_as_deref_is_nop(ptr
    // CHECK: %[[R:.+]] = load ptr, ptr %x
    // CHECK: ret ptr %[[R]]
    x.as_deref()
}

EFanZh (Contributor Author), Mar 23, 2025:

Sorry for the confusion. I meant that only a memory copy operation should be used. Specifically, the function should not check for None values. Option<Rc<T>>::deref should generate the same assembly as Option<Box<T>>::deref, which is not currently the case: https://godbolt.org/z/MeEfjs96K.

Member:

By "already with what you've already written" I mean with this PR. I agree it doesn't happen on nightly, but if this is one of the things that this PR should do, then adding a codegen test to confirm it is a good demonstration.

Contributor Author:

I have added the corresponding codegen tests in tests/codegen/option-rc-as-deref-no-cmp.rs.

Comment on lines 362 to 368
impl Deref for RcLayout {
    type Target = Layout;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
Member:

Hmm, if "external" things should only use the inner layout field through this deref, would it be worth putting RcLayout in a separate module to enforce that with privacy?

(This is one of those places that really wants to be able to just do unsafe struct RcLayout(Layout); to enforce it that way...)
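A sketch of what the privacy-based enforcement suggested here could look like (module and method names are illustrative, not the PR's actual code): with `RcLayout` in its own module, the inner `Layout` field is private, so the rest of the crate can only reach it through `Deref`.

```rust
// Illustrative sketch: the tuple field is private outside `rc_layout`, so
// external code cannot bypass the `Deref`-based access.
mod rc_layout {
    use std::alloc::Layout;
    use std::ops::Deref;

    pub struct RcLayout(Layout);

    impl RcLayout {
        pub fn new(layout: Layout) -> Self {
            RcLayout(layout)
        }
    }

    impl Deref for RcLayout {
        type Target = Layout;

        fn deref(&self) -> &Layout {
            &self.0
        }
    }
}

fn main() {
    let rc_layout = rc_layout::RcLayout::new(std::alloc::Layout::new::<u64>());
    // `rc_layout.0` would not compile here; only `Deref` access works.
    assert_eq!(rc_layout.size(), 8);
}
```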

Contributor Author:

Done.

Comment on lines 370 to 373
trait RcLayoutExt {
    /// Computes `RcLayout` at compile time if `Self` is `Sized`.
    const RC_LAYOUT: RcLayout;
}
Member:

Nice that we only need one of these, since Rc and Arc can share the constant 👍

Comment on lines 389 to 393
unsafe fn ref_counts_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<RefCounts> {
    const REF_COUNTS_OFFSET: usize = size_of::<RefCounts>();

    unsafe { value_ptr.byte_sub(REF_COUNTS_OFFSET) }.cast()
}
Member:

ymmv: since you need to cast in this function anyway, you could consider avoiding the need for the cast-to-unit in the callers of this by having this be something like

Suggested change
unsafe fn ref_counts_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<RefCounts> {
    const REF_COUNTS_OFFSET: usize = size_of::<RefCounts>();
    unsafe { value_ptr.byte_sub(REF_COUNTS_OFFSET) }.cast()
}
unsafe fn ref_counts_ptr_from_value_ptr<T: ?Sized>(value_ptr: NonNull<T>) -> NonNull<RefCounts> {
    unsafe { value_ptr.cast::<RefCounts>().sub(1) }
}

(That ought to simplify the MIR too, since byte_sub has to cast to NonNull<u8> then cast back again, but if you cast and can just sub you avoid that step. Of course the conversions like that are optimized out by LLVM anyway, but...)

Contributor Author:

Done.

Comment on lines 395 to 419
/// Get a pointer to the strong counter object in the same allocation with a value pointed to by
/// `value_ptr`.
///
/// # Safety
///
/// - `value_ptr` must point to a value object (can be uninitialized or dropped) that lives in a
/// reference-counted allocation.
unsafe fn strong_count_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<UnsafeCell<usize>> {
    const STRONG_OFFSET: usize = size_of::<RefCounts>() - mem::offset_of!(RefCounts, strong);

    unsafe { value_ptr.byte_sub(STRONG_OFFSET) }.cast()
}

/// Get a pointer to the weak counter object in the same allocation with a value pointed to by
/// `value_ptr`.
///
/// # Safety
///
/// - `value_ptr` must point to a value object (can be uninitialized or dropped) that lives in a
/// reference-counted allocation.
unsafe fn weak_count_ptr_from_value_ptr(value_ptr: NonNull<()>) -> NonNull<UnsafeCell<usize>> {
    const WEAK_OFFSET: usize = size_of::<RefCounts>() - mem::offset_of!(RefCounts, weak);

    unsafe { value_ptr.byte_sub(WEAK_OFFSET) }.cast()
}
Member:

suggestion: Can you avoid doing manual layout calculations here?

If you converted to a NonNull<RefCounts> first, then &raw can just mention the field to get its pointer, rather than needing to offset_of and deal in raw bytes.

(I think that'd let you drop the repr(C) on RefCounts too, which would be nice. I don't think there should be a need for it -- I don't think any of the logic here really cares whether the strong or weak count is first in memory.)
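The field-projection approach suggested here could look roughly like this (a sketch with assumed names, using the stable `addr_of_mut!` form of raw-place projection rather than manual byte offsets):

```rust
use std::cell::Cell;
use std::ptr::{addr_of_mut, NonNull};

struct RefCounts {
    strong: Cell<usize>,
    weak: Cell<usize>,
}

/// Project to a field through `NonNull<RefCounts>` directly: no `offset_of!`,
/// no byte arithmetic, and no `repr(C)` requirement (sketch, assumed names).
///
/// # Safety
/// `counts` must point to a live `RefCounts`.
unsafe fn strong_count_ptr(counts: NonNull<RefCounts>) -> NonNull<Cell<usize>> {
    unsafe { NonNull::new_unchecked(addr_of_mut!((*counts.as_ptr()).strong)) }
}

fn main() {
    let mut counts = RefCounts { strong: Cell::new(3), weak: Cell::new(1) };
    let strong = unsafe { strong_count_ptr(NonNull::from(&mut counts)) };
    assert_eq!(unsafe { strong.as_ref() }.get(), 3);
}
```

Because the projection only names the field, the compiler computes the offset, which is what lets the layout of `RefCounts` stay unspecified.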

Contributor Author:

The #[repr(C)] is used for:

  • Ensuring size_of::<RefCounts>().is_power_of_two().
  • Ensuring the alignment of reference counters is suitable for atomic operations.

Without #[repr(C)], these conditions can’t be reliably guaranteed.

scottmcm (Member), Mar 23, 2025:

But are those conditions actually important?

  • You don't need the size to be a power of two if you're using Layout::extend instead of hand-calculating it. And whatever its size, it's a constant, so LLVM can optimize based on its value already.
  • The most natural way for them to be aligned enough for atomics would be to just have them be atomic types, like I discuss in #132553 (comment). Or if you don't want two different types, just make them always AtomicUsize and convert to Cell<usize> in Rc's uses of them; then there's no need for manual alignment, and there's less total converting needed. (Rc would still need the as_ptr + from_ptr dance, but Arc wouldn't.)

And either way, I don't think either of the things you mentioned are needed in this method, which could just use &raw AFAICT.

Contributor Author:

You are right, the size does not need to be a power of two; I'm not sure why I was convinced that condition needed to be satisfied. I'll update the corresponding code.

Comment on lines 272 to 275
#[inline]
fn value_offset(&self) -> usize {
    size_of::<RefCounts>().max(self.align())
}
scottmcm (Member), Mar 22, 2025:

suggestion: it seems unfortunate to hand-calculate this, when Layout::extend returns it. Could the RcLayout maybe store the Layout and the usize offset? Right now try_from_value_layout is just ignoring the offset, when it could return it instead, for example.

(AFAICT the RcLayouts are never stored long-term, so making them a bit bigger would be fine.)
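The `Layout::extend`-based computation suggested here would look roughly like this (a sketch; `RefCounts` and the function name are assumptions for illustration):

```rust
use std::alloc::Layout;

struct RefCounts {
    strong: usize,
    weak: usize,
}

/// Compute the whole allocation's layout *and* the value's offset in one
/// step, instead of hand-calculating `size_of::<RefCounts>().max(align)`
/// (sketch only).
fn rc_layout_for(value_layout: Layout) -> Option<(Layout, usize)> {
    // `extend` places `value_layout` after `RefCounts`, returning both the
    // combined layout and the offset at which the value begins.
    let (layout, value_offset) = Layout::new::<RefCounts>().extend(value_layout).ok()?;
    Some((layout.pad_to_align(), value_offset))
}

fn main() {
    let (_, offset) = rc_layout_for(Layout::new::<u8>()).unwrap();
    // For a low-alignment value, the offset equals `size_of::<RefCounts>()`.
    assert_eq!(offset, std::mem::size_of::<RefCounts>());
}
```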

Contributor Author:

Early designs did include the offset in RcLayout, but I noticed binary-size regressions at that time, possibly because more values had to be passed by callers. That may have changed with the current inlining strategy, but given the current, seemingly OK perf results, I'd prefer not to risk it here. It's probably better to explore this strategy separately.

/// - `rc_layout` correctly describes the memory layout of the reference-counted allocation.
#[inline]
unsafe fn init_rc_allocation<const STRONG_COUNT: usize>(
    rc_ptr: NonNull<[u8]>,
Member:

ymmv: given that you don't need the metadata of the slice here, maybe just take the value_ptr: NonNull<()> instead?

This is one of the few places that the code is passing around the allocation pointer instead of the value pointer, and I think it'd be nice for consistency to say that nothing in the module passes around the allocation pointer. If the try_allocate functions just immediately did the offset, there'd never be a question of whether a pointer was to the allocation or not, because everything, even the inits, would only ever be dealing in the value pointers, not the allocation pointers.

Contributor Author:

Now most of the pointers should be value pointers.

@scottmcm scottmcm self-assigned this Mar 22, 2025
// Ensure the value pointer in `self` is updated to `new_ptr`.
let update_ptr_on_drop = SetRcPtrOnDrop { rc: self, new_ptr };

// The strong count .
Member:

fix: I think this sentence never got finished?

Contributor Author:

Comment is updated.

Comment on lines 1723 to 1726
unsafe fn from_iter_exact<I>(iter: I, length: usize) -> Self
where
    A: Allocator + Default,
    I: Iterator<Item = T>,
Member:

unsure: can it be TrustedLen? How do we know its count if not for the iter being trusted?

minor: can it use a different word from "exact"? To me, in iterator contexts, that makes me think of ExactSizeIterator, which is not what's happening here and wouldn't be right anyway because it's not an unsafe trait.

Comment on lines 1397 to 1399
pub(crate) unsafe fn get_mut_unchecked(&mut self) -> &mut T {
    unsafe { self.weak.ptr.as_mut() }
}
Member:

I think this lost its comment about not giving &mut to the counts.

Yes, the natural implementation no longer would do that, but I think it's still good to emphasize that we only have &mut to the value since weaks can still read the counts simultaneously despite the unique reference to the value.

Comment on lines +287 to +296
let count = count.wrapping_add(1);

*count_ref = count;

// We want to abort on overflow instead of dropping the value.
// Checking for overflow after the store instead of before
// allows for slightly better code generation.
if intrinsics::unlikely(count == 0) {
    intrinsics::abort();
}
Member:

Feels odd that it needs the unlikely -- after all, aborting has to be rare -- and that it's written with a zero check instead of using overflowing_add to check for the overflow, but I guess you didn't change this from how it was before so that's not really this PR's problem.

Comment on lines 102 to 113
/// Increment a reference counter managed by `RawRc` and `RawWeak`. Currently, both strong and
/// weak reference counters are incremented by this method.
///
/// # Safety
///
/// - `count` should only be handled by the same `RcOps` implementation.
/// - The value of `count` should be non-zero.
unsafe fn increment_ref_count(count: &UnsafeCell<usize>);

/// Decrement a reference counter managed by `RawRc` and `RawWeak`. Currently, both strong and
/// weak reference counters are decremented by this method. Returns whether the reference count
/// becomes zero after decrementing.
scottmcm (Member), Mar 22, 2025:

"Currently" is a poor thing for trait documentation, in my opinion, especially on an unsafe method of an unsafe trait. It akes me wonder whether this is really the right abstraction. If both need to be handled by the same code, that seems like something the implementations could do internally.

Come to think of it, why does this ever deal in UnsafeCells directly? If these two just split into strong and weak versions, then everything could deal in ref_counts: &RefCounts instead, leaving the details up to the implementer, and the generic code wouldn't ever need to project into the specific fields.

For that matter, could it then just be &self for everything? It'd be nice to not have to deal with the UnsafeCells in the implementations, and if they could have their own types, then Rc would see Cell<usize> and Arc would see AtomicUsize, with no need for all the converting from &UnsafeCell.

If we're going to mono on RcOps anyway, could we mono on the counts type instead? There's something nice about saying that RawRc doesn't even know about the fields on the counts, and doesn't care. And TBH, I don't think it does care -- even if it were a weird size, the value_ptr.cast::<CountsType>().sub(1) approach always works, because we made sure the value_ptr is aligned enough for both the counts type and the value; so even if things ended up being padded or not aligned as much as we expected, that ought to be fine.

Member:

(TBH, it'd be cool if that set us up to easily offer a smart pointer with non-usize counts, too, since those are way too big on 64-bit for any real use. Arguably RawRc would ideally not even know how big the counts are, just whether they ended up being zero or not for dropping.)

Contributor Author:

If both need to be handled by the same code, that seems like something the implementations could do internally.

Some early designs did use separate methods for weak and strong counters, but the resulting implementation seems a little complicated to me. I don't see the implementation changing in the foreseeable future, so I prefer simplicity over extensibility. If we do need to separate the methods in the future, we can do it trivially.

Come to think of it, why does this ever deal in UnsafeCells directly?

I want to move as much code as possible into the raw_rc module to better share code between Rc and Arc. Passing RefCounts might duplicate the operation of calculating field offsets on the implementor side, which I would like to avoid.

If we're going to mono on RcOps anyway, could we mono on the counts type instead?

Monomorphizing on the counts type would require using two different counts types that essentially do the same thing, which leads to code duplication and more complex debugger visualizer implementation. Even if we really need two different counts types, we can add an associated type to the RcOps trait as the counts type to achieve the same effect.
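The associated-type alternative mentioned here could be shaped like this (a hypothetical sketch of the design being discussed, not the PR's actual trait): each ops implementation names its own counts type, so Rc's code sees `Cell<usize>` and Arc's sees `AtomicUsize`, with no `UnsafeCell` conversions in the generic code.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical shape only: an associated counts type on the ops trait.
trait RcOps {
    type Count;

    fn increment(count: &Self::Count);
    /// Returns `true` if the count reached zero.
    fn decrement(count: &Self::Count) -> bool;
}

struct LocalOps; // would back `Rc`

impl RcOps for LocalOps {
    type Count = Cell<usize>;

    fn increment(count: &Cell<usize>) {
        count.set(count.get() + 1);
    }

    fn decrement(count: &Cell<usize>) -> bool {
        let n = count.get() - 1;
        count.set(n);
        n == 0
    }
}

struct AtomicOps; // would back `Arc`

impl RcOps for AtomicOps {
    type Count = AtomicUsize;

    fn increment(count: &AtomicUsize) {
        count.fetch_add(1, Ordering::Relaxed);
    }

    fn decrement(count: &AtomicUsize) -> bool {
        // `fetch_sub` returns the previous value.
        count.fetch_sub(1, Ordering::Release) == 1
    }
}

fn main() {
    let count = Cell::new(1);
    LocalOps::increment(&count);
    assert_eq!(count.get(), 2);
    assert!(!LocalOps::decrement(&count));
    assert!(LocalOps::decrement(&count));
}
```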

Comment on lines 714 to 717
#[inline]
fn is_dangling(value_ptr: NonNull<()>) -> bool {
    value_ptr.addr() == NonZeroUsize::MAX
}
Member:

fix: I didn't see a comment on new_dangling_in for why this value is chosen for dangling, so there should be one either here or there.

Notably, usize::MAX used to be chosen because, when the pointer pointed at the counts, that was an obviously invalid location for the counts, which would extend past it. But there's nothing inherently invalid about a value pointer having that address -- a () can live there no problem.

Now, it's still a possible choice assuming that the counts need an alignment above 1, since we ensure that the value_ptr is aligned at least as much as the counts. But there's also other choices of dangling values that would work, so which one is being used should be commented.

For example, the dangling value pointer could just be without_provenance(1) -- that's a closer equivalent to the old one, arguably, since before we were using the largest address because the counts were after it, whereas now we're depending on storing the counts before the value pointer, where obviously (for a non-zero-sized counts) 1 is invalid because the counts would be at the null address.

scottmcm (Member) left a review comment:

Thanks for tackling this, and especially finding a way to keep the debug visualizers working.

It generally looks good to me, though TBH my eyes are starting to glaze over a bit after multiple 3k-line files. I've left a bunch of thoughts as I was going along, most of them should be hopefully easy or don't actually need anything this PR.

The one big one is https://github.com/rust-lang/rust/pull/132553/files#r2008914982 about how best to handle what's essentially the strategy pattern here. Don't take that as an immutable directive, though, I'd be happy to discuss how best to handle the thoughts I had there.

(And if you wouldn't mind, it'd be great to find some useful file splits for the raw_rc file -- can any of it be split out to help people review in chunks?)

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch 2 times, most recently from 178880a to bd899f5 Compare March 23, 2025 17:45

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from bd899f5 to 17888f9 Compare March 24, 2025 13:53

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch 2 times, most recently from 42b7a8b to 5224b3c Compare March 24, 2025 15:27

@EFanZh EFanZh force-pushed the zero-cost-rc-arc-deref branch from 5224b3c to e510e8c Compare March 24, 2025 15:55
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.