-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lower to a memset(undef) when Rvalue::Repeat repeats uninit #138634
Conversation
This comment has been minimized.
This comment has been minimized.
02cf25d
to
591abdd
Compare
Going to bed so |
This comment has been minimized.
This comment has been minimized.
Some changes occurred in compiler/rustc_codegen_ssa |
Lower to a no-op when Rvalue::Repeat repeats uninit Fixes rust-lang#138625 r? oli-obk
// Do not generate the loop for zero-sized elements or empty arrays. | ||
if dest.layout.is_zst() { | ||
return; | ||
} | ||
|
||
// Do not generate the loop when the element is an uninit const. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure: What is the loop writing into memory? What's in the cg_elem
? I'd have expected that a loop of writing undef
would trivially be optimized out be LLVM.
Is it possible that this check for "all undef" should be in the code that turns a constant into an OperandRef
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doing a memcpy
of a constant
that is undef
:
@anon.b980601a342f3c2479dca1d3f2651cea.0 = private unnamed_addr constant <{ [24 x i8] }> undef, align 8
define void @uninit_arr_via_const(ptr dead_on_unwind noalias nocapture noundef writable sret([584 x i8]) align 8 dereferenceable(584) %_0) unnamed_addr {
start:
%_1 = alloca [576 x i8], align 8
call void @llvm.lifetime.start.p0(i64 576, ptr %_1)
br label %repeat_loop_header
repeat_loop_header:
%0 = phi i64 [ 0, %start ], [ %3, %repeat_loop_body ]
%1 = icmp ult i64 %0, 24
br i1 %1, label %repeat_loop_body, label %repeat_loop_next
repeat_loop_body:
%2 = getelementptr inbounds nuw %"core::mem::maybe_uninit::MaybeUninit<alloc::string::String>", ptr %_1, i64 %0
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %2, ptr align 8 @anon.b980601a342f3c2479dca1d3f2651cea.0, i64 24, i1 false)
%3 = add nuw i64 %0, 1
br label %repeat_loop_header
repeat_loop_next:
store i64 0, ptr %_0, align 8
%4 = getelementptr inbounds i8, ptr %_0, i64 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %4, ptr align 8 %_1, i64 576, i1 false)
call void @llvm.lifetime.end.p0(i64 576, ptr %_1)
ret void
}
After the loop gets unrolled, MemCpyOpt combines a store
and a bunch of memcpy
into a single memset
: https://godbolt.org/z/YsPqxTsjc
I agree that a more thorough approach might be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I'm also not sure if the approach I'm doing here will actually deoptimize in some cases because avoiding assigning undef
over some bytes may keep them initialized when they shouldn't be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've "addressed" all of this by changing the implementation to a memset
that writes undef to the destination. I'm not sure it makes sense to put a check in OperandRef
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Yeah, if it's making a constant and copying it that's not great.
(I really wonder if GVN should just stop making constants for anything but primitives, but that's a different conversation.)
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (8bc3b30): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResults (secondary -2.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.0%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 775.508s -> 776.263s (0.10%) |
591abdd
to
73f36ed
Compare
73f36ed
to
8e7d8dd
Compare
bx.memset( | ||
dest.val.llval, | ||
bx.const_undef(bx.type_i8()), | ||
size, | ||
dest.val.align, | ||
MemFlags::empty(), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, llvm accepts storing undef even for things like array types https://llvm.godbolt.org/z/jYjaEYsTh -- even though using those types like that is wrong in most places -- so I don't actually know what's better here.
Eh, leave it as this is probably fine, since it's to llvm.memset
(not C's
memset) so the
undef` to it probably isn't UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured we can memcpy
something that's undef, so surely we can memset
with an uninit u8
.
@@ -86,13 +86,30 @@ impl<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>> FunctionCx<'a, 'tcx, Bx> { | |||
} | |||
|
|||
mir::Rvalue::Repeat(ref elem, count) => { | |||
let cg_elem = self.codegen_operand(bx, elem); | |||
|
|||
// Do not generate the loop for zero-sized elements or empty arrays. | |||
if dest.layout.is_zst() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pondering: with this change we'll skip evaluating the operand to repeat if the target is zero-sized.
I think that's ok, though? If it's a Move
or Copy
at worst it's not move
ing the thing, but moving from something doesn't actually do anything right now anyway. And if it's a Constant
something else is supposed to have evaluated it before we get here, right? So we don't need to worry about [const { panic!() }; 10]
somehow getting skipped for a ZST element type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the whole mentioned items system is designed to guarantee that this isn't a problem:
rust/compiler/rustc_monomorphize/src/collector.rs
Lines 159 to 162 in 48b36c9
//! One important role of collection is to evaluate all constants that are used by all the items | |
//! which are being collected. Codegen can then rely on only encountering constants that evaluate | |
//! successfully, and if a constant fails to evaluate, the collector has much better context to be | |
//! able to show where this constant comes up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, though it'd probably be best to get confirmation from someone who actually understands CTFE memory to be sure how you're doing it is right.
(Seems fine, but I don't know what I don't know.)
@oli-obk can you check that my CTFE code in this PR is okay? I got this idea from this (resolved) discussion: #135335 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is correctly checking for fully uninit
@bors r=scottmcm,oli-obk |
…,oli-obk Lower to a memset(undef) when Rvalue::Repeat repeats uninit Fixes rust-lang#138625. It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.
💔 Test failed - checks-actions |
@bors retry GHA internal error |
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 1df5aff (parent) -> e61403a (this PR) Test differencesShow 2 test diffs
Job group index
|
Finished benchmarking commit (e61403a): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -1.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 777.153s -> 776.063s (-0.14%) |
Fixes #138625.
It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.