Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lower to a memset(undef) when Rvalue::Repeat repeats uninit #138634

Merged
merged 1 commit into from
Mar 25, 2025

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Mar 18, 2025

Fixes #138625.

It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 18, 2025
@rust-log-analyzer

This comment has been minimized.

@saethlin saethlin force-pushed the repeated-uninit branch 2 times, most recently from 02cf25d to 591abdd Compare March 18, 2025 03:52
@saethlin
Copy link
Member Author

Going to bed so
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 18, 2025
@saethlin saethlin marked this pull request as ready for review March 18, 2025 03:54
@rustbot
Copy link
Collaborator

rustbot commented Mar 18, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 18, 2025
Lower to a no-op when Rvalue::Repeat repeats uninit

Fixes rust-lang#138625

r? oli-obk
@bors
Copy link
Contributor

bors commented Mar 18, 2025

⌛ Trying commit 591abdd with merge 8bc3b30...

// Do not generate the loop for zero-sized elements or empty arrays.
if dest.layout.is_zst() {
return;
}

// Do not generate the loop when the element is an uninit const.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure: What is the loop writing into memory? What's in the cg_elem? I'd have expected that a loop of writing undef would trivially be optimized out be LLVM.

Is it possible that this check for "all undef" should be in the code that turns a constant into an OperandRef instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doing a memcpy of a constant that is undef:

@anon.b980601a342f3c2479dca1d3f2651cea.0 = private unnamed_addr constant <{ [24 x i8] }> undef, align 8

define void @uninit_arr_via_const(ptr dead_on_unwind noalias nocapture noundef writable sret([584 x i8]) align 8 dereferenceable(584) %_0) unnamed_addr {
start:
  %_1 = alloca [576 x i8], align 8
  call void @llvm.lifetime.start.p0(i64 576, ptr %_1)
  br label %repeat_loop_header

repeat_loop_header:
  %0 = phi i64 [ 0, %start ], [ %3, %repeat_loop_body ]
  %1 = icmp ult i64 %0, 24
  br i1 %1, label %repeat_loop_body, label %repeat_loop_next

repeat_loop_body:
  %2 = getelementptr inbounds nuw %"core::mem::maybe_uninit::MaybeUninit<alloc::string::String>", ptr %_1, i64 %0
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %2, ptr align 8 @anon.b980601a342f3c2479dca1d3f2651cea.0, i64 24, i1 false)
  %3 = add nuw i64 %0, 1
  br label %repeat_loop_header

repeat_loop_next:
  store i64 0, ptr %_0, align 8
  %4 = getelementptr inbounds i8, ptr %_0, i64 8
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %4, ptr align 8 %_1, i64 576, i1 false)
  call void @llvm.lifetime.end.p0(i64 576, ptr %_1)
  ret void
}

After the loop gets unrolled, MemCpyOpt combines a store and a bunch of memcpy into a single memset: https://godbolt.org/z/YsPqxTsjc

I agree that a more thorough approach might be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I'm also not sure if the approach I'm doing here will actually deoptimize in some cases because avoiding assigning undef over some bytes may keep them initialized when they shouldn't be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've "addressed" all of this by changing the implementation to a memset that writes undef to the destination. I'm not sure it makes sense to put a check in OperandRef.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yeah, if it's making a constant and copying it that's not great.

(I really wonder if GVN should just stop making constants for anything but primitives, but that's a different conversation.)

@bors
Copy link
Contributor

bors commented Mar 18, 2025

☀️ Try build successful - checks-actions
Build commit: 8bc3b30 (8bc3b304caf17d21c24a968b2b7cf8f7c068adb3)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8bc3b30): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (secondary -2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.1% [-2.1%, -2.1%] 1
All ❌✅ (primary) - - 0

Binary size

Results (primary -0.0%, secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 1
Improvements ✅
(primary)
-0.0% [-0.0%, -0.0%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.0% [-0.0%, -0.0%] 2

Bootstrap: 775.508s -> 776.263s (0.10%)
Artifact size: 365.09 MiB -> 365.09 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 18, 2025
@saethlin saethlin changed the title Lower to a no-op when Rvalue::Repeat repeats uninit Lower to a memset(undef) when Rvalue::Repeat repeats uninit Mar 20, 2025
Comment on lines +100 to +106
bx.memset(
dest.val.llval,
bx.const_undef(bx.type_i8()),
size,
dest.val.align,
MemFlags::empty(),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, llvm accepts storing undef even for things like array types https://llvm.godbolt.org/z/jYjaEYsTh -- even though using those types like that is wrong in most places -- so I don't actually know what's better here.

Eh, leave it as this is probably fine, since it's to llvm.memset (not C's memset) so the undef` to it probably isn't UB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured we can memcpy something that's undef, so surely we can memset with an uninit u8.

@@ -86,13 +86,30 @@ impl<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>> FunctionCx<'a, 'tcx, Bx> {
}

mir::Rvalue::Repeat(ref elem, count) => {
let cg_elem = self.codegen_operand(bx, elem);

// Do not generate the loop for zero-sized elements or empty arrays.
if dest.layout.is_zst() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pondering: with this change we'll skip evaluating the operand to repeat if the target is zero-sized.

I think that's ok, though? If it's a Move or Copy at worst it's not moveing the thing, but moving from something doesn't actually do anything right now anyway. And if it's a Constant something else is supposed to have evaluated it before we get here, right? So we don't need to worry about [const { panic!() }; 10] somehow getting skipped for a ZST element type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the whole mentioned items system is designed to guarantee that this isn't a problem:

//! One important role of collection is to evaluate all constants that are used by all the items
//! which are being collected. Codegen can then rely on only encountering constants that evaluate
//! successfully, and if a constant fails to evaluate, the collector has much better context to be
//! able to show where this constant comes up.

Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, though it'd probably be best to get confirmation from someone who actually understands CTFE memory to be sure how you're doing it is right.

(Seems fine, but I don't know what I don't know.)

@saethlin
Copy link
Member Author

saethlin commented Mar 22, 2025

it'd probably be best to get confirmation from someone who actually understands CTFE memory to be sure how you're doing it is right.

@oli-obk can you check that my CTFE code in this PR is okay? I got this idea from this (resolved) discussion: #135335 (comment)

Copy link
Contributor

@oli-obk oli-obk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is correctly checking for fully uninit

@saethlin
Copy link
Member Author

@bors r=scottmcm,oli-obk

@bors
Copy link
Contributor

bors commented Mar 24, 2025

📌 Commit 8e7d8dd has been approved by scottmcm,oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 24, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 24, 2025
…,oli-obk

Lower to a memset(undef) when Rvalue::Repeat repeats uninit

Fixes rust-lang#138625.

It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.
@bors
Copy link
Contributor

bors commented Mar 24, 2025

⌛ Testing commit 8e7d8dd with merge 0f37b5c...

@bors
Copy link
Contributor

bors commented Mar 24, 2025

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 24, 2025
@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@saethlin
Copy link
Member Author

@bors retry GHA internal error

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 24, 2025
@bors
Copy link
Contributor

bors commented Mar 25, 2025

⌛ Testing commit 8e7d8dd with merge e61403a...

@bors
Copy link
Contributor

bors commented Mar 25, 2025

☀️ Test successful - checks-actions
Approved by: scottmcm,oli-obk
Pushing e61403a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 25, 2025
@bors bors merged commit e61403a into rust-lang:master Mar 25, 2025
7 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Mar 25, 2025
Copy link

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 1df5aff (parent) -> e61403a (this PR)

Test differences

Show 2 test diffs
  • [codegen] tests/codegen/uninit-repeat-in-aggregate.rs (stage 2): [missing] -> pass (J0)
  • [codegen] tests/codegen/uninit-repeat-in-aggregate.rs (stage 1): [missing] -> pass (J1)

Job group index

  • J0: aarch64-apple, aarch64-gnu, arm-android, armhf-gnu, dist-i586-gnu-i586-i686-musl, i686-gnu-1, i686-gnu-nopt-1, i686-msvc-1, test-various, x86_64-apple-1, x86_64-gnu, x86_64-gnu-llvm-18-1, x86_64-gnu-llvm-18-2, x86_64-gnu-llvm-19-1, x86_64-gnu-llvm-19-2, x86_64-gnu-nopt, x86_64-gnu-stable, x86_64-mingw-1, x86_64-msvc-1
  • J1: x86_64-gnu-llvm-18-3, x86_64-gnu-llvm-19-3

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e61403a): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.7% [-2.6%, -0.8%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.7% [-2.6%, -0.8%] 2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Bootstrap: 777.153s -> 776.063s (-0.14%)
Artifact size: 365.84 MiB -> 365.84 MiB (-0.00%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missed optimization when using const-blocks for MaybeUninit arrays
7 participants