Lower to a memset(undef) when Rvalue::Repeat repeats uninit #138634

saethlin · 2025-03-18T01:30:03Z

It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.

saethlin · 2025-03-18T03:54:08Z

Going to bed so
@bors try @rust-timer queue

rustbot · 2025-03-18T03:54:17Z

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Lower to a no-op when Rvalue::Repeat repeats uninit Fixes rust-lang#138625 r? oli-obk

bors · 2025-03-18T03:55:20Z

⌛ Trying commit 591abdd with merge 8bc3b30...

scottmcm · 2025-03-18T04:06:11Z

compiler/rustc_codegen_ssa/src/mir/rvalue.rs

                // Do not generate the loop for zero-sized elements or empty arrays.
                if dest.layout.is_zst() {
                    return;
                }

+                // Do not generate the loop when the element is an uninit const.


Unsure: What is the loop writing into memory? What's in the cg_elem? I'd have expected that a loop of writing undef would trivially be optimized out by LLVM.

Is it possible that this check for "all undef" should be in the code that turns a constant into an OperandRef instead?

It's doing a memcpy of a constant that is undef:

@anon.b980601a342f3c2479dca1d3f2651cea.0 = private unnamed_addr constant <{ [24 x i8] }> undef, align 8

define void @uninit_arr_via_const(ptr dead_on_unwind noalias nocapture noundef writable sret([584 x i8]) align 8 dereferenceable(584) %_0) unnamed_addr {
start:
  %_1 = alloca [576 x i8], align 8
  call void @llvm.lifetime.start.p0(i64 576, ptr %_1)
  br label %repeat_loop_header

repeat_loop_header:
  %0 = phi i64 [ 0, %start ], [ %3, %repeat_loop_body ]
  %1 = icmp ult i64 %0, 24
  br i1 %1, label %repeat_loop_body, label %repeat_loop_next

repeat_loop_body:
  %2 = getelementptr inbounds nuw %"core::mem::maybe_uninit::MaybeUninit<alloc::string::String>", ptr %_1, i64 %0
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %2, ptr align 8 @anon.b980601a342f3c2479dca1d3f2651cea.0, i64 24, i1 false)
  %3 = add nuw i64 %0, 1
  br label %repeat_loop_header

repeat_loop_next:
  store i64 0, ptr %_0, align 8
  %4 = getelementptr inbounds i8, ptr %_0, i64 8
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %4, ptr align 8 %_1, i64 576, i1 false)
  call void @llvm.lifetime.end.p0(i64 576, ptr %_1)
  ret void
}

After the loop gets unrolled, MemCpyOpt combines a store and a bunch of memcpy into a single memset: https://godbolt.org/z/YsPqxTsjc
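For readers without the linked issue at hand, Rust source roughly like the following would plausibly produce IR of this shape: a length field stored as 0, followed by 24 uninitialized `MaybeUninit<String>` slots built with a repeat expression. The struct name `Buf`, its fields, and the const name `UNINIT` are hypothetical, chosen only to match the 584-byte return slot and the 576-byte array seen in the IR on a 64-bit target. This is a sketch under those assumptions, not the test added by the PR.

```rust
use std::mem::MaybeUninit;

// Hypothetical aggregate: a length plus uninitialized storage.
pub struct Buf {
    pub len: usize,
    pub data: [MaybeUninit<String>; 24],
}

// A named const whose value consists entirely of uninitialized bytes.
const UNINIT: MaybeUninit<String> = MaybeUninit::uninit();

pub fn uninit_arr_via_const() -> Buf {
    // The repeat expression `[UNINIT; 24]` is the part expected to reach
    // codegen as a repeat of a constant operand.
    Buf { len: 0, data: [UNINIT; 24] }
}
```

Only `len` carries a meaningful value here; every byte of `data` is uninit, which is why copying the element 24 times in a loop is wasted work.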

I agree that a more thorough approach might be useful.

🤔 I'm also not sure if the approach I'm doing here will actually deoptimize in some cases because avoiding assigning undef over some bytes may keep them initialized when they shouldn't be.

I've "addressed" all of this by changing the implementation to a memset that writes undef to the destination. I'm not sure it makes sense to put a check in OperandRef.

Ah, I see. Yeah, if it's making a constant and copying it that's not great.

(I really wonder if GVN should just stop making constants for anything but primitives, but that's a different conversation.)

tests/codegen/uninit-repeat-in-aggregate.rs

bors · 2025-03-18T06:02:10Z

☀️ Try build successful - checks-actions
Build commit: 8bc3b30 (8bc3b304caf17d21c24a968b2b7cf8f7c068adb3)

rust-timer · 2025-03-18T07:18:25Z

Finished benchmarking commit (8bc3b30): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (secondary -2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.1%	[-2.1%, -2.1%]	1
All ❌✅ (primary)	-	-	0

Binary size

Results (primary -0.0%, secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	1
Improvements ✅ (primary)	-0.0%	[-0.0%, -0.0%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.0%	[-0.0%, -0.0%]	2

Bootstrap: 775.508s -> 776.263s (0.10%)
Artifact size: 365.09 MiB -> 365.09 MiB (0.00%)

scottmcm · 2025-03-21T23:52:51Z

compiler/rustc_codegen_ssa/src/mir/rvalue.rs

+                        bx.memset(
+                            dest.val.llval,
+                            bx.const_undef(bx.type_i8()),
+                            size,
+                            dest.val.align,
+                            MemFlags::empty(),
+                        );


Hmm, llvm accepts storing undef even for things like array types https://llvm.godbolt.org/z/jYjaEYsTh -- even though using those types like that is wrong in most places -- so I don't actually know what's better here.

Eh, leaving it as-is is probably fine: the call is to llvm.memset (not C's memset), so passing undef to it probably isn't UB.

I figured we can memcpy something that's undef, so surely we can memset with an uninit u8.
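The order of decisions discussed in this thread can be modeled in plain Rust. The sketch below is a toy model, not rustc code: `Lowering`, `lower_repeat`, and the `init_mask` representation are all hypothetical names, and the general case is collapsed into a single `Loop` variant even though the real codegen may have other paths.

```rust
/// What a backend might be asked to emit for a repeat expression (toy model).
#[derive(Debug, PartialEq)]
pub enum Lowering {
    /// Destination is zero-sized: emit nothing.
    Nothing,
    /// Element is fully uninit: one memset over `bytes` bytes writing undef.
    MemsetUndef { bytes: u64 },
    /// Anything else: modeled here as a loop writing the element `count` times.
    Loop { count: u64 },
}

/// `init_mask[i]` says whether byte `i` of the constant element is initialized.
pub fn lower_repeat(init_mask: &[bool], count: u64) -> Lowering {
    let dest_bytes = init_mask.len() as u64 * count;
    // Zero-sized elements or empty arrays: nothing to write.
    if dest_bytes == 0 {
        return Lowering::Nothing;
    }
    // Fully uninit element: a single write of undef over the whole destination,
    // so the destination is still marked as de-initialized.
    if init_mask.iter().all(|&init| !init) {
        return Lowering::MemsetUndef { bytes: dest_bytes };
    }
    Lowering::Loop { count }
}
```

With a 24-byte fully uninit element repeated 24 times, this model picks a single 576-byte memset, which matches the sizes in the IR quoted earlier in the thread.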

scottmcm · 2025-03-21T23:57:26Z

compiler/rustc_codegen_ssa/src/mir/rvalue.rs

@@ -86,13 +86,30 @@ impl<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>> FunctionCx<'a, 'tcx, Bx> {
            }

            mir::Rvalue::Repeat(ref elem, count) => {
-                let cg_elem = self.codegen_operand(bx, elem);
-
                // Do not generate the loop for zero-sized elements or empty arrays.
                if dest.layout.is_zst() {


pondering: with this change we'll skip evaluating the operand to repeat if the target is zero-sized.

I think that's ok, though? If it's a Move or Copy, at worst it's not moving the thing, but moving from something doesn't actually do anything right now anyway. And if it's a Constant, something else is supposed to have evaluated it before we get here, right? So we don't need to worry about [const { panic!() }; 10] somehow getting skipped for a ZST element type?

Yes, the whole mentioned items system is designed to guarantee that this isn't a problem:

rust/compiler/rustc_monomorphize/src/collector.rs

Lines 159 to 162 in 48b36c9

//! One important role of collection is to evaluate all constants that are used by all the items
//! which are being collected. Codegen can then rely on only encountering constants that evaluate
//! successfully, and if a constant fails to evaluate, the collector has much better context to be
//! able to show where this constant comes up.
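The early return discussed in this thread fires whenever the destination array is zero-sized, which covers both a zero-sized element type and a zero repeat count. A small sketch of that condition in ordinary Rust follows; `repeat_dest_is_zst` is a hypothetical helper for illustration, not a rustc function.

```rust
use std::mem::size_of;

/// True when a repeat expression of type `[T; N]` has a zero-sized destination.
pub fn repeat_dest_is_zst<T, const N: usize>() -> bool {
    size_of::<[T; N]>() == 0
}
```

For example, `repeat_dest_is_zst::<(), 10>()` and `repeat_dest_is_zst::<String, 0>()` both hold, so in either case codegen has no bytes to write and can skip the operand.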

scottmcm

This looks good to me, though it'd probably be best to get confirmation from someone who actually understands CTFE memory to be sure how you're doing it is right.

(Seems fine, but I don't know what I don't know.)

saethlin · 2025-03-22T02:17:39Z

it'd probably be best to get confirmation from someone who actually understands CTFE memory to be sure how you're doing it is right.

@oli-obk can you check that my CTFE code in this PR is okay? I got this idea from this (resolved) discussion: #135335 (comment)

oli-obk

Yes, this is correctly checking for fully uninit

saethlin · 2025-03-24T13:16:25Z

@bors r=scottmcm,oli-obk

bors · 2025-03-24T13:16:28Z

📌 Commit 8e7d8dd has been approved by scottmcm,oli-obk

It is now in the queue for this repository.

bors · 2025-03-24T19:40:28Z

⌛ Testing commit 8e7d8dd with merge 0f37b5c...

bors · 2025-03-24T19:42:34Z

💔 Test failed - checks-actions

rust-log-analyzer · 2025-03-24T19:42:47Z

A job failed! Check out the build log: (web) (plain)

saethlin · 2025-03-24T21:24:26Z

@bors retry GHA internal error

bors · 2025-03-25T02:09:17Z

⌛ Testing commit 8e7d8dd with merge e61403a...

bors · 2025-03-25T05:25:35Z

☀️ Test successful - checks-actions
Approved by: scottmcm,oli-obk
Pushing e61403a to master...

github-actions · 2025-03-25T05:27:42Z

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 1df5aff (parent) -> e61403a (this PR)

Test differences

[codegen] tests/codegen/uninit-repeat-in-aggregate.rs (stage 2): [missing] -> pass (J0)
[codegen] tests/codegen/uninit-repeat-in-aggregate.rs (stage 1): [missing] -> pass (J1)

Job group index

J0: aarch64-apple, aarch64-gnu, arm-android, armhf-gnu, dist-i586-gnu-i586-i686-musl, i686-gnu-1, i686-gnu-nopt-1, i686-msvc-1, test-various, x86_64-apple-1, x86_64-gnu, x86_64-gnu-llvm-18-1, x86_64-gnu-llvm-18-2, x86_64-gnu-llvm-19-1, x86_64-gnu-llvm-19-2, x86_64-gnu-nopt, x86_64-gnu-stable, x86_64-mingw-1, x86_64-msvc-1
J1: x86_64-gnu-llvm-18-3, x86_64-gnu-llvm-19-3

rust-timer · 2025-03-25T06:47:40Z

Finished benchmarking commit (e61403a): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.7%	[-2.6%, -0.8%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.7%	[-2.6%, -0.8%]	2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Bootstrap: 777.153s -> 776.063s (-0.14%)
Artifact size: 365.84 MiB -> 365.84 MiB (-0.00%)

rustbot assigned oli-obk Mar 18, 2025

rustbot added S-waiting-on-review T-compiler labels Mar 18, 2025

saethlin force-pushed the repeated-uninit branch 2 times, most recently from 02cf25d to 591abdd Compare March 18, 2025 03:52

rustbot added the S-waiting-on-perf label Mar 18, 2025

saethlin marked this pull request as ready for review March 18, 2025 03:54

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 18, 2025

Auto merge of rust-lang#138634 - saethlin:repeated-uninit, r=<try>

8bc3b30

Lower to a no-op when Rvalue::Repeat repeats uninit Fixes rust-lang#138625 r? oli-obk

scottmcm reviewed Mar 18, 2025

rustbot removed the S-waiting-on-perf label Mar 18, 2025

saethlin force-pushed the repeated-uninit branch from 591abdd to 73f36ed Compare March 20, 2025 03:54

saethlin changed the title ~~Lower to a no-op when Rvalue::Repeat repeats uninit~~ Lower to a memset(undef) when Rvalue::Repeat repeats uninit Mar 20, 2025

Lower to a memset(undef) when Rvalue::Repeat repeats uninit

8e7d8dd

saethlin force-pushed the repeated-uninit branch from 73f36ed to 8e7d8dd Compare March 20, 2025 03:57

scottmcm reviewed Mar 21, 2025

scottmcm approved these changes Mar 21, 2025

oli-obk approved these changes Mar 24, 2025

bors added S-waiting-on-bors and removed S-waiting-on-review labels Mar 24, 2025

bors added S-waiting-on-review and removed S-waiting-on-bors labels Mar 24, 2025

bors added S-waiting-on-bors and removed S-waiting-on-review labels Mar 24, 2025

bors added the merged-by-bors label Mar 25, 2025

bors merged commit e61403a into rust-lang:master Mar 25, 2025
7 checks passed

rustbot added this to the 1.87.0 milestone Mar 25, 2025
