Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization when using const-blocks for MaybeUninit arrays #138625

Open
kmdreko opened this issue Mar 17, 2025 · 2 comments · May be fixed by #138634
Open

Missed optimization when using const-blocks for MaybeUninit arrays #138625

kmdreko opened this issue Mar 17, 2025 · 2 comments · May be fixed by #138634
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kmdreko
Copy link

kmdreko commented Mar 17, 2025

Please see the assembly from these functions (Rust 1.85.0 --release):

use std::mem::MaybeUninit;

pub struct SmallVec<T> {
    pub len: usize,
    pub arr: [MaybeUninit<T>; 24],
}

#[unsafe(no_mangle)]
pub fn uninit_arr_via_const() -> SmallVec<String> {
    SmallVec { len: 0, arr: [const { MaybeUninit::uninit() }; 24] }
}

#[unsafe(no_mangle)]
pub fn uninit_arr_via_assume_init() -> SmallVec<String> {
    SmallVec { len: 0, arr: unsafe { MaybeUninit::uninit().assume_init() } }
}
uninit_arr_via_const:
	pushq	%rbx
	movq	%rdi, %rbx
	movl	$584, %edx
	xorl	%esi, %esi
	callq	*memset@GOTPCREL(%rip)
	movq	%rbx, %rax
	popq	%rbx
	retq

uninit_arr_via_assume_init:
	movq	%rdi, %rax
	movq	$0, 576(%rdi)
	retq

The const {} variant compiles with code that seems to zero-out the arr field. This should not be necessary and the other version (correctly) only initializes the len field.

This is particularly troublesome because the documentation in multiple places expresses that const-blocks are the preferred way to create a [MaybeUnint; _].

@kmdreko kmdreko added the C-bug Category: This is a bug. label Mar 17, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 17, 2025
@saethlin
Copy link
Member

As with most such examples, you can get the codegen you want with more const:

#[unsafe(no_mangle)]
pub fn uninit_arr_via_const() -> SmallVec<String> {
    SmallVec { len: 0, arr: const { [const { MaybeUninit::uninit() }; 24] } }
}

So it's nice that works, but I'll try to fix this properly.

@saethlin saethlin added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 18, 2025
@saethlin
Copy link
Member

I'm not tagging this as LLVM because the bug here is our bug. We are giving the backend a loop for the assignment of [const { MaybeUninit::uninit() }; 24].

@saethlin saethlin added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Mar 18, 2025
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 18, 2025
Lower to a no-op when Rvalue::Repeat repeats uninit

Fixes rust-lang#138625

r? oli-obk
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 24, 2025
…,oli-obk

Lower to a memset(undef) when Rvalue::Repeat repeats uninit

Fixes rust-lang#138625.

It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 25, 2025
…,oli-obk

Lower to a memset(undef) when Rvalue::Repeat repeats uninit

Fixes rust-lang#138625.

It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something, so instead we just lower to a single memset that writes undef. This is still superior to the memcpy loop, in both quality of code we hand to the backend and LLVM's final output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants