Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang trunk does not optimize trivial case #53861

Open
socketpair opened this issue Feb 15, 2022 · 30 comments
Open

Clang trunk does not optimize trivial case #53861

socketpair opened this issue Feb 15, 2022 · 30 comments

Comments

@socketpair
Copy link

https://godbolt.org/z/65Tza7cdf

#include <stdint.h>

extern uint16_t qwe, asd, zxc, zzz;

int firewall(const uint8_t *restrict data) {
    const uint8_t ip_proto = *data;
    const uint16_t dst_port = *((const uint16_t *)data + 32);

    if (ip_proto == 17 && dst_port == qwe) return 1;

    if (ip_proto == 17 && dst_port == asd) return 1;

    if (ip_proto == 17 && dst_port == zxc) return 1;

    if (ip_proto == 17 && dst_port == zzz) return 1;

    return 0;
}
firewall:                               # @firewall
        mov     dl, byte ptr [rdi]
        movzx   ecx, word ptr [rdi + 64]
        mov     eax, 1
        cmp     dl, 17
        jne     .LBB0_2
        cmp     cx, word ptr [rip + qwe]
        jne     .LBB0_2
.LBB0_7:
        ret
.LBB0_2:
        cmp     dl, 17
        jne     .LBB0_4
        cmp     cx, word ptr [rip + asd]
        je      .LBB0_7
.LBB0_4:
        cmp     dl, 17
        jne     .LBB0_6
        cmp     cx, word ptr [rip + zxc]
        je      .LBB0_7
.LBB0_6:
        cmp     dl, 17
        sete    al
        cmp     cx, word ptr [rip + zzz]
        sete    cl
        and     cl, al
        movzx   eax, cl
        ret

Compiled with -O3 to x86_64.
cmp dl, 17 is repeated for some reason. This is a bug as I think.

@socketpair
Copy link
Author

Interesting, extern volatile uint16_t qwe, asd, zxc, zzz; fixes the problem. Or changing uint16_t to uint32_t. What's happening ?

@socketpair
Copy link
Author

@EugeneZelenko who should I trigger regarding this task ?

@socketpair
Copy link
Author

socketpair commented Mar 24, 2022

CLang 12 does not have this problem:
On the same (with 1,2,3,4 instead of external variables) C source Clang 13 gives ASM output like in the first message.

#include <stdint.h>

int firewall(uint8_t *data) {
    uint8_t ip_proto = *data;
    uint16_t dst_port = *(uint16_t *)data;
    if (ip_proto == 17 && dst_port == 1) return 1;
    if (ip_proto == 17 && dst_port == 2) return 1;
    if (ip_proto == 17 && dst_port == 3) return 1;
    if (ip_proto == 17 && dst_port == 4) return 1;
    return 0;
}
firewall:                               # @firewall
        movzx   eax, word ptr [rdi]
        cmp     byte ptr [rdi], 17
        sete    cl
        dec     eax
        cmp     ax, 4
        setb    al
        and     al, cl
        movzx   eax, al
        ret

@socketpair
Copy link
Author

socketpair commented Mar 24, 2022

Another thing. Even for Clang12 (-O4):

for code

#include <stdint.h>

int firewall(uint8_t *data) {
    uint8_t ip_proto = *data;
    uint16_t dst_port = *(uint16_t *)data;
    if (ip_proto == 17 && dst_port == 11) return 1;
    if (ip_proto == 17 && dst_port == 22) return 1;
    if (ip_proto == 17 && dst_port == 33) return 1;
    if (ip_proto == 17 && dst_port == 44) return 1;
    return 0;
}

It gives:

.LCPI0_0:
        .short  44                              # 0x2c
        .short  33                              # 0x21
        .short  11                              # 0xb
        .short  22                              # 0x16
        .zero   2
        .zero   2
        .zero   2
        .zero   2
firewall:                               # @firewall
        cmp     byte ptr [rdi], 17
        sete    al
        vpbroadcastw    xmm0, word ptr [rdi]
        vpcmpeqw        xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vpmovsxwd       xmm0, xmm0
        vmovmskps       ecx, xmm0
        test    cl, cl
        setne   cl
        and     cl, al
        movzx   eax, cl
        ret

But if I remove comparison with 17:

#include <stdint.h>

int firewall(uint8_t *data) {
    uint8_t ip_proto = *data;
    uint16_t dst_port = *(uint16_t *)data;
    if (dst_port == 11) return 1;
    if (dst_port == 22) return 1;
    if (dst_port == 33) return 1;
    if (dst_port == 44) return 1;
    return 0;
}

it gives beautiful ASM code:

firewall:                               # @firewall
        movzx   ecx, word ptr [rdi]
        cmp     rcx, 44
        ja      .LBB0_2
        mov     eax, 1
        movabs  rdx, 17600780175360
        bt      rdx, rcx
        jae     .LBB0_2
        ret
.LBB0_2:
        xor     eax, eax
        ret

So, WHAT'S happening ? gcc even ancient versions does not have such problems and generate good code.

@nickdesaulniers
Copy link
Member

CLang 12 does not have this problem:

There's a bunch of conflicting test cases here, but the original reported issue doesn't look like a regression to me.
https://godbolt.org/z/f6K78Ex8h
Test other versions of clang, too.

cc @zmodem

@socketpair
Copy link
Author

socketpair commented Mar 24, 2022

@nickdesaulniers What about this: https://godbolt.org/z/h6zbWvr6n ?

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Mar 24, 2022

Yeah, that's a more demonstrative-of-regression test case. Seeing the repeated comparisons of an unchanged register against a constant is disappointing. Looks like a critical edge we could have split perhaps; the IR looks nice, so it's probably isel.

Bisecting LLVM converged on:
5c2e50b
cc @aqjune
Reported here, too. Tentatively marking as a regression.

@LebedevRI
Copy link
Member

Hm, let me take a look.

@LebedevRI LebedevRI self-assigned this Mar 25, 2022
LebedevRI added a commit that referenced this issue Mar 25, 2022
This whole check is bogus, it's some kind of a profitability check.
For now, simply extend it to not only allow branch-on-binary-ops,
but also on poison-safe logic ops.

Refs. #53861
Refs. #54553
@LebedevRI
Copy link
Member

After f6b60b3, we now end up with

firewall:                               # @firewall
        .cfi_startproc
# %bb.0:                                # %entry
        vpbroadcastw    64(%rdi), %xmm0
        cmpb    $17, (%rdi)
        vpcmpeqw        .LCPI0_0(%rip), %xmm0, %xmm0
        sete    %al
        vmovd   %eax, %xmm1
        xorl    %eax, %eax
        vpbroadcastb    %xmm1, %xmm1
        vpmovsxwd       %xmm0, %xmm0
        vpand   %xmm0, %xmm1, %xmm0
        vpslld  $31, %xmm0, %xmm0
        vmovmskps       %xmm0, %ecx
        testl   %ecx, %ecx
        setne   %al
        retq

... which is better but still a regression from before poison safety patches.
The missing part here is #54553.

@socketpair
Copy link
Author

Wow, @LebedevRI ! Please see here https://godbolt.org/z/r57nvhGd3
Note the difference in firewall() and firewall2().

Hope this helps.

@LebedevRI
Copy link
Member

Yeah, it does seem like we are also missing some variant of CFG simplification along the lines of https://alive2.llvm.org/ce/z/-qy6hh

@socketpair
Copy link
Author

@LebedevRI Do I understand right, #54553 will fix my case and make assembler output like gcc ? (using a big 64-bit mask) ? If yes, please close this bug, and I will track #54553.

@RKSimon RKSimon changed the title CLang trunk does not optimize trivial case Clang trunk does not optimize trivial case Jul 23, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
This whole check is bogus, it's some kind of a profitability check.
For now, simply extend it to not only allow branch-on-binary-ops,
but also on poison-safe logic ops.

Refs. llvm/llvm-project#53861
Refs. llvm/llvm-project#54553
@socketpair
Copy link
Author

@LebedevRI is there any progress on that ? I mean #53861 (comment)

@socketpair
Copy link
Author

https://godbolt.org/z/GYqT96eq4 still triggered on Clang 16.

@socketpair
Copy link
Author

@LebedevRI ? Possibly I should ping someone else ?

@LebedevRI
Copy link
Member

I'm sorry, this wasn't on my interest list.
I'm not sure what specific folds we are missing, but at least one is:
https://alive2.llvm.org/ce/z/VM4PLR
I think #54553 is the main issue here.
cc @bcl5980 @spatel-gh

@bcl5980
Copy link
Contributor

bcl5980 commented Dec 5, 2022

This pattern looks a little complicate. Logical and/or use distributive law need to be careful with the operand position.
https://alive2.llvm.org/ce/z/nM6kZb
And we may need to consider normal binop also. So, there are more patterns we should detect:
https://alive2.llvm.org/ce/z/khFZ2s
I will try to add these patterns.

Candidate patch:
https://reviews.llvm.org/D139408

@socketpair
Copy link
Author

@bcl5980 Huge thanks! Starting from what versions, it will appear ?

@socketpair
Copy link
Author

socketpair commented Dec 13, 2022

I will be very happy if you do backports.

@socketpair
Copy link
Author

https://godbolt.org/z/sEj9Ys1d3 <- final (I believe) battle. Waiting while godbolt compiles Clang.

@bcl5980
Copy link
Contributor

bcl5980 commented Dec 13, 2022

https://godbolt.org/z/sEj9Ys1d3 <- final (I believe) battle. Waiting while godbolt compiles Clang.

It looks my change only can fix: https://godbolt.org/z/65Tza7cdf
https://godbolt.org/z/sEj9Ys1d3 still need to be fixed.

@bcl5980 bcl5980 reopened this Dec 13, 2022
@bcl5980
Copy link
Contributor

bcl5980 commented Dec 13, 2022

And actually before my change, @LebedevRI already check in a solution at 9ddff66 to fix the https://godbolt.org/z/65Tza7cdf
So I do nothing for this ticket for now. Sorry for that.

@LebedevRI
Copy link
Member

I'm not sure which patch helped here. Looks like we still need to fix SimplifyCFG to produce the select though.

@socketpair
Copy link
Author

ICX gives the best (what I expected at the beginning o the topic) result for our case : https://godbolt.org/z/aExdzzYM9

@socketpair
Copy link
Author

@LebedevRI @bcl5980

Sorry for bothering, but I confused. Especially about status of merging patches. So, what is merged and what is not ? What should I do in order everything to be moved further ?

@LebedevRI
Copy link
Member

What should I do in order everything to be moved further ?

You -- nothing, unless you want to write the needed patches yourself.
The two changes i wrote are either exposing a miscompilation
somewhere else in LLVM (or are causing it),
and i'm waiting on a reproducer from @alexfh.
Until i get one, and deal with it, the patches are stuck.

@socketpair
Copy link
Author

socketpair commented Jan 16, 2023

https://godbolt.org/z/c7zf5zWzv

image

-target bpf reveals more obviously what is happening.

@socketpair
Copy link
Author

@alexfh ?

@socketpair
Copy link
Author

@LebedevRI @bcl5980

https://godbolt.org/z/fh53d5fEn

Seems previous (already merged) patches are not sufficient. If condition become slightly more complex, it fails again to detect common condition.

veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue Jun 9, 2024
X && Z || Y && Z --> (X || Y) && Z
https://alive2.llvm.org/ce/z/nM6kZb

(X || Z) && (Y || Z) --> (X && Y) || Z
https://alive2.llvm.org/ce/z/_EWLRR

Fix: llvm/llvm-project#53861

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D139408
@socketpair
Copy link
Author

To wrap-up : https://godbolt.org/z/Te4459xd1

Different things happening, preventing from optimal program generation.
@nickdesaulniers
@LebedevRI
@bcl5980
@alexfh

Please guide me what should I do next in order everything this to be fixed. Split to other issues ? Invite someone else ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants