-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clang trunk does not optimize trivial case #53861
Comments
Interesting, |
@EugeneZelenko who should I trigger regarding this task ? |
CLang 12 does not have this problem: #include <stdint.h>
int firewall(uint8_t *data) {
uint8_t ip_proto = *data;
uint16_t dst_port = *(uint16_t *)data;
if (ip_proto == 17 && dst_port == 1) return 1;
if (ip_proto == 17 && dst_port == 2) return 1;
if (ip_proto == 17 && dst_port == 3) return 1;
if (ip_proto == 17 && dst_port == 4) return 1;
return 0;
} firewall: # @firewall
movzx eax, word ptr [rdi]
cmp byte ptr [rdi], 17
sete cl
dec eax
cmp ax, 4
setb al
and al, cl
movzx eax, al
ret |
Another thing. Even for Clang12 (-O4): for code #include <stdint.h>
int firewall(uint8_t *data) {
uint8_t ip_proto = *data;
uint16_t dst_port = *(uint16_t *)data;
if (ip_proto == 17 && dst_port == 11) return 1;
if (ip_proto == 17 && dst_port == 22) return 1;
if (ip_proto == 17 && dst_port == 33) return 1;
if (ip_proto == 17 && dst_port == 44) return 1;
return 0;
} It gives: .LCPI0_0:
.short 44 # 0x2c
.short 33 # 0x21
.short 11 # 0xb
.short 22 # 0x16
.zero 2
.zero 2
.zero 2
.zero 2
firewall: # @firewall
cmp byte ptr [rdi], 17
sete al
vpbroadcastw xmm0, word ptr [rdi]
vpcmpeqw xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
vpmovsxwd xmm0, xmm0
vmovmskps ecx, xmm0
test cl, cl
setne cl
and cl, al
movzx eax, cl
ret But if I remove comparison with 17: #include <stdint.h>
int firewall(uint8_t *data) {
uint8_t ip_proto = *data;
uint16_t dst_port = *(uint16_t *)data;
if (dst_port == 11) return 1;
if (dst_port == 22) return 1;
if (dst_port == 33) return 1;
if (dst_port == 44) return 1;
return 0;
} it gives beautiful ASM code: firewall: # @firewall
movzx ecx, word ptr [rdi]
cmp rcx, 44
ja .LBB0_2
mov eax, 1
movabs rdx, 17600780175360
bt rdx, rcx
jae .LBB0_2
ret
.LBB0_2:
xor eax, eax
ret So, WHAT'S happening ? gcc even ancient versions does not have such problems and generate good code. |
There's a bunch of conflicting test cases here, but the original reported issue doesn't look like a regression to me. cc @zmodem |
@nickdesaulniers What about this: https://godbolt.org/z/h6zbWvr6n ? |
Yeah, that's a more demonstrative-of-regression test case. Seeing the repeated comparisons of an unchanged register against a constant is disappointing. Looks like a critical edge we could have split perhaps; the IR looks nice, so it's probably isel. Bisecting LLVM converged on: |
Hm, let me take a look. |
After f6b60b3, we now end up with firewall: # @firewall
.cfi_startproc
# %bb.0: # %entry
vpbroadcastw 64(%rdi), %xmm0
cmpb $17, (%rdi)
vpcmpeqw .LCPI0_0(%rip), %xmm0, %xmm0
sete %al
vmovd %eax, %xmm1
xorl %eax, %eax
vpbroadcastb %xmm1, %xmm1
vpmovsxwd %xmm0, %xmm0
vpand %xmm0, %xmm1, %xmm0
vpslld $31, %xmm0, %xmm0
vmovmskps %xmm0, %ecx
testl %ecx, %ecx
setne %al
retq ... which is better but still a regression from before poison safety patches. |
Wow, @LebedevRI ! Please see here https://godbolt.org/z/r57nvhGd3 Hope this helps. |
Yeah, it does seem like we are also missing some variant of CFG simplification along the lines of https://alive2.llvm.org/ce/z/-qy6hh |
@LebedevRI Do I understand right, #54553 will fix my case and make assembler output like gcc ? (using a big 64-bit mask) ? If yes, please close this bug, and I will track #54553. |
This whole check is bogus, it's some kind of a profitability check. For now, simply extend it to not only allow branch-on-binary-ops, but also on poison-safe logic ops. Refs. llvm/llvm-project#53861 Refs. llvm/llvm-project#54553
@LebedevRI is there any progress on that ? I mean #53861 (comment) |
https://godbolt.org/z/GYqT96eq4 still triggered on Clang 16. |
@LebedevRI ? Possibly I should ping someone else ? |
I'm sorry, this wasn't on my interest list. |
This pattern looks a little complicate. Logical and/or use distributive law need to be careful with the operand position. Candidate patch: |
@bcl5980 Huge thanks! Starting from what versions, it will appear ? |
I will be very happy if you do backports. |
https://godbolt.org/z/sEj9Ys1d3 <- final (I believe) battle. Waiting while godbolt compiles Clang. |
It looks my change only can fix: https://godbolt.org/z/65Tza7cdf |
And actually before my change, @LebedevRI already check in a solution at 9ddff66 to fix the https://godbolt.org/z/65Tza7cdf |
I'm not sure which patch helped here. Looks like we still need to fix SimplifyCFG to produce the select though. |
ICX gives the best (what I expected at the beginning o the topic) result for our case : https://godbolt.org/z/aExdzzYM9 |
Sorry for bothering, but I confused. Especially about status of merging patches. So, what is merged and what is not ? What should I do in order everything to be moved further ? |
You -- nothing, unless you want to write the needed patches yourself. |
https://godbolt.org/z/c7zf5zWzv
|
@alexfh ? |
https://godbolt.org/z/fh53d5fEn Seems previous (already merged) patches are not sufficient. If condition become slightly more complex, it fails again to detect common condition. |
X && Z || Y && Z --> (X || Y) && Z https://alive2.llvm.org/ce/z/nM6kZb (X || Z) && (Y || Z) --> (X && Y) || Z https://alive2.llvm.org/ce/z/_EWLRR Fix: llvm/llvm-project#53861 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D139408
To wrap-up : https://godbolt.org/z/Te4459xd1 Different things happening, preventing from optimal program generation. Please guide me what should I do next in order everything this to be fixed. Split to other issues ? Invite someone else ? |
https://godbolt.org/z/65Tza7cdf
Compiled with -O3 to x86_64.
cmp dl, 17
is repeated for some reason. This is a bug as I think.The text was updated successfully, but these errors were encountered: