float instructions should only be lowered to NEON if precision constraints permit #16648
Comments
This may be a way to fix the fast-math issue in llvm/llvm-bugzilla-archive#16275.
After a long discussion on the list, this approach turns out to be very problematic for NEON intrinsics, which require that NEON instructions be generated regardless of IEEE status or fast-math flags. Since the IR doesn't differentiate between code produced by the vectorizers and code produced from NEON intrinsics, we can't apply any serialization rule indiscriminately. The only option left would be an extra command-line option requesting IEEE compliance, and then it would be the user's responsibility to check for NEON intrinsics, hand-crafted IR, etc. This is also too big a hammer to fix #16275, which already has its own fix. All in all, interesting, but too low on the priority list for me to work on it.
mentioned in issue llvm/llvm-bugzilla-archive#16275
This is causing a soundness issue in Rust, since LLVM disagrees with itself on what the semantics of these operations are (see rust-lang/rust#129880). IIUC, LLVM's optimizations assume they can calculate what that loop does, and that it follows IEEE semantics. But LLVM's codegen produces code that does not have IEEE semantics and instead flushes subnormals to zero. Would it make sense to somehow expose the actual NEON operations with their real semantics (i.e., with subnormal flushing)? Currently, frontends emit regular LLVM operations on vector types, and then LLVM stumbles over its own feet because the backend implements the wrong semantics for those operations. For cases where the frontend actually wants NEON semantics, ideally there would be a way to express that which avoids these miscompilations.
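To make the disagreement concrete, here is a rough sketch in Python that models the two behaviours for a single-precision multiply. It is not LLVM or NEON code: the helper names (`round_to_f32`, `flush_subnormal_f32`, `ieee_mul_f32`, `ftz_mul_f32`) are made up for illustration, and the flush-to-zero model is a simplification that assumes subnormal inputs and results are replaced by a zero of the same sign.

```python
import math
import struct

# Smallest positive normal single-precision value (2^-126).
F32_MIN_NORMAL = 2.0 ** -126


def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_subnormal_f32(x: float) -> float:
    """Hypothetical model of flush-to-zero: subnormals become signed zero."""
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def ieee_mul_f32(a: float, b: float) -> float:
    """What an optimizer assuming IEEE semantics would compute (e.g. when folding)."""
    return round_to_f32(round_to_f32(a) * round_to_f32(b))


def ftz_mul_f32(a: float, b: float) -> float:
    """Simplified model of a multiply that flushes subnormal inputs and results."""
    a = flush_subnormal_f32(round_to_f32(a))
    b = flush_subnormal_f32(round_to_f32(b))
    return flush_subnormal_f32(round_to_f32(a * b))


# Halving the smallest normal value yields a subnormal under IEEE rules,
# but zero under the flush-to-zero model.
folded = ieee_mul_f32(F32_MIN_NORMAL, 0.5)
at_runtime = ftz_mul_f32(F32_MIN_NORMAL, 0.5)
print(folded != 0.0, at_runtime == 0.0)
```

In this model, a value computed at compile time under IEEE rules is nonzero, while the same operation executed with flushing yields zero. Any later code that depends on the two agreeing (a comparison, a loop bound, an index) can then behave differently from what the optimizer assumed.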
Extended Description
On ARM it is difficult to generate optimal code that meets certain floating-point precision requirements. The only way that currently exists is to explicitly alter the feature flags of the CPU that we target. This causes problems: for example, it is not possible to take advantage of NEON for integer instructions while at the same time avoiding NEON for vector floating-point operations.
The following test cases illustrate how I expect llc to behave: