You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Rollup merge of rust-lang#124874 - jedbrown:float-mul-add-fast, r=saethlin
intrinsics fmuladdf{32,64}: expose llvm.fmuladd.* semantics
Add intrinsics `fmuladd{f32,f64}`. This computes `(a * b) + c`, to be fused if the code generator determines that (i) the target instruction set has support for a fused operation, and (ii) that the fused operation is more efficient than the equivalent, separate pair of `mul` and `add` instructions.
https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic
The codegen_cranelift uses the `fma` function from libc, which is a correct implementation, but without the desired performance semantic. I think this requires an update to cranelift to expose a suitable instruction in its IR.
I have not tested with codegen_gcc, but it should behave the same way (using `fma` from libc).
---
This topic has been discussed a few times on Zulip and was suggested, for example, by `@workingjubilee` in [Effect of fma disabled](https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Effect.20of.20fma.20disabled/near/274179331).
0 commit comments