Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SystemZ] Add support for half (fp16) #109164

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

JonPsson1
Copy link
Contributor

@JonPsson1 JonPsson1 commented Sep 18, 2024

Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.

Patch in progress...

Fixes #50374

@JonPsson1 JonPsson1 requested a review from uweigand September 18, 2024 15:45
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:SystemZ clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 18, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2024

@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Jonas Paulsson (JonPsson1)

Changes

Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.

Patch in progress...

Notes:


`Clang FE:

TargetInfo {
  /// Determine whether the _Float16 type is supported on this target.
  bool HasFloat16;
  ; If false gives an error message on _Float16 in C program.

  bool HasLegalHalfType; // True if the backend supports operations on the half
                         // LLVM IR type. 
  ; If false, Half:s are extended and ops are done in float, if true, ops are
  ; done in Half (by Clang). 

  -ffloat16-excess-precision=[standard,fast,none]
  "Allows control over excess precision on targets where native support for the
   precision types is not available. By default, excess precision is used to
   calculate intermediate results following the rules specified in ISO C99."
  ; =&gt; Even though we need to deal with Half operations coming from other
       languages in the BE, we still should to let Clang insert the required
       emulation (trunc/extend) instructions as required by C (_Float16). So
       HasLegalHalfType needs to be set to 'false'.
  ; =&gt; C code will have fpext/fptrunc inserted in many places to emulate
       _Float16, and operations are done in Float.
  ; =&gt; Other languages will emit Half operations, which has to be emulated by
       fpext/fptrunc in BE and then done in Float.

  /// Check whether llvm intrinsics such as llvm.convert.to.fp16 should be used
  /// to convert to and from __fp16.
  /// FIXME: This function should be removed once all targets stop using the
  /// conversion intrinsics.
  virtual bool useFP16ConversionIntrinsics() const {
    return true;
  }
  ; Use either conversion intrinsics or fpext/fptrunc from Clang.
  ; =&gt; Given the comment and the fact that other languages emit 'half' it
  ;    seems ideal to not use these.

  bool HalfArgsAndReturns;
  ; Should be true if ABI says that half values are passed / returned.
  ; - What does the SystemZ ABI require? Pass/return in float regs?
}

Middle End:
 ; Middle-End does not do much especially with half:s/conversion intrinsics it
   seems (some constant folding).

  ; InstCombiner removed an fptrunc before sub and converted the Float fsub
    to a Half fsub. =&gt; Middle end does not (at least currently) seem to care
    about the Clang HasLegalHalfType flag.

CodeGen:
  ; Common-code expansions available:
  ; The expansion of ISD::FP16_TO_FP / FP_TO_FP16 generates libcalls.
  ; The expansion of extloads/truncstores handles these as integer values
    in conjunction with the libcalls.

  ; Library calls:
    LLVM libcalls: llvm/include/llvm/IR/RuntimeLibcalls.def
    got 'undefined reference' from linker at first try...

  Conversions:
   - could NNP instructions (z16) be used (vcfn / vcnf)?
     (clang/test/CodeGen/SystemZ/builtins-systemz-zvector4.c)

- There are also corresponding strict fp nodes that probably should be handled
   as well just the same.

- The exact semantics of _Float16 in C is hopefully handled by Clang FE per the value of -ffloat16-excess-precision.
`

Full diff: https://github.com/llvm/llvm-project/pull/109164.diff

3 Files Affected:

  • (modified) clang/lib/Basic/Targets/SystemZ.h (+9)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+7)
  • (added) llvm/test/CodeGen/SystemZ/fp-half.ll (+100)
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
                       "-v128:64-a:8:16-n32:64");
     }
     MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+    HasLegalHalfType = false;    // Default=false
+    HalfArgsAndReturns = false;  // Default=false
+    HasFloat16 = true;           // Default=false
+
     HasStrictFP = true;
   }
 
   unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
 
+  bool useFP16ConversionIntrinsics() const override {
+    return false;
+  }
+
   void getTargetDefines(const LangOptions &Opts,
                         MacroBuilder &Builder) const override;
 
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::BITCAST, MVT::f32, Custom);
   }
 
+  // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+  // stores in GPRs.
+  setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+  setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+  setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
   // VASTART and VACOPY need to deal with the SystemZ-specific varargs
   // structure, but VAEND is a no-op.
   setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:    .cfi_offset %r12, -64
+; CHECK-NEXT:    .cfi_offset %r13, -56
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -168
+; CHECK-NEXT:    .cfi_def_cfa_offset 328
+; CHECK-NEXT:    std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    llgh %r2, 0(%r2)
+; CHECK-NEXT:    lgr %r13, %r4
+; CHECK-NEXT:    lgr %r12, %r3
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    llgh %r2, 0(%r12)
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    aebr %f0, %f8
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    sth %r2, 0(%r13)
+; CHECK-NEXT:    ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:    lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:    br %r14
+entry:
+  %0 = load half, ptr %Op0, align 2
+  %1 = load half, ptr %Op1, align 2
+  %add = fadd half %0, %1
+  store half %add, ptr %Dst, align 2
+  ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:    .cfi_offset %r12, -64
+; CHECK-NEXT:    .cfi_offset %r13, -56
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -168
+; CHECK-NEXT:    .cfi_def_cfa_offset 328
+; CHECK-NEXT:    std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    llgh %r2, 0(%r2)
+; CHECK-NEXT:    lgr %r13, %r4
+; CHECK-NEXT:    lgr %r12, %r3
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    llgh %r2, 0(%r12)
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    aebr %f0, %f8
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    sth %r2, 0(%r13)
+; CHECK-NEXT:    ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:    lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:    br %r14
+entry:
+  %0 = load half, ptr %Op0, align 2
+  %ext = fpext half %0 to float
+  %1 = load half, ptr %Op1, align 2
+  %ext1 = fpext half %1 to float
+  %add = fadd float %ext, %ext1
+  %res = fptrunc float %add to half
+  store half %res, ptr %Dst, align 2
+  ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -160
+; CHECK-NEXT:    .cfi_def_cfa_offset 320
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    larl %r1, .LCPI2_0
+; CHECK-NEXT:    deb %f0, 0(%r1)
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT:    lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT:    br %r14
+start:
+  %self = fdiv half %f, 0xHC700
+  %_4 = bitcast half %self to i16
+  %_0 = icmp slt i16 %_4, 0
+  ret i1 %_0
+}

@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2024

@llvm/pr-subscribers-backend-systemz

Author: Jonas Paulsson (JonPsson1)

Changes

Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.

Patch in progress...

Notes:


`Clang FE:

TargetInfo {
  /// Determine whether the _Float16 type is supported on this target.
  bool HasFloat16;
  ; If false gives an error message on _Float16 in C program.

  bool HasLegalHalfType; // True if the backend supports operations on the half
                         // LLVM IR type. 
  ; If false, Half:s are extended and ops are done in float, if true, ops are
  ; done in Half (by Clang). 

  -ffloat16-excess-precision=[standard,fast,none]
  "Allows control over excess precision on targets where native support for the
   precision types is not available. By default, excess precision is used to
   calculate intermediate results following the rules specified in ISO C99."
  ; =&gt; Even though we need to deal with Half operations coming from other
       languages in the BE, we still should to let Clang insert the required
       emulation (trunc/extend) instructions as required by C (_Float16). So
       HasLegalHalfType needs to be set to 'false'.
  ; =&gt; C code will have fpext/fptrunc inserted in many places to emulate
       _Float16, and operations are done in Float.
  ; =&gt; Other languages will emit Half operations, which has to be emulated by
       fpext/fptrunc in BE and then done in Float.

  /// Check whether llvm intrinsics such as llvm.convert.to.fp16 should be used
  /// to convert to and from __fp16.
  /// FIXME: This function should be removed once all targets stop using the
  /// conversion intrinsics.
  virtual bool useFP16ConversionIntrinsics() const {
    return true;
  }
  ; Use either conversion intrinsics or fpext/fptrunc from Clang.
  ; =&gt; Given the comment and the fact that other languages emit 'half' it
  ;    seems ideal to not use these.

  bool HalfArgsAndReturns;
  ; Should be true if ABI says that half values are passed / returned.
  ; - What does the SystemZ ABI require? Pass/return in float regs?
}

Middle End:
 ; Middle-End does not do much especially with half:s/conversion intrinsics it
   seems (some constant folding).

  ; InstCombiner removed an fptrunc before sub and converted the Float fsub
    to a Half fsub. =&gt; Middle end does not (at least currently) seem to care
    about the Clang HasLegalHalfType flag.

CodeGen:
  ; Common-code expansions available:
  ; The expansion of ISD::FP16_TO_FP / FP_TO_FP16 generates libcalls.
  ; The expansion of extloads/truncstores handles these as integer values
    in conjunction with the libcalls.

  ; Library calls:
    LLVM libcalls: llvm/include/llvm/IR/RuntimeLibcalls.def
    got 'undefined reference' from linker at first try...

  Conversions:
   - could NNP instructions (z16) be used (vcfn / vcnf)?
     (clang/test/CodeGen/SystemZ/builtins-systemz-zvector4.c)

- There are also corresponding strict fp nodes that probably should be handled
   as well just the same.

- The exact semantics of _Float16 in C is hopefully handled by Clang FE per the value of -ffloat16-excess-precision.
`

Full diff: https://github.com/llvm/llvm-project/pull/109164.diff

3 Files Affected:

  • (modified) clang/lib/Basic/Targets/SystemZ.h (+9)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+7)
  • (added) llvm/test/CodeGen/SystemZ/fp-half.ll (+100)
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
                       "-v128:64-a:8:16-n32:64");
     }
     MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+    HasLegalHalfType = false;    // Default=false
+    HalfArgsAndReturns = false;  // Default=false
+    HasFloat16 = true;           // Default=false
+
     HasStrictFP = true;
   }
 
   unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
 
+  bool useFP16ConversionIntrinsics() const override {
+    return false;
+  }
+
   void getTargetDefines(const LangOptions &Opts,
                         MacroBuilder &Builder) const override;
 
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::BITCAST, MVT::f32, Custom);
   }
 
+  // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+  // stores in GPRs.
+  setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+  setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+  setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
   // VASTART and VACOPY need to deal with the SystemZ-specific varargs
   // structure, but VAEND is a no-op.
   setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:    .cfi_offset %r12, -64
+; CHECK-NEXT:    .cfi_offset %r13, -56
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -168
+; CHECK-NEXT:    .cfi_def_cfa_offset 328
+; CHECK-NEXT:    std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    llgh %r2, 0(%r2)
+; CHECK-NEXT:    lgr %r13, %r4
+; CHECK-NEXT:    lgr %r12, %r3
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    llgh %r2, 0(%r12)
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    aebr %f0, %f8
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    sth %r2, 0(%r13)
+; CHECK-NEXT:    ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:    lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:    br %r14
+entry:
+  %0 = load half, ptr %Op0, align 2
+  %1 = load half, ptr %Op1, align 2
+  %add = fadd half %0, %1
+  store half %add, ptr %Dst, align 2
+  ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:    .cfi_offset %r12, -64
+; CHECK-NEXT:    .cfi_offset %r13, -56
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -168
+; CHECK-NEXT:    .cfi_def_cfa_offset 328
+; CHECK-NEXT:    std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    llgh %r2, 0(%r2)
+; CHECK-NEXT:    lgr %r13, %r4
+; CHECK-NEXT:    lgr %r12, %r3
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    llgh %r2, 0(%r12)
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    aebr %f0, %f8
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    sth %r2, 0(%r13)
+; CHECK-NEXT:    ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:    lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:    br %r14
+entry:
+  %0 = load half, ptr %Op0, align 2
+  %ext = fpext half %0 to float
+  %1 = load half, ptr %Op1, align 2
+  %ext1 = fpext half %1 to float
+  %add = fadd float %ext, %ext1
+  %res = fptrunc float %add to half
+  store half %res, ptr %Dst, align 2
+  ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -160
+; CHECK-NEXT:    .cfi_def_cfa_offset 320
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    larl %r1, .LCPI2_0
+; CHECK-NEXT:    deb %f0, 0(%r1)
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT:    lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT:    br %r14
+start:
+  %self = fdiv half %f, 0xHC700
+  %_4 = bitcast half %self to i16
+  %_0 = icmp slt i16 %_4, 0
+  ret i1 %_0
+}

Copy link

github-actions bot commented Sep 18, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff eef5ea0c42fc07ef2c948be59b57d0df8ec801ca 159f70d573dd1ffce155633e3c99ec8b563d4604 --extensions h,c,cpp -- clang/test/CodeGen/SystemZ/Float16.c clang/test/CodeGen/SystemZ/fp16.c compiler-rt/lib/builtins/extendhfdf2.c compiler-rt/test/builtins/Unit/extendhfdf2_test.c clang/include/clang/Basic/TargetInfo.h clang/lib/Basic/Targets/SystemZ.h clang/lib/CodeGen/Targets/SystemZ.cpp clang/test/CodeGen/SystemZ/strictfp_builtins.c clang/test/CodeGen/SystemZ/systemz-abi.c clang/test/CodeGen/SystemZ/systemz-inline-asm.c compiler-rt/lib/builtins/clear_cache.c llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.h llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp llvm/lib/Target/SystemZ/SystemZRegisterInfo.h
View the diff from clang-format here.
diff --git a/compiler-rt/test/builtins/Unit/extendhfdf2_test.c b/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
index 422e272c11..bf33291d8d 100644
--- a/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
+++ b/compiler-rt/test/builtins/Unit/extendhfdf2_test.c
@@ -7,81 +7,63 @@
 
 double __extendhfdf2(TYPE_FP16 a);
 
-int test__extendhfdf2(TYPE_FP16 a, uint64_t expected)
-{
-    double x = __extendhfdf2(a);
-    int ret = compareResultD(x, expected);
+int test__extendhfdf2(TYPE_FP16 a, uint64_t expected) {
+  double x = __extendhfdf2(a);
+  int ret = compareResultD(x, expected);
 
-    if (ret){
-        printf("error in test__extendhfdf2(%#.4x) = %f, "
-               "expected %f\n", toRep16(a), x, fromRep64(expected));
-    }
-    return ret;
+  if (ret) {
+    printf("error in test__extendhfdf2(%#.4x) = %f, "
+           "expected %f\n",
+           toRep16(a), x, fromRep64(expected));
+  }
+  return ret;
 }
 
 char assumption_1[sizeof(TYPE_FP16) * CHAR_BIT == 16] = {0};
 
-int main()
-{
-    // qNaN
-    if (test__extendhfdf2(makeQNaN16(),
-                          UINT64_C(0x7ff8000000000000)))
-        return 1;
-    // NaN
-    if (test__extendhfdf2(fromRep16(0x7f80),
-                          UINT64_C(0x7ffe000000000000)))
-        return 1;
-    // inf
-    if (test__extendhfdf2(makeInf16(),
-                          UINT64_C(0x7ff0000000000000)))
-        return 1;
-    // -inf
-    if (test__extendhfdf2(makeNegativeInf16(),
-                          UINT64_C(0xfff0000000000000)))
-        return 1;
-    // zero
-    if (test__extendhfdf2(fromRep16(0x0),
-                          UINT64_C(0x0)))
-        return 1;
-    // -zero
-    if (test__extendhfdf2(fromRep16(0x8000),
-                          UINT64_C(0x8000000000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0x4248),
-                          UINT64_C(0x4009200000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0xc248),
-                          UINT64_C(0xc009200000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0x6e62),
-                          UINT64_C(0x40b9880000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0x3c00),
-                          UINT64_C(0x3ff0000000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0x0400),
-                          UINT64_C(0x3f10000000000000)))
-        return 1;
-    // denormal
-    if (test__extendhfdf2(fromRep16(0x0010),
-                          UINT64_C(0x3eb0000000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0x0001),
-                          UINT64_C(0x3e70000000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0x8001),
-                          UINT64_C(0xbe70000000000000)))
-        return 1;
-    if (test__extendhfdf2(fromRep16(0x0001),
-                          UINT64_C(0x3e70000000000000)))
-        return 1;
-    // max (precise)
-    if (test__extendhfdf2(fromRep16(0x7bff),
-                          UINT64_C(0x40effc0000000000)))
-        return 1;
-    // max (rounded)
-    if (test__extendhfdf2(fromRep16(0x7bff),
-                          UINT64_C(0x40effc0000000000)))
-        return 1;
-    return 0;
+int main() {
+  // qNaN
+  if (test__extendhfdf2(makeQNaN16(), UINT64_C(0x7ff8000000000000)))
+    return 1;
+  // NaN
+  if (test__extendhfdf2(fromRep16(0x7f80), UINT64_C(0x7ffe000000000000)))
+    return 1;
+  // inf
+  if (test__extendhfdf2(makeInf16(), UINT64_C(0x7ff0000000000000)))
+    return 1;
+  // -inf
+  if (test__extendhfdf2(makeNegativeInf16(), UINT64_C(0xfff0000000000000)))
+    return 1;
+  // zero
+  if (test__extendhfdf2(fromRep16(0x0), UINT64_C(0x0)))
+    return 1;
+  // -zero
+  if (test__extendhfdf2(fromRep16(0x8000), UINT64_C(0x8000000000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0x4248), UINT64_C(0x4009200000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0xc248), UINT64_C(0xc009200000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0x6e62), UINT64_C(0x40b9880000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0x3c00), UINT64_C(0x3ff0000000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0x0400), UINT64_C(0x3f10000000000000)))
+    return 1;
+  // denormal
+  if (test__extendhfdf2(fromRep16(0x0010), UINT64_C(0x3eb0000000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0x0001), UINT64_C(0x3e70000000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0x8001), UINT64_C(0xbe70000000000000)))
+    return 1;
+  if (test__extendhfdf2(fromRep16(0x0001), UINT64_C(0x3e70000000000000)))
+    return 1;
+  // max (precise)
+  if (test__extendhfdf2(fromRep16(0x7bff), UINT64_C(0x40effc0000000000)))
+    return 1;
+  // max (rounded)
+  if (test__extendhfdf2(fromRep16(0x7bff), UINT64_C(0x40effc0000000000)))
+    return 1;
+  return 0;
 }

@nikic
Copy link
Contributor

nikic commented Sep 19, 2024

Note that you need to also have softPromoteHalfType return true to get correct legalization for half operations.

@JonPsson1
Copy link
Contributor Author

Note that you need to also have softPromoteHalfType return true to get correct legalization for half operations.

Thanks for pointing that out - patch updated.

@uweigand
Copy link
Member

I think we should define and implement a proper ABI for the half type as well.

@llvmbot llvmbot added the clang:codegen IR generation bugs: mangling, exceptions, etc. label Oct 2, 2024
@JonPsson1
Copy link
Contributor Author

Patch updated after some progress...

With this version, the fp16 values are passed to conversion functions as integer, which seems to be the default. It is however a bit tricky to do this and at the same time pass half values in FP registers.

At this point I wonder for one thing if it would be better to pass FP16 values to the conversion functions as _Float16 instead? It seems this may be possible to change in the configurations by looking at COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h...

Not really sure if those conversion functions are supposed to be built and only used for soft-promotion of fp16, or if there are any external implications, for instance gcc compatability.

Any other comments also welcome...

@tgross35
Copy link
Contributor

With this version, the fp16 values are passed to conversion functions as integer, which seems to be the default. It is however a bit tricky to do this and at the same time pass half values in FP registers.

At this point I wonder for one thing if it would be better to pass FP16 values to the conversion functions as _Float16 instead? It seems this may be possible to change in the configurations by looking at COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h...

Not really sure if those conversion functions are supposed to be built and only used for soft-promotion of fp16, or if there are any external implications, for instance gcc compatability.

My understanding is that in GCC's __gnu_h2f_ieee/__gnu_f2h_ieee is always i32<->i16 (integer ABI), then __extendhfsf2/__truncsfhf2 uses either int16_t or _Float16 on a per-target basis as controlled by __LIBGCC_HAS_HF_MODE__ (I don't know where this gets set). In LLVM compiler-rt, COMPILER_RT_HAS_FLOAT16 is the control to do the same thing but it affects extend/trunc as well as h2f/f2h. I think the discrepancy works out here because if a target has _Float16, it will never be calling __gnu_h2f_ieee __gnu_f2h_ieee.

From your first two sentences it sounds like f16 is getting passed in a FP register but going FP->GPR->__gnu_h2f_ieee->FP->some_math_op->FP->__gnu_f2h_ieee->GPR->FP? I think it makes sense to either always pass f16 as i16 and avoid the FP registers, or make _Float16 available so COMPILER_RT_HAS_FLOAT16 can be used.

@uweigand mentioned figuring out an ABI for _Float16, is this possible? That seems like the best option.

A quick check seems to show that GCC 13 does not support _Float16 on s390x, nor does the crossbuild libgcc.a provide __gnu_h2f_ieee, __gnu_f2h_ieee, __extendhfsf2, or __truncsfhf2. So I think LLVM will be the one to set the precedent here.

Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975.

@uweigand
Copy link
Member

My understanding is that in GCC's __gnu_h2f_ieee/__gnu_f2h_ieee is always i32<->i16 (integer ABI), then __extendhfsf2/__truncsfhf2 uses either int16_t or _Float16 on a per-target basis as controlled by __LIBGCC_HAS_HF_MODE__ (I don't know where this gets set). In LLVM compiler-rt, COMPILER_RT_HAS_FLOAT16 is the control to do the same thing but it affects extend/trunc as well as h2f/f2h. I think the discrepancy works out here because if a target has _Float16, it will never be calling __gnu_h2f_ieee __gnu_f2h_ieee.

From what I can see in the libgcc sources, __gnu_h2f_ieee/__gnu_f2h_ieee is indeed always i32<->i16, but it is only present on 32-bit ARM, no other platforms. On AArch64, GCC will always use inline instructions to perform the conversion. On 32-bit and 64-bit Intel, the compiler will use inline instructions if AVX512-FP16 is available; if not, but SSE2 is available, the compiler will use __extendhfsf2/__truncsfhf2 with a HFmode argument (this corresponds to _Float16, i.e. it is passed in SSE2 registers, not like an integer); if not even SSE2 is available, using the type will result in an error.

I never see __extendhfsf2/__truncsfhf2 being used with int16_t, even in principle, on any platform in libgcc. There is indeed a setting __LIBGCC_HAS_HF_MODE__ (controlled indirectly by the GCC target back-end's TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P setting), but the only thing that appears to be controlled by this flag is whether routines for complex multiplication and division (__mulhc3 / __divhc3) are being built. Am I missing something here?

From your first two sentences it sounds like f16 is getting passed in a FP register but going FP->GPR->__gnu_h2f_ieee->FP->some_math_op->FP->__gnu_f2h_ieee->GPR->FP? I think it makes sense to either always pass f16 as i16 and avoid the FP registers, or make _Float16 available so COMPILER_RT_HAS_FLOAT16 can be used.

@uweigand mentioned figuring out an ABI for _Float16, is this possible? That seems like the best option.

Yes, we're working on that. What we're planning to do is to have _Float16 be passed and returned in the same way as float and double, i.e. using (part of) certain floating-point registers. These registers are available on every SystemZ architecture level, so we would not have to guard their use (like Intel does with the SSE2 registers).

A quick check seems to show that GCC 13 does not support _Float16 on s390x, nor does the crossbuild libgcc.a provide __gnu_h2f_ieee, __gnu_f2h_ieee, __extendhfsf2, or __truncsfhf2. So I think LLVM will be the one to set the precedent here.

Yes, we'd have to add those. I don't think we want __gnu_h2f_ieee or __gnu_f2h_ieee as those are ARM-only. We'd be defining and using __extendhfsf2 and __truncsfhf2, which would be defined with _Float16 arguments passed in floating-point registers. Either way, we should define the same set of routines (with the same ABI) in libgcc and compiler-rt.

Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975.

Thanks for pointing this out!

@tgross35
Copy link
Contributor

tgross35 commented Oct 23, 2024

From what I can see in the libgcc sources, __gnu_h2f_ieee/__gnu_f2h_ieee is indeed always i32<->i16, but it is only present on 32-bit ARM, no other platforms. On AArch64, GCC will always use inline instructions to perform the conversion. On 32-bit and 64-bit Intel, the compiler will use inline instructions if AVX512-FP16 is available; if not, but SSE2 is available, the compiler will use __extendhfsf2/__truncsfhf2 with a HFmode argument (this corresponds to _Float16, i.e. it is passed in SSE2 registers, not like an integer); if not even SSE2 is available, using the type will result in an error.

I never see __extendhfsf2/__truncsfhf2 being used with int16_t, even in principle, on any platform in libgcc. There is indeed a setting __LIBGCC_HAS_HF_MODE__ (controlled indirectly by the GCC target back-end's TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P setting), but the only thing that appears to be controlled by this flag is whether routines for complex multiplication and division (__mulhc3 / __divhc3) are being built. Am I missing something here?

I think this is accurate, libgcc just appears to (reasonably) not provide any f16-related symbols on platforms where GCC doesn't support _Float16. LLVM does seem to use __gnu_h2f_ieee and __gnu_f2h_ieee though, on targets where Clang doesn't have _Float16 (e.g. PowerPC, Wasm, x86-32 without SSE), which is why it shows up in the current state of this PR. Presumably this is HasLegalHalfType?

For that reason we just always provide the symbols in rust's compiler-builtins (though we let LLVM figure out that f16 is i16).

@uweigand mentioned figuring out an ABI for _Float16, is this possible? That seems like the best option.

Yes, we're working on that. What we're planning to do is to have _Float16 be passed and returned in the same way as float and double, i.e. using (part of) certain floating-point registers. These registers are available on every SystemZ architecture level, so we would not have to guard their use (like Intel does with the SSE2 registers).

That is great news, especially considering how problematic the target-feature-dependent ABI on x86-32 has been.

@JonPsson1
Copy link
Contributor Author

Patch reworked:

  • Make f16 a legal type using FP16BitRegClass in order to properly model in/out args with physregs.

  • Add FP16 register class (not "VR16"), and have everything work correctly also without vector support.

  • Conversion functions added as libcalls, taking args in fp registers.

(twoaddr-kill.mir test updated as the hard-coded register class enum value for GRH32BitRegClass has changed.)

Still some more points to go over, but it seems to be working fairly well at this point.

  • Todo:
    • vector f16..?
    • Support strict F16 as well?
    • atomic memops?
    • Maybe check SystemZTTI cost functions to make sure they do not give low costs for vector operations?
    • F16 vector constants, loads ands stores are not needed (at least currently).

@JonPsson1
Copy link
Contributor Author

Patch improved further:

  • Atomic memops handled.

  • Spill/reload
    Handled in loadRegFromStackSlot() and storeRegToStackSlot(). VRegs can be used here which
    makes it straightforward, but special sequences needed (without using VSTE/VLE).

  • __fp16:
    HalfArgsAndReturns=true => __fp16 arguments allowed.
    Tests added.

  • f16 vectors:
    Tests added. All seems to work.

  • strict fp:
    Again the question of conversion functions:
    IDS::STRICT_FP_ROUND/STRICT_FP_EXTEND needs to be lowered to something, but not sure
    if that requires special treatment, or if the same conversion functions can be used.
    Maybe wait with strict fp16?

Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review, but some general comments inline.

setOperationAction(ISD::FCOS, VT, Expand);
setOperationAction(ISD::FSINCOS, VT, Expand);
setOperationAction(ISD::FREM, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be Promote just like all the other f16 operations? Expand triggers a libcall, which doesn't match the excess-precision setting - also, we actually don't have f16 libcalls in libm ...

Copy link
Contributor Author

@JonPsson1 JonPsson1 Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, if there are no f16 libcalls it works to have them be promoted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just crosslinking that there is an effort to add f16 libcalls #95250 but I have no clue what the plan is as far as lowering to them.

@uweigand
Copy link
Member

uweigand commented Mar 5, 2025

I think that compiler-rt should provide all six conversion routines (f16<->f32, f16<->f64, f16<->f128), simply because libgcc has them and GCC-generated code will use them, and compiler-rt is supposed to be usable as drop-in replacement.

Also, I think LLVM should ideally generate calls to all those six routines, like GCC does. Not only is this probably the most efficient (the f16<->f32 and f16<->f64 routines should be pretty much equal in performance), but more importantly, in the truncate case, using only a single truncation is more exact due to avoiding double-rounding effects.

@JonPsson1
Copy link
Contributor Author

Does s390x have +soft-float or any features that toggle the availability of float conversion ops?

Yes, we support soft-float, and as I wrote earlier I am worried what happens with soft-float and conversion functions. I saw that with soft-float, the same conversion functions are called, but with args in integer registers, while with hard-float they go in fp-registers. For that to work, there would have to be another library with the same functions but with the soft-float abi. I am not sure if compiler-rt could be built for soft-float, but perhaps the gcc lib has them, in which case it seem reasonable to allow that ABI change in the tests. The user just has to provide a soft-float library for the conversion functions as well.

I guess we should provide all six conversion functions (for hard float) then, per your comments.

@uweigand
Copy link
Member

uweigand commented Mar 5, 2025

To clarify about soft-float - we do support -msoft-float as a compiler option, but there is no actual soft-float library provided anywhere; never has been. The compiler option is still useful in particular for the case of the Linux kernel: kernel code does not actually have any floating-point operations or even data types at the C source level, but the compiler would normally still be free to use floating-point instruction and registers (or even vector instruction and registers). This would conflict with the fact that the kernel does not save / restore floating-point or vector registers at system call (or interrupt) entry or exit. Therefore, the kernel uses -msoft-float to indicate to the compiler to never touch any floating-point or vector register in generated code. Because the kernel does not use floating-point operations at the source level, the compiler can achieve this goal without ever emitting any call to a soft-float library routine.

There is no user-space code that uses -msoft-float on SystemZ, and that wouldn't really make sense anyway - every SystemZ hardware in existence has full support for floating-point operations natively (IEEE-32, -64, and -128).

To bring this back to the original topic of floating-point support in the compiler runtime libraries: there are a small number of such routines today, but those are not intended for use with soft-float (they do expect the normal hard-float ABI, i.e. passing floating-point values in floating-point registers), but rather implement a few corner-case operations (like int128 <-> float conversion) where we actually do not have hardware instructions. Any new f16-related compiler runtime library routine would fall into this category, not the soft-float category.

@tgross35
Copy link
Contributor

tgross35 commented Mar 5, 2025

If there isn't any reason to be consistent with the other LLVM targets then agreed, using the direct libcalls seems better. The new library support could likely land separately, right? As long as the lowering is correct, considering this PR is already pretty expansive.

To bring this back to the original topic of floating-point support in the compiler runtime libraries: there are a small number of such routines today, but those are not intended for use with soft-float (they do expect the normal hard-float ABI, i.e. passing floating-point values in floating-point registers), but rather implement a few corner-case operations (like int128 <-> float conversion) where we actually do not have hardware instructions. Any new f16-related compiler runtime library routine would fall into this category, not the soft-float category.

From https://llvm.godbolt.org/z/sr6aW5e1e it looks like +soft-float means f64 gets passed in integer registers, it doesn't seem to still use the float ABI (unless %f and %r refer to the same registers, I don't know much about s390x). This is what I expect however including passing f16 as u16 - as far as I know other targets do something similar

Yes, we support soft-float, and as I wrote earlier I am worried what happens with soft-float and conversion functions. I saw that with soft-float, the same conversion functions are called, but with args in integer registers, while with hard-float they go in fp-registers. For that to work, there would have to be another library with the same functions but with the soft-float abi. I am not sure if compiler-rt could be built for soft-float, but perhaps the gcc lib has them, in which case it seem reasonable to allow that ABI change in the tests. The user just has to provide a soft-float library for the conversion functions as well.

For reference, this is what we do for Rust. Some more details, as an aside:

We have started splitting +soft-float into separate targets since accidentally mixing softfloat and hardfloat is an ABI footgun (more here). So kernels use e.g. aarch64-unknown-none-softfloat, loongarch64-unknown-none-softfloat, which, behind the scenes, is the same target but adds +soft-float everywhere. The difference is this lets us distribute a different version of our builtins library and libm between the softfloat and hardfloat version, which gives us functional float support OOTB. Not that you generally want the large-ish softfloat routines on embedded or kernel anyway, but it works if you do.

There isn't actually a softfloat 390x target currently but this will need to be added at some point for RfL support (it's trivial).

@uweigand
Copy link
Member

uweigand commented Mar 6, 2025

For the missing libfunctions, I see these files in compiler-rt/lib/builtins:

compiler-rt/lib/builtins/extendhfsf2.c
compiler-rt/lib/builtins/extendhftf2.c
compiler-rt/lib/builtins/extendhfxf2.c

For double, I get undefined reference to `__extendhfdf2' from clang, and in extendhfxf2.c the function seems to be called '__extendhfxf2' instead. Should a wrapper be added for the gcc compatible name whis is emitted?

No, XF is a different mode from DF. There should be a file extendhfdf2.c analogous to the others, but using

#define SRC_HALF
#define DST_DOUBLE

This seems to be missing currently.

For f128, I get undefined reference to `__extendhftf2'. My guess is that CRT_HAS_TF_MODE needs to be defined, which is done in compiler-rt/lib/builtins/int_types.h, but I am not quite sure how to do it. Would anybody know the preferred way?

This should be defined by the existing logic in int_types.h as far as I can see. We have

#if defined(CRT_HAS_128BIT) && defined(CRT_HAS_F128)
#define CRT_HAS_TF_MODE
#endif

and both conditions should be true on s390x. On the one hand, we have

#if defined(__LP64__) || defined(__wasm__) || defined(__mips64) ||             \
    defined(__SIZEOF_INT128__) || defined(_WIN64)
#define CRT_HAS_128BIT
#endif

and we should be defining __LP64__.

On the other hand, we have

#elif __LDBL_MANT_DIG__ == 113 ||                                              \
    (__FLT_RADIX__ == 16 && __LDBL_MANT_DIG__ == 28)
// Use long double instead of __float128 if it matches the IEEE 128-bit format
// or the IBM hexadecimal format.
#define CRT_LDBL_128BIT
#define CRT_HAS_F128

and we should match the __LBDL_MANT_DIG__ == 113 condition.

So this should already work. If it doesn't, you'll have to debug the build process here (e.g. look at preprocessed output to see what's actually going on here).

For truncations, there seems to be a __truncdfhf2 but not __trunctfhf2...

I'm pretty sure that would be the same CRT_HAS_TF_MODE problem then.

If there isn't any reason to be consistent with the other LLVM targets then agreed, using the direct libcalls seems better. The new library support could likely land separately, right? As long as the lowering is correct, considering this PR is already pretty expansive.

Sure, that probably should land separately. I'd still like to at least have an implementation ahead of time just so we can properly test the LLVM back-end support ...

There isn't actually a softfloat 390x target currently but this will need to be added at some point for RfL support (it's trivial).

Thanks for pointing this out. I agree, we need that target for s390x then as well. (And probably soon - the kernel folks are already trying to enable Rust for Linux on s390x.)

@uweigand
Copy link
Member

uweigand commented Mar 6, 2025

So this should already work. If it doesn't, you'll have to debug the build process here (e.g. look at preprocessed output to see what's actually going on here).

Ah, sorry, now I see what's going on. In this line you added to the cmake logic:

set(s390x_SOURCES ${GENERIC_SOURCES})

you need to add ${GENERIC_TF_SOURCES} as well, otherwise all the TF files won't get built at all.

@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Mar 7, 2025
@JonPsson1
Copy link
Contributor Author

JonPsson1 commented Mar 7, 2025

Thanks for the help.

  • new file extendhfdf2.c (and added to GENERIC_SOURCES list)
  • $GENERIC_TF_SOURCES added to s390x_SOURCES. This did not work immediately. However, if I removed the check for COMPILER_RT_HAS_FLOAT16 in the source files, it builds. Not sure why/if that is needed there (there is no check for it in e.g. extendhfdf2.c), or how it could be set.

Still problem with wrong-code: All conversions gave wrong-code when building compiler-rt with gcc, but when bootstrapping it (by using -DLLVM_ENABLE_RUNTIMES=compiler-rt), now only the double case fails. The clang assembly files are identical, but the libfunctions are different when I compared with libgcc (correct output) and compiler-rt. I stepped through my small program in gdb and found that as expected when returning from libgcc after the first extension, the register holds 1.5. When doing the same from compiler-rt it doesn't have the correct value. I checked that the ABI is correct at least - first instruction is moving f0 to r0, which seems right. At this point I wouldn't know what to look for to debug it further.

My test program:

#define T2 double

T2 __attribute__((noinline)) fun2(T2 Op0, T2 Op1) {
  return Op0 + Op1;
}

volatile _Float16 A;
volatile _Float16 B;
int main() {
  A = 1.5;
  B = 0.5;
  T2 Res2 = fun2(((T2) A), ((T2) B));
  _Float16 Res = ((_Float16) Res2);
  return Res < 1.0 ? 11 : 3;   // Res = 2.0 => return 3  
}

clang -target s390x-linux-gnu -march=z16 ./test4.c -O3 -o ./a.out --rtlib=libgcc -save-temps ; ./a.out; echo $?
3

clang -target s390x-linux-gnu -march=z16 ./test4.c -O3 -o ./a.out --rtlib=compiler-rt -save-temps ; ./a.out; echo $?                         
11

@tgross35
Copy link
Contributor

tgross35 commented Mar 7, 2025

To clarify, the code is calling __extendhfdf2 then __truncdfhf2 from either libgcc or from compiler-rt with your patches, and the compiler-rt version is incorrect? Could you have it print the intermediate results as u16 hex (Op0 + Op1, Res2, Res)?

Not sure if you are testing only on s390x but there should probably be a unit test at https://github.com/llvm/llvm-project/tree/d90423e310482bdbc731242fa25dcb3dd44e69de/compiler-rt/test/builtins/Unit to see if things work on other platforms.

@JonPsson1
Copy link
Contributor Author

To clarify, the code is calling __extendhfdf2 then __truncdfhf2 from either libgcc or from compiler-rt with your patches, and the compiler-rt version is incorrect? Could you have it print the intermediate results as u16 hex (Op0 + Op1, Res2, Res)?

Yes, my test program is calling either libgcc or compiler-rt implementaions of the same function. Yes, on this branch.

                                      libgcc                    compiler-rt

vlreph  %v0, 0(%r1)                   0x3e00             (1.5)  0x3e00
brasl   %r14, __extendhfdf2@PLT       0x3ff8000000000000        0x3fc0000000000000
ldr     %f8, %f0
vlreph  %v0, 0(%r13)                  0x3800             (0.5)  0x3800
brasl   %r14, __extendhfdf2@PLT       0x3fe0000000000000        0x3f00000000000000
ldr     %f2, %f0
ldr     %f0, %f8
brasl   %r14, fun2@PLT                0x4000000000000000 (2.0)  0x3fc0010000000000
brasl   %r14, __truncdfhf2@PLT        0x4000                    0x3000
brasl   %r14, __extendhfsf2@PLT       0x40000000                0x3e000000

Not sure if you are testing only on s390x but there should probably be a unit test at https://github.com/llvm/llvm-project/tree/d90423e310482bdbc731242fa25dcb3dd44e69de/compiler-rt/test/builtins/Unit to see if things work on other platforms.

I am testing on s390x, and I guess it's unfortunate that there is no such test for hfdf - this is the file that was missing and that I added. I guess it should be added once it is confirmed to work properly (or if someone would provide the right values). The tests that are already there have been passing always while working on this, even when building with gcc.

@tgross35
Copy link
Contributor

tgross35 commented Mar 8, 2025

Nothing about the implementation stands out to me as wrong. It looks like the exponent isn't correct, I guess you could step through this portion

dstExp = (dst_rep_t)srcExp + (dst_rep_t)(dstExpBias - srcExpBias);
dstSigFrac = (dst_rep_t)srcSigFrac << (dstSigFracBits - srcSigFracBits);
and double check the intermediate values? Our implementation is doing something similar except it adds in place rather than shifting back and forth, but I don't know why that would make a difference https://github.com/rust-lang/compiler-builtins/blob/7bec089672eb5cd83d7902edd59479527bc9d8d1/src/float/extend.rs#L42-L45.

By the way, these sites are pretty helpful for double checking float reprs https://float.exposed https://weitz.de/ieee/.

@uweigand
Copy link
Member

uweigand commented Mar 8, 2025

  • However, if I removed the check for COMPILER_RT_HAS_FLOAT16 in the source files, it builds. Not sure why/if that is needed there (there is no check for it in e.g. extendhfdf2.c), or how it could be set.

This is set if the host compiler used to build compiler-rt supports the _Float16 type, see this in compiler-rt/lib/builtins/CMakeLists.txt:

      check_c_source_compiles("_Float16 foo(_Float16 x) { return x; }
                               int main(void) { return 0; }"
                              COMPILER_RT_HAS_${arch}_FLOAT16)

If this is not set, you need to figure out why it is not using the correct host compiler. I think it should be using the clang built from the same sources, which should now support _Float16 - if it doesn't, this may cause other problems, potentially even explaining the wrong results you're seeing.

return __extendXfYf2__(a);
}

COMPILER_RT_ABI float __gnu_h2d_ieee(src_t a) { return __extendhfdf2(a); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should define the __gnu_h2d_ieee or the __aeabi_h2d names here - those are not defined by the ARM standard or by other compilers today. This function should only be present under the __extendhfdf2 name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - removed.

@JonPsson1
Copy link
Contributor Author

If this is not set, you need to figure out why it is not using the correct host compiler. I think it should be using the clang built from the same sources, which should now support _Float16 - if it doesn't, this may cause other problems, potentially even explaining the wrong results you're seeing.

It seems that with a completely new build with the new cmake config the test actually passes:

-- Performing Test COMPILER_RT_HAS_s390x_FLOAT16
-- Performing Test COMPILER_RT_HAS_s390x_FLOAT16 - Success

I previously got confused here after rerunning cmake in the same folder as I thought it was enough to see the newly built clang being used (ninja -v) with compiler-rt. It seems maybe cmake itself got confused and used system gcc for its checks while building with clang? Anyway, it works now, sorry.

I also found the problem with the miscompile (after some debugging, sigh), that the file I copied (extendhfsf.c) has float hard coded as return type, even though it defines DST_SINGLE. extendhftf2.c uses dst_t as return type, but I see other files using the type directly, so I will leave those files and extendhftf2.c as they were. I guess I have a motivating case to use dst_t and src_t everywhere, but that's another issue. So the miscompile was due to a final conversion from double to float before returning. My bad.

While adding the corresponding (to extendsfdf2_test.c) test I tried inserting an obvious error, but this did not cause it to fail, not even with ninja check-all, which I thought covered all tests (ninja check-compiler-rt did not report it either). How can I run these tests?

I tried building (same way as before) and running the existing extendhfsf2_test.c but got:

error in test__extendhfsf2(0x7e00) = 0.000000, expected nan

This was unexpected, and I get the same error with libgcc.

By the way, these sites are pretty helpful for double checking float reprs https://float.exposed https://weitz.de/ieee/.

I tried float.exposed but I couldn't really convert an f16 hex to a double hex. Is it supposed to be able to do this?

@tgross35
Copy link
Contributor

tgross35 commented Mar 9, 2025

By the way, these sites are pretty helpful for double checking float reprs https://float.exposed https://weitz.de/ieee/.

I tried float.exposed but I couldn't really convert an f16 hex to a double hex. Is it supposed to be able to do this?

It should, on the half page you can paste the int repr into "Raw Hexadecimal Integer Value", then click float or double at the top. I don't use it too much for that though, mostly exploding into sign/exponent/mantissa when debugging soft floats.

@JonPsson1
Copy link
Contributor Author

-- A few more tests added covering
alloca
access into aggregate type with half elements with getelementptr.
atomicrmw

Would a test for { half, half, half, half } in Swift-return.ll make sense?

I think testing coverage now is fairly ok - can't think of any more instructions to handle although not all various combinations and uses of them have been duplicated from float (Test for the new extendhfdf2.cpp function is still todo).

-- Manged to build and run lbm with _Float16 by simply changing the element type o LBM_Grid from double to _Float16. It ran without crashing, although 4x quicker which I guess/hope is due to some value check that stops execution somewhere. I confirmed that the executable indeed contains a lot of conversion calls and vlreph and more.

-- Another try to activate the tests for the builtins, but so far no luck. In compiler-rt/test/builtins/CMakeLists.txt, the librt_has_XXX list is built, but I tried

diff --git a/compiler-rt/test/builtins/Unit/extendhfsf2_test.c b/compiler-rt/test/builtins/Unit/extendhfsf2_test.c
index 86150e8fb0d7..c85188e76ed3 100644
--- a/compiler-rt/test/builtins/Unit/extendhfsf2_test.c
+++ b/compiler-rt/test/builtins/Unit/extendhfsf2_test.c
@@ -1,5 +1,5 @@
 // RUN: %clang_builtins %s %librt -o %t && %run %t
-// REQUIRES: librt_has_extendhfsf2
+
 
 #include <stdio.h>
 
@@ -12,7 +12,7 @@ int test__extendhfsf2(TYPE_FP16 a, uint32_t expected)
     float x = __extendhfsf2(a);
     int ret = compareResultF(x, expected);
 
-    if (ret){
+    if (true){
         printf("error in test__extendhfsf2(%#.4x) = %f, "
                "expected %f\n", toRep16(a), x, fromRep32(expected));
     }

ninja check-all

but no failure...

Experiment with soft-promotion in FP regs (not working).
Try to make f16 legal instead
Atomic loads/stores, spill/reload, tests for __fp16 and half vectors.
strict f16 with tests.
Review
Make use of vector facility if present.
Use 4-byte spill size in case of no vector support.
Build the generic compiler-rt sources for s390x.
Don't set libcall names.
@llvm.s390.tdc, fcopysign, strict_fminimum/fmaximum.
More tests for f16, but not complete.
libfuncs built also to double and long double.
@JonPsson1
Copy link
Contributor Author

@Meinersbur @petrhosek @smeenai @danliew-apple

Reaching out here to some people who seem to have been working with this CMake file: Would you have any idea how I could activate and run the compiler-rt tests for s390x (see above for my unsuccessful experiments)? Thanks.

@JonPsson1
Copy link
Contributor Author

Thanks to @boomanaiden154 for pointing out that I need to pass -DCOMPILER_RT_BUILD_BUILTINS=ON to cmake - the tests now builds and runs.

The test for half->double conversions added, with the 64-bit hex values taken from the compiler-rt conversion function results (should be correct, right :). Wanted to use the makeXXX16() functions as much as possible, but not sure what to pass to makeQNaN16(uint16_t rand), so left that as it was, which also should be fine, I suppose.

@JonPsson1
Copy link
Contributor Author

Have nothing more to do for this now, so waiting for review.

if (!useSoftFloat()) {
// Promote all f16 operations to float, with some exceptions below.
for (unsigned Opc = 0; Opc < ISD::BUILTIN_OP_END; ++Opc)
setOperationAction(Opc, MVT::f16, Promote);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought Promote for half types is broken and you need to use SoftPromoteHalf for correct rounding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens is that calls to compiler-rt are emitted for conversions and as we keep all f16 in fp-registers always, this seems to work fine.

I thought SoftPromoteHalf is for targets that use i16 for f16?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SoftPromoteHalf is a type legalizer action not an OperationAction. Promote as an OperationAction should work for f16.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic are you referring to #97975 and #97981? It would probably be good to add a test against those issues here if there isn't already.

Cc issue author @beetrees

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second one, it should be sufficient to ensure that i16<->f16 conversions are asm-only with no libcalls (it's possible this is tested somewhere and I'm just overlooking it in the large diff)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yes, this should work then and are tested, see fp-half-move.ll.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:SystemZ clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category compiler-rt:builtins compiler-rt llvm:ir llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SystemZ Backend: Add support for operations such as FP16_TO_FP and FP_TO_FP16
9 participants