Spark / SPARK-43333

Avro to Name union type members after types

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.2
    • Fix Version/s: 3.5.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Spark converts an Avro union type into a record type in which each member of the union corresponds to one field of the record. The current behaviour is to name these fields by position: "member0", "member1", and so on. We propose adding an option to name each field after its member type instead.
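
      To make the two naming schemes concrete, here is a minimal sketch in plain Python (not Spark code). The helper names `type_name`, `positional_field_names`, and `type_based_field_names` are hypothetical, and using the bare type name as the field name is only an assumption about what the proposed option could produce. Null members are not handled here.

```python
from typing import List, Union

# An Avro union is written as a JSON array of member schemas.
# A member is either a type name (str) or a complex schema (dict).
AvroMember = Union[str, dict]


def type_name(member: AvroMember) -> str:
    """Return a name for a union member (hypothetical helper)."""
    if isinstance(member, str):
        return member  # primitive type or reference to a named type
    # Named types (record, enum, fixed) carry "name"; array/map only carry "type".
    return member.get("name", member["type"])


def positional_field_names(union: List[AvroMember]) -> List[str]:
    """Current behaviour described above: fields are named by position."""
    return [f"member{i}" for i, _ in enumerate(union)]


def type_based_field_names(union: List[AvroMember]) -> List[str]:
    """Proposed behaviour (assumed form): fields are named after the member type."""
    return [type_name(m) for m in union]


union = ["int", "string", {"type": "record", "name": "Point", "fields": []}]

print(positional_field_names(union))  # ['member0', 'member1', 'member2']
print(type_based_field_names(union))  # ['int', 'string', 'Point']
```

      With type-based names, a query would select something like `col.Point` rather than `col.member2`; the exact field name format would be up to the implementation.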

      The purpose of this is twofold:

      1. To allow types to be added to or removed from the union without changing the field names of the other member types. With positional names, if the added or removed type is not ordered last, existing queries that reference "member2" may need to be rewritten to reference "member1" or "member3".
      2. Referencing the type name in the query is more readable than referencing "member0".

      For example, our system generates an Avro schema from a Java type hierarchy, where subtyping maps to union types whose members are ordered lexicographically. Adding a subtype can therefore easily require every reference to "member2" to be updated to "member3".
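
      The renaming problem in this scenario can be illustrated with a small Python sketch. The subtype names (`Circle`, `Square`, `Triangle`, `Rectangle`) and the helper `positional_names` are made up for illustration; only the lexicographic ordering and the "memberN" naming come from the description above.

```python
from typing import Dict, List, Tuple


def positional_names(subtypes: List[str]) -> Dict[str, str]:
    """Map each subtype to its positional field name, with union
    members ordered lexicographically as in the scenario above."""
    return {name: f"member{i}" for i, name in enumerate(sorted(subtypes))}


# Hypothetical type hierarchy before and after adding one subtype.
before = positional_names(["Circle", "Square", "Triangle"])
after = positional_names(["Circle", "Rectangle", "Square", "Triangle"])

# Existing subtypes whose positional field name changed.
shifted: Dict[str, Tuple[str, str]] = {
    name: (before[name], after[name])
    for name in before
    if before[name] != after[name]
}

print(shifted)
# {'Square': ('member1', 'member2'), 'Triangle': ('member2', 'member3')}
```

      Every subtype that sorts after the new one gets a new field name, so queries against those fields break. If fields were named after the type, the existing names would be unaffected by the insertion.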

          People

            siying Siying Dong
            jose_gonzalez Jose Gonzalez