Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
3.3.2
None
Description
Spark converts Avro union types into record types, where each member of the union corresponds to a field in the record. The current behaviour is to name the record fields by position: "member0", "member1", etc., one for each member of the union. We propose adding an option to name each field after its member type instead.
The purpose of this is twofold:
- To allow adding types to or removing types from the union without affecting the field names of the other member types. With positional names, if the added or removed type is not ordered last, existing queries referencing "member2" may need to be rewritten to reference "member1" or "member3".
- Referencing the type name in the query is more readable than referencing "member0".
For example, our system produces an Avro schema from a Java type structure in which subtyping maps to union types whose members are ordered lexicographically. Adding a subtype can therefore easily force every reference to "member2" to be updated to "member3".
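The renaming problem can be illustrated with a small standalone sketch. It does not use Spark or reproduce Spark's conversion code; it only models the two naming schemes on a list of member type names. The subtype names (Circle, Square, Triangle, Rectangle) and the "member_<type>" format for the proposed scheme are hypothetical, chosen only for illustration.

```python
def positional_names(members):
    """Current behaviour: fields are named by position in the union."""
    return {f"member{i}": t for i, t in enumerate(members)}


def type_based_names(members):
    """Proposed behaviour: fields are named after the member type.

    The "member_<type>" format is an assumption made for this sketch.
    """
    return {f"member_{t.lower()}": t for t in members}


def renamed_fields(namer, before, after):
    """Return {type: (old_field, new_field)} for types whose field name changed."""
    old = {t: name for name, t in namer(before).items()}
    new = {t: name for name, t in namer(after).items()}
    return {t: (old[t], new[t]) for t in before if old[t] != new[t]}


# Union members are ordered lexicographically, as in the example above.
before = sorted(["Circle", "Square", "Triangle"])
# Adding a subtype that sorts into the middle of the union.
after = sorted(before + ["Rectangle"])

print(renamed_fields(positional_names, before, after))
print(renamed_fields(type_based_names, before, after))
```

With positional names, every type ordered after the new subtype gets a different field name, so queries against those fields must be rewritten. With type-based names, the existing fields keep their names.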