Description
Performing an outer join with joinWith on DataFrames used to return missing values as null in Spark 2.4.8, but in Spark 3+ it returns them as Rows whose fields are all null.
The issue can be reproduced with the following test that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
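The test itself is not reproduced in this description; a minimal sketch of the kind of check involved might look like the following (the object name, data, and assertion are illustrative assumptions, not the actual attached test):

{code:scala}
import org.apache.spark.sql.SparkSession

object JoinWithDataFrameRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("joinWith-outer-repro").getOrCreate()
    import spark.implicits._

    // Illustrative data: id 1 exists only on the left, id 3 only on the right.
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

    // Full outer joinWith on two DataFrames, i.e. Dataset[Row].
    val joined = left.joinWith(right, left("id") === right("id"), "full_outer")

    // Pick out the pair whose left side is the unmatched id = 1.
    val unmatched = joined.collect()
      .find(p => p._1 != null && p._1.get(0) == 1)
      .get

    // Spark 2.4.8: the missing right side is null.
    // Spark 3.0.0+: it is a Row whose fields are all null, so this assertion fails.
    assert(unmatched._2 == null, s"expected null, got ${unmatched._2}")

    spark.stop()
  }
}
{code}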
The problem only arises when working with DataFrames: Datasets of case classes work as expected, as demonstrated by this other test.
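Again as a sketch only (the case class names and data are assumptions, not the actual test), the typed Dataset variant of the same check passes on both versions:

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative case classes; not the reporter's actual types.
case class LeftRow(id: Int, l: String)
case class RightRow(id: Int, r: String)

object JoinWithDatasetRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("joinWith-dataset").getOrCreate()
    import spark.implicits._

    val left  = Seq(LeftRow(1, "a"), LeftRow(2, "b")).toDS()
    val right = Seq(RightRow(2, "x"), RightRow(3, "y")).toDS()

    // Same full outer joinWith, but on typed Datasets of case classes.
    val joined = left.joinWith(right, left("id") === right("id"), "full_outer")

    // The unmatched right side comes back as null on 2.4.8 and on 3.x alike.
    val unmatched = joined.collect()
      .find(p => p._1 != null && p._1.id == 1)
      .get
    assert(unmatched._2 == null)

    spark.stop()
  }
}
{code}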
I couldn't find an explanation for this change in the migration guide, so I'm assuming this is a bug.
A git bisect pointed me to that commit.
Reverting the commit solves the problem.
A similar solution, but without reverting, is shown here.
Happy to help if you think of another approach / can provide some guidance.
Issue Links
- causes SPARK-44323: Scala None shows up as null for Aggregator BUF or OUT (Open)