PySpark: iteratively get values from a column containing a JSON string

Question

I wonder how you would iteratively get the values from a JSON string in PySpark. My data has the following format, and I would like to create the "value" column:

id_1  id_2  json_string               value
1     1001  {"1001":106, "2200":101}  106
1     2200  {"1001":106, "2200":101}  101

My attempt gives the error:

Column is not iterable

However, just inserting the key manually works.

Accepted Answer

You can call get_json_object() inside expr(). Because the whole call is written as a SQL expression, the JSON path can be built per row by concatenating "$." with the id2 column.

    from pyspark.sql import functions as func

    # assumes an existing SparkSession named `spark`
    data_ls = [
        ("1", "1001", '''{"1001":106, "2200":101}'''),
        ("1", "2200", '''{"1001":106, "2200":101}''')
    ]

    data_sdf = spark.createDataFrame(data_ls, ("id1", "id2", "jstr"))

    # +---+----+--------------------+
    # |id1| id2|                jstr|
    # +---+----+--------------------+
    # |  1|1001|{"1001":106, "220...|
    # |  1|2200|{"1001":106, "220...|
    # +---+----+--------------------+

    # build the path "$.<id2>" for each row and extract the matching value
    (data_sdf
        .withColumn('val', func.expr('get_json_object(jstr, concat("$.", id2))'))
        .show(truncate=False))

    # +---+----+------------------------+---+
    # |id1|id2 |jstr                    |val|
    # +---+----+------------------------+---+
    # |1  |1001|{"1001":106, "2200":101}|106|
    # |1  |2200|{"1001":106, "2200":101}|101|
    # +---+----+------------------------+---+
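To sanity-check the expected values without a Spark session, the per-row lookup can be sketched in plain Python. This only illustrates what the expression computes for each row, not how Spark evaluates it. The helper name lookup_json_value is hypothetical, and returning the value as a string is an assumption meant to mirror get_json_object, which produces a string column.

```python
import json
from typing import Optional


def lookup_json_value(json_string: str, key: str) -> Optional[str]:
    """Return the value stored under `key` in a flat JSON object, as a string.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    try:
        parsed = json.loads(json_string)
    except (TypeError, ValueError):
        return None  # malformed or missing JSON yields no value
    if not isinstance(parsed, dict) or key not in parsed:
        return None  # key absent: no value for this row
    value = parsed[key]
    if value is None:
        return None
    # Strings are returned as-is; other values are re-serialised as JSON text.
    return value if isinstance(value, str) else json.dumps(value)


# Same sample rows as in the answer: (id1, id2, jstr)
rows = [
    ("1", "1001", '{"1001":106, "2200":101}'),
    ("1", "2200", '{"1001":106, "2200":101}'),
]

# Look up the key named by id2 inside each row's own JSON string.
values = [lookup_json_value(jstr, id2) for _, id2, jstr in rows]
print(values)
```

Each row uses its own id2 as the key, which is the same row-wise behaviour the concat("$.", id2) path gives inside expr().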
