PySpark Dataframe Transformation
Migration from Scala to Python
While migrating a legacy Scala project to Python, I found some tips that help me
forget about Scala's type system. Feels good! :smile:
dataclass vs case class
In Scala, you have to create a case class for each data model,
while in Python a dataclass is your alternative:
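How far does the analogy go? Here is a minimal sketch, assuming a hypothetical Click model that is not part of the migrated project. With frozen=True a dataclass gets close to a case class: structural equality, a readable repr, immutability, and dataclasses.replace standing in for Scala's copy().

```python
from dataclasses import FrozenInstanceError, dataclass, replace


# Hypothetical model for illustration only; frozen=True makes instances
# immutable, which is the closest match to a Scala case class.
@dataclass(frozen=True)
class Click:
    click_id: int
    page: str


a = Click(1, "home")
b = Click(1, "home")

# Structural equality and a readable repr come for free, as with a case class.
assert a == b
print(a)  # Click(click_id=1, page='home')

# replace() plays the role of Scala's copy(): a new instance with one field changed.
c = replace(a, page="cart")
assert c == Click(1, "cart")

# Frozen instances reject attribute assignment.
try:
    a.page = "other"
except FrozenInstanceError:
    print("immutable")
```

What you do not get is compile-time checking: the type annotations are not enforced at runtime. The model used in the rest of this post is a plain, non-frozen dataclass: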
    from dataclasses import dataclass


    @dataclass
    class Event:
        event_id: int
        event_name: str

Create Dataframe from dataclass
    from pyspark.sql import Row, SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("test")
        .getOrCreate()
    )

    d = [
        Event(1, "abc"),
        Event(2, "ddd"),
    ]

    # Row object: build one Row per dataclass instance from its fields
    df = spark.createDataFrame([Row(**e.__dict__) for e in d])
    df.show()
    # +--------+----------+
    # |event_id|event_name|
    # +--------+----------+
    # |       1|       abc|
    # |       2|       ddd|
    # +--------+----------+
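If you would rather not go through Row and __dict__, one possible alternative is to derive the column names and plain tuples from the dataclass itself. The sketch below uses only the standard library; to_rows is a hypothetical helper name of my own, and it assumes a non-empty list of instances of one flat (non-nested) dataclass.

```python
from dataclasses import astuple, dataclass, fields


@dataclass
class Event:
    event_id: int
    event_name: str


def to_rows(items):
    """Hypothetical helper: return (column_names, list_of_tuples) for a
    non-empty list of instances of one flat dataclass."""
    # Field names come from the dataclass definition, in declaration order.
    columns = [f.name for f in fields(items[0])]
    # astuple() yields the field values in the same order as the columns.
    rows = [astuple(item) for item in items]
    return columns, rows


columns, rows = to_rows([Event(1, "abc"), Event(2, "ddd")])
print(columns)  # ['event_id', 'event_name']
print(rows)     # [(1, 'abc'), (2, 'ddd')]

# With a SparkSession named `spark` available, the result can be passed on as:
# df = spark.createDataFrame(rows, schema=columns)
```

spark.createDataFrame accepts a list of tuples together with a list of column names, so the column types are still inferred from the data, just as in the Row version.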