
PySpark Dataframe Transformation

While migrating a legacy Scala project to Python, I found a few tips that let me forget Scala's type system. Feels good! :smile:

In Scala you have to create a case class for each data model; in Python, `dataclass` is the alternative.

```python
from dataclasses import dataclass

from pyspark.sql import Row, SparkSession


@dataclass
class Event:
    event_id: int
    event_name: str


spark = (
    SparkSession.builder.master("local[*]")
    .appName("test")
    .getOrCreate()
)

d = [
    Event(1, "abc"),
    Event(2, "ddd"),
]

# Build one Row per dataclass instance; Spark infers the schema
# from the Row field names and values
df = spark.createDataFrame(Row(**e.__dict__) for e in d)
df.show()
# +--------+----------+
# |event_id|event_name|
# +--------+----------+
# |       1|       abc|
# |       2|       ddd|
# +--------+----------+
```
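The `Row(**e.__dict__)` trick works because a flat dataclass instance stores its fields in `__dict__`. The standard library also offers `dataclasses.asdict`, which produces the same mapping for flat dataclasses (and additionally recurses into nested ones). A minimal sketch, with no Spark session required:

```python
from dataclasses import dataclass, asdict


@dataclass
class Event:
    event_id: int
    event_name: str


e = Event(1, "abc")

# For a flat dataclass, __dict__ and asdict() give the same mapping,
# so either can feed Row(**...) when building the DataFrame
assert e.__dict__ == {"event_id": 1, "event_name": "abc"}
assert asdict(e) == e.__dict__
```

`asdict` is the safer choice if your models ever nest dataclasses inside one another, since it converts the nested instances to dicts as well.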