
PySpark Dataframe Transformation

While migrating a legacy Scala project to Python, I found a few tips that let me forget Scala's type system. Feels good! :smile:

In Scala you have to create a case class for each data model; in Python, `dataclass` is the alternative.

```python
from dataclasses import dataclass

from pyspark.sql import Row, SparkSession


@dataclass
class Event:
    event_id: int
    event_name: str


spark = (
    SparkSession.builder.master("local[*]")
    .appName("test")
    .getOrCreate()
)

d = [
    Event(1, "abc"),
    Event(2, "ddd"),
]

# Build one Row per dataclass instance; Spark infers the schema
# from the Row field names and values
df = spark.createDataFrame(Row(**e.__dict__) for e in d)
df.show()
# +--------+----------+
# |event_id|event_name|
# +--------+----------+
# |       1|       abc|
# |       2|       ddd|
# +--------+----------+
```
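The `Row(**e.__dict__)` trick works because a flat dataclass instance stores its fields in `__dict__`. The standard library also offers `dataclasses.asdict`, which produces the same mapping for flat dataclasses (and additionally recurses into nested ones). A minimal sketch, with no Spark session required:

```python
from dataclasses import dataclass, asdict


@dataclass
class Event:
    event_id: int
    event_name: str


e = Event(1, "abc")

# For a flat dataclass, __dict__ and asdict() give the same mapping,
# so either can feed Row(**...) when building the DataFrame
assert e.__dict__ == {"event_id": 1, "event_name": "abc"}
assert asdict(e) == e.__dict__
```

`asdict` is the safer choice if your models ever nest dataclasses inside one another, since it converts the nested instances to dicts as well.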