Databricks Wheel Job
Categories: Databricks Jobs
Recently I successfully deployed my Python wheel to a Databricks cluster. Here are some tips if you plan to deploy a PySpark project.
pyspark project
My previous Spark project was Scala-based, and I used IDEA to compile and test it conveniently. :smile: The nice Databricks Jobs UI saves you time when creating a JAR job.
Here is the official guide: Databricks Wheel Job
What I did:

- Initialize a Python project
```shell
# create a python virtual environment
python -m venv pyspark_venv
# activate your venv
source pyspark_venv/bin/activate
# check your current python
which python
# install python libs
pip install uv ruff pyspark pytest wheel
## if pip fails with a proxy error,
## add your proxy:
## --proxy http://proxy:port
# create your project
uv init --package <your package name>
```

After the `uv` command completes, a nice Python project is created:

```
pyspark-app
├── README.md
├── pyproject.toml
└── src
    └── pyspark_app
        └── __init__.py
```
- :exclamation: pyspark entry point
  - add one file `__main__.py` in `pyspark_app`
  - modify `[project.scripts]` in `pyproject.toml`; this is the entry point of the Databricks job
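For example, the `[project.scripts]` section might look like this (the script name `pyspark-app` and the `main` function are assumptions for illustration; use whatever names fit your project):

```toml
[project.scripts]
pyspark-app = "pyspark_app.__main__:main"
```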
Now the project is:

```
pyspark-app
├── README.md
├── pyproject.toml
└── src
    └── pyspark_app
        ├── __init__.py
        └── __main__.py
```
- pytest

Make sure pytest is installed, then create a new package `test`:
```
pyspark-app
├── README.md
├── pyproject.toml
└── src
    └── pyspark_app
        ├── __init__.py
        ├── __main__.py
        └── test
            ├── __init__.py
            ├── conftest.py
            └── test_spark.py
```
```python
def test_spark(init_spark):
    spark = init_spark
    df = spark.range(10)
    df.show()
```

Output:

```
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/01 20:59:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
PASSED [100%]
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```
Now you can work on your Spark application with tests.

- wheel file
The final step is building the wheel file:

```shell
# 1. change your working directory to where pyproject.toml lives
# 2. run the command below (requires the build package: pip install build)
python -m build --wheel
```

The project now looks like:
```
pyspark-app
├── README.md
├── build
│   ├── bdist.macosx-12.0-x86_64
│   └── lib
│       └── pyspark_app
│           ├── __init__.py
│           ├── __main__.py
│           └── test
│               ├── __init__.py
│               ├── conftest.py
│               └── test_spark.py
├── dist
│   └── pyspark_app-0.1.0-py3-none-any.whl
├── pyproject.toml
└── src
    ├── pyspark_app
    │   ├── __init__.py
    │   ├── __main__.py
    │   └── test
    │       ├── __init__.py
    │       ├── conftest.py
    │       └── test_spark.py
    └── pyspark_app.egg-info
        ├── PKG-INFO
        ├── SOURCES.txt
        ├── dependency_links.txt
        ├── entry_points.txt
        └── top_level.txt
```
Your wheel file is in the `dist` directory: `dist/pyspark_app-0.1.0-py3-none-any.whl`.
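When you create the job, the wheel task settings map onto the Jobs API's `python_wheel_task`. A sketch of the relevant task config (the task key, entry point name, and wheel path are assumptions based on this project):

```json
{
  "tasks": [
    {
      "task_key": "pyspark_app_task",
      "python_wheel_task": {
        "package_name": "pyspark_app",
        "entry_point": "pyspark-app"
      },
      "libraries": [
        { "whl": "dbfs:/path/to/pyspark_app-0.1.0-py3-none-any.whl" }
      ]
    }
  ]
}
```

`entry_point` refers to the console-script name declared under `[project.scripts]` in `pyproject.toml`.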
View the full example at the Project template.