# Databricks Wheel Job
## Databricks Jobs

Recently I successfully deployed my Python wheel to a Databricks cluster.
Here are some tips if you plan to deploy a PySpark application.
## pyspark project

My previous Spark project was Scala-based, and I used IDEA to compile and test it conveniently. :smile:
The Databricks Jobs UI is nice and saves you time when creating a JAR job.
This is the official guide: Databricks Wheel Job
What I did:
- Initialize a Python project

  ```sh
  # create a python virtual environment
  python -m venv pyspark_venv
  # activate your venv
  source pyspark_venv/bin/activate
  # check your current python
  which python
  # install python libs
  pip install uv ruff pyspark pytest wheel
  ## if pip fails with a proxy error,
  ## add your proxy:
  ## --proxy http://proxy:port
  # create your project
  uv init --package <your package name>
  ```

  After the `uv` command completes, a nice Python project is created:

  ```
  pyspark-app
  ├── README.md
  ├── pyproject.toml
  └── src
      └── pyspark_app
          └── __init__.py
  ```
- :exclamation: pyspark entry point

  - add one file `__main__.py` in `pyspark_app`
  - modify `[project.scripts]` in `pyproject.toml`; this is the entry point of the Databricks job (see the sketch after this list)

  Now the project is:

  ```
  pyspark-app
  ├── README.md
  ├── pyproject.toml
  └── src
      └── pyspark_app
          ├── __init__.py
          └── __main__.py
  ```
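For reference, here is a minimal sketch of those two pieces. The `main()` function name, the app name, and the `pyspark_app` script key are assumptions for illustration, not taken from the original project:

```python
# src/pyspark_app/__main__.py -- a minimal sketch; the main() name must
# match whatever you declare under [project.scripts] in pyproject.toml
from pyspark.sql import SparkSession


def main() -> None:
    # On Databricks, getOrCreate() attaches to the cluster's existing
    # session; when run locally it creates a new one.
    spark = SparkSession.builder.appName("pyspark-app").getOrCreate()
    spark.range(10).show()


if __name__ == "__main__":
    main()
```

```toml
# pyproject.toml (excerpt) -- the script key "pyspark_app" and the target
# "pyspark_app.__main__:main" are assumptions matching the sketch above
[project.scripts]
pyspark_app = "pyspark_app.__main__:main"
```

Databricks uses this script entry to locate and invoke your code when the wheel job starts.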
## pytest

Make sure pytest is installed.
Let's create a new `test` package:

```
pyspark-app
├── README.md
├── pyproject.toml
└── src
    └── pyspark_app
        ├── __init__.py
        ├── __main__.py
        └── test
            ├── __init__.py
            ├── conftest.py
            └── test_spark.py
```

```python
def test_spark(init_spark):
    spark = init_spark
    df = spark.range(10)
    df.show()
```

```
""" output
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/01 20:59:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
PASSED [100%]

+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
"""
```

Now you can work on your Spark application with tests.
## wheel file

The final step is building the wheel file:

```sh
# 1. change to the directory containing pyproject.toml
# 2. run the command below (requires the `build` package: pip install build)
python -m build --wheel
```
The project now looks like this:

```
pyspark-app
├── README.md
├── build
│   ├── bdist.macosx-12.0-x86_64
│   └── lib
│       └── pyspark_app
│           ├── __init__.py
│           ├── __main__.py
│           └── test
│               ├── __init__.py
│               ├── conftest.py
│               └── test_spark.py
├── dist
│   └── pyspark_app-0.1.0-py3-none-any.whl
├── pyproject.toml
└── src
    ├── pyspark_app
    │   ├── __init__.py
    │   ├── __main__.py
    │   └── test
    │       ├── __init__.py
    │       ├── conftest.py
    │       └── test_spark.py
    └── pyspark_app.egg-info
        ├── PKG-INFO
        ├── SOURCES.txt
        ├── dependency_links.txt
        ├── entry_points.txt
        └── top_level.txt
```

Your wheel file is at `dist/pyspark_app-0.1.0-py3-none-any.whl`.
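To wire this wheel into a Databricks job, point a Python wheel task at the package name and the `[project.scripts]` entry point. Below is a sketch of a `python_wheel_task` definition in Jobs API style; the task key and wheel path are placeholders, and the `entry_point` value assumes the `pyspark_app` script key from the earlier pyproject.toml sketch:

```json
{
  "tasks": [
    {
      "task_key": "run_pyspark_app",
      "python_wheel_task": {
        "package_name": "pyspark_app",
        "entry_point": "pyspark_app"
      },
      "libraries": [
        { "whl": "/Workspace/path/to/pyspark_app-0.1.0-py3-none-any.whl" }
      ]
    }
  ]
}
```

The Jobs UI asks for the same fields (package name, entry point, wheel location), so you can fill them in there instead of calling the API.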
You can view the full project at Project template.