Install
Databricks Community Cloud
The Community Edition is free: https://community.cloud.databricks.com/login.html
Locally On Windows
Install a JDK: https://www.oracle.com/au/java/technologies/downloads/
Add the Java bin path to the system env var PATH and check Java: java --version
Download Apache Spark: https://spark.apache.org/downloads.html
Unzip it to a folder, c:/spark, set the env var "SPARK_HOME" to that folder, and add its bin folder to the env var PATH
Check the Spark version: spark-submit --version
Install pyspark: pip install pyspark[ml,mllib,sql,pandas_on_spark]
Copy winutils.exe from https://github.com/cdarlint/winutils/tree/master/hadoop-3.3.5/bin to c:/hadoop/bin and set the env var HADOOP_HOME to c:/hadoop
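Once the steps above are done, a short script can confirm that the environment looks the way they describe. The sketch below is only an illustration, not part of any Spark tooling: the function name check_spark_env and its parameters are made up for this example. It assumes java and spark-submit should be reachable through PATH, and that winutils.exe sits in the bin folder under HADOOP_HOME.

```python
import os
import shutil
from pathlib import Path


def check_spark_env(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means the setup looks fine.

    Hypothetical helper for illustration only. `env` and `which` can be
    swapped out so the check can be exercised without touching the real system.
    """
    env = os.environ if env is None else env
    problems = []

    # java and spark-submit must be reachable through PATH.
    for tool in ("java", "spark-submit"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # SPARK_HOME and HADOOP_HOME must be set and point at existing folders.
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{var} points to a missing folder: {value}")

    # winutils.exe is expected in the bin folder under HADOOP_HOME.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("winutils.exe not found in HADOOP_HOME/bin")

    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    print("Setup looks OK" if not issues else "\n".join(issues))
```

Run it from a new terminal so that recently changed environment variables are picked up. Each line it prints names one step that may need another look.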