Tested on an Ubuntu-based Linux system with >= 8 GB RAM.
- Ensure that Hadoop is properly set up as per the instructions in HadoopSetup.md.
- Download the binary: https://dlcdn.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-without-hadoop.tgz
- Extract it to a location of your choice (`DIR`).
- Export the Spark-related variables and update the `PATH` environment variable:

      export SPARK_HOME=<absolute path to DIR/spark-3.1.2-bin-without-hadoop>
      export PATH=$PATH:$SPARK_HOME/bin
- Create a copy of `DIR/spark-3.1.2-bin-without-hadoop/conf/spark-env.sh.template` as `DIR/spark-3.1.2-bin-without-hadoop/conf/spark-env.sh`.
  Add the line `export SPARK_DIST_CLASSPATH=$(hadoop classpath)` to `DIR/spark-3.1.2-bin-without-hadoop/conf/spark-env.sh`.
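
The `spark-env.sh` steps above can be sketched as a small shell helper that is safe to run more than once. This is only a sketch under stated assumptions: the function name `configure_spark_env` is hypothetical (it is not part of Spark or Hadoop), and it assumes its argument is the extracted `spark-3.1.2-bin-without-hadoop` directory, i.e. the value of `SPARK_HOME`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Spark): creates conf/spark-env.sh
# from the template and adds the SPARK_DIST_CLASSPATH line if it is missing.
configure_spark_env() {
  local spark_home="$1"
  local conf_dir="$spark_home/conf"
  local env_file="$conf_dir/spark-env.sh"
  # Single quotes keep $(hadoop classpath) literal, so it is evaluated
  # later, when Spark sources spark-env.sh, not while this function runs.
  local line='export SPARK_DIST_CLASSPATH=$(hadoop classpath)'

  # Keep an existing spark-env.sh; only create it from the template once.
  if [ ! -f "$env_file" ]; then
    cp "$conf_dir/spark-env.sh.template" "$env_file" || return 1
  fi

  # -F fixed string, -x whole-line match, -q quiet: append only if absent.
  if ! grep -Fxq "$line" "$env_file"; then
    printf '%s\n' "$line" >> "$env_file"
  fi
}

# Example usage, assuming SPARK_HOME was exported as in the step above:
#   configure_spark_env "$SPARK_HOME"
```

Because the helper checks for the line before appending, running it again leaves `spark-env.sh` unchanged. Afterwards, running `spark-submit --version` in a new shell should print the Spark version, provided `PATH` is updated and the `hadoop` command is available.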