Setting Up Apache Spark 3.1.2

Tested on an Ubuntu-based Linux system with at least 8 GB of RAM.

  1. Ensure that Hadoop is set up properly, as per the instructions in HadoopSetup.md.

  2. Download the binary: https://dlcdn.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-without-hadoop.tgz

  3. Extract it to a location of your choice (DIR).

  4. Export the Spark-related variables and update the PATH environment variable.

export SPARK_HOME=<absolute path to DIR/spark-3.1.2-bin-without-hadoop>
export PATH=$PATH:$SPARK_HOME/bin
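
Steps 2 to 4 can be sketched as two small shell helpers. This is only an illustration under stated assumptions: the function names `extract_spark` and `persist_spark_env` are hypothetical, `$HOME/opt` stands in for DIR, and `~/.bashrc` is assumed to be the file your shell reads at startup. The helpers have no side effects until they are called.

```shell
# URL and directory name taken from the steps above.
SPARK_URL="https://dlcdn.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-without-hadoop.tgz"
SPARK_NAME="spark-3.1.2-bin-without-hadoop"

# Hypothetical helper: extract a Spark tarball into DIR and print
# the path that SPARK_HOME should point to.
extract_spark() {
  tarball="$1"
  dir="$2"
  mkdir -p "$dir" && tar -xzf "$tarball" -C "$dir" && echo "$dir/$SPARK_NAME"
}

# Hypothetical helper: append the two exports to a profile file,
# but only once, so re-running it does not duplicate the lines.
persist_spark_env() {
  profile="$1"
  spark_home="$2"
  if ! grep -qs '^export SPARK_HOME=' "$profile"; then
    printf 'export SPARK_HOME="%s"\n' "$spark_home" >> "$profile"
    printf '%s\n' 'export PATH=$PATH:$SPARK_HOME/bin' >> "$profile"
  fi
}

# Example usage (assumes DIR is $HOME/opt and the profile is ~/.bashrc):
#   wget -P /tmp "$SPARK_URL"
#   SPARK_HOME="$(extract_spark "/tmp/$SPARK_NAME.tgz" "$HOME/opt")"
#   persist_spark_env "$HOME/.bashrc" "$SPARK_HOME"
```

After the exports are written to the profile, open a new terminal (or source the profile) so that they take effect.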
  5. Create a copy of DIR/spark-3.1.2-bin-without-hadoop/conf/spark-env.sh.template named DIR/spark-3.1.2-bin-without-hadoop/conf/spark-env.sh.

Add the following line to DIR/spark-3.1.2-bin-without-hadoop/conf/spark-env.sh so that the "without-hadoop" Spark build can find the Hadoop jars:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)
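
The spark-env.sh step can also be scripted. The sketch below assumes `SPARK_HOME` is already exported as described above; the function name `configure_spark_env` is hypothetical. It copies the template only when spark-env.sh does not exist yet, and appends the classpath line only when it is not already present.

```shell
# Hypothetical helper: create conf/spark-env.sh from its template if needed
# and append the Hadoop classpath export exactly once.
configure_spark_env() {
  conf_dir="$1/conf"
  env_file="$conf_dir/spark-env.sh"

  # Keep an existing spark-env.sh; otherwise start from the template.
  [ -f "$env_file" ] || cp "$conf_dir/spark-env.sh.template" "$env_file" || return 1

  # Single quotes keep $(hadoop classpath) unexpanded here, so it is
  # evaluated each time Spark sources spark-env.sh.
  grep -q '^export SPARK_DIST_CLASSPATH=' "$env_file" ||
    echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> "$env_file"
}

# Example usage:
#   configure_spark_env "$SPARK_HOME"
```

As a quick sanity check afterwards, `hadoop classpath` should print a non-empty list of paths, and `spark-submit --version` should report version 3.1.2.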