Vagrant template to provision a Spark Cluster
- See the `Vagrantfile` for details and to make changes.
- Spark runs as a standalone cluster. Tested with Spark 2.1.x (check the Spark Connector compatibility).
- One CentOS 7.4 head node machine and `N` worker (slave) machines [0-9].
- Vagrant
- VirtualBox
- Vagrant Hosts plugin: `vagrant plugin install vagrant-hosts`
  - This allows us to provision the hosts files for all the instances.
- Clone this repository.
- Download a pre-built Spark package and place it in this directory; symlink it as `spark.tgz`.
- Download the Spark Connector and place it in this directory.
- Open the `Vagrantfile` in a text editor.
- Optional: change `N_WORKERS` to the desired number of worker hosts [0-9].
- Feel free to make other changes, e.g. RAM and CPU for each of the machines.
- When you're ready, run `vagrant up` in the directory the `Vagrantfile` is in.
- By default, Vagrant will spin up one "head node" and `N` worker nodes in a Spark standalone cluster.
- You can start a standalone instance using `vagrant up hn0`.
- SSH in using `vagrant ssh hn0` or `vagrant ssh wn0`.
- Spark is running as `root`.
- The Spark WebUI should be available at http://192.168.99.200:8080.
- Verify the Spark master process with `sudo jps -ml`:

  ```
  sudo jps -ml
  PID org.apache.spark.deploy.master.Master --host 192.168.99.200 --port 7077 --webui-port 8080 -h 192.168.99.200
  ```
- Set up environment variables:

  ```
  GSC_JAR=$(ls /vagrant/greenplum-spark_2.11-*.jar)
  POSTGRES_JAR=$(ls /vagrant/postgresql-*.jar)
  ```
- Run Scala.

  Spark Connector 1.2: read from Greenplum with the Spark Connector, write to Greenplum with JDBC:

  ```
  sudo spark-shell --jars "${GSC_JAR},${POSTGRES_JAR}" --driver-class-path ${POSTGRES_JAR}
  ```
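  Once that shell is up, a session looks roughly like the sketch below. This is only an illustration: the host `gpmaster`, database `testdb`, credentials, and table names are placeholders, and the `greenplum` data source name follows the connector documentation (some early 1.x releases require the full provider class name instead).

  ```scala
  import org.apache.spark.sql.SaveMode

  // Placeholder connection details -- substitute your Greenplum master host,
  // database, credentials, and table names.
  val url = "jdbc:postgresql://gpmaster:5432/testdb"

  // Read a Greenplum table through the connector; partitionColumn controls
  // how the read is split across Spark tasks.
  val df = spark.read.format("greenplum")
    .option("url", url)
    .option("user", "gpadmin")
    .option("password", "changeme")
    .option("dbschema", "public")
    .option("dbtable", "battles")
    .option("partitionColumn", "id")
    .load()

  df.printSchema()

  // Connector 1.2 has no write path, so results go back over plain JDBC --
  // this is why POSTGRES_JAR is on the driver classpath.
  val props = new java.util.Properties()
  props.setProperty("user", "gpadmin")
  props.setProperty("password", "changeme")
  df.write.mode(SaveMode.Append).jdbc(url, "public.battles_copy", props)
  ```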
  Spark Connector 1.3+: read from and write to Greenplum with the Spark Connector:

  ```
  sudo spark-shell --jars "${GSC_JAR}"
  ```
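  With 1.3+ the read is the same, and the write can go through the connector as well, so the Postgres JDBC jar is no longer needed. Again a sketch, with the same placeholder connection details as above:

  ```scala
  import org.apache.spark.sql.SaveMode

  // Same placeholder connection details as the previous sketch.
  val url = "jdbc:postgresql://gpmaster:5432/testdb"

  val df = spark.read.format("greenplum")
    .option("url", url)
    .option("user", "gpadmin")
    .option("password", "changeme")
    .option("dbschema", "public")
    .option("dbtable", "battles")
    .option("partitionColumn", "id")
    .load()

  // Connector 1.3+ registers a write path, so the DataFrame can be saved
  // straight back to Greenplum through the same data source.
  df.write.format("greenplum")
    .option("url", url)
    .option("user", "gpadmin")
    .option("password", "changeme")
    .option("dbschema", "public")
    .option("dbtable", "battles_copy")
    .mode(SaveMode.Append)
    .save()
  ```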
- Shut down the cluster with `vagrant halt` and delete it with `vagrant destroy`.
- You can always run `vagrant up` again to turn the cluster back on or build a brand-new one.
See the LICENSE.txt file.