Building Shark from Source Code
This guide walks through everything needed to compile and run Shark from scratch.
The only prerequisite for this guide is that you have Java version 6 or 7 and Scala 2.9.3 installed on your machine. If you don't have Scala 2.9.3, you can download it by running:
$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz
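Before going further, it can help to confirm which Java and Scala versions are actually on your PATH. The snippet below is a minimal sketch, not part of Shark: version_ok and report are hypothetical helpers, and the sed patterns assume the usual output format of java -version and scala -version.

```shell
# version_ok is a hypothetical helper: succeeds when the version string in $1
# starts with any of the prefixes given as the remaining arguments.
version_ok() {
  ver="$1"; shift
  for prefix in "$@"; do
    case "$ver" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# report is a hypothetical helper: prints one status line for a tool.
report() {
  name="$1"; found="$2"; shift 2
  if version_ok "$found" "$@"; then
    echo "$name $found: OK"
  else
    echo "$name ${found:-not found}: expected one of: $*"
  fi
}

java_ver=""
scala_ver=""
# Both tools may print their version banner on stderr, hence 2>&1.
if command -v java >/dev/null 2>&1; then
  java_ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
fi
if command -v scala >/dev/null 2>&1; then
  scala_ver=$(scala -version 2>&1 | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)
fi

# Java 6 and 7 report themselves as 1.6.x and 1.7.x.
report Java "$java_ver" 1.6 1.7
report Scala "$scala_ver" 2.9.3
```

If Scala was only unpacked and not added to PATH, the Scala line reports "not found"; that is fine as long as SCALA_HOME points at the unpacked directory later on.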
Then download our patched version of Hive and untar it:
$ wget http://spark-project.org/download/hive-0.9.0-bin.tar.gz
$ tar xvfz hive-0.9.0-bin.tar.gz
Clone the branch-0.7 branch of Spark from GitHub, then compile and publish Spark to your local repository:
$ git clone https://github.com/mesos/spark.git -b branch-0.7 spark-0.7
$ cd spark-0.7
$ sbt/sbt publish-local
$ cd ..
Clone the branch-0.7 branch of Shark from GitHub:
$ git clone https://github.com/amplab/shark.git -b branch-0.7 shark-0.7
Copy the template to shark-0.7/conf/shark-env.sh, then edit the file and set SCALA_HOME and HIVE_HOME to point to the right locations:
$ cd shark-0.7
$ cp conf/shark-env.sh.template conf/shark-env.sh
$ vim conf/shark-env.sh
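As an illustration, the two edited lines in conf/shark-env.sh might end up looking like the sketch below. The paths are assumptions: they presume both archives were unpacked in your home directory and that the Hive archive unpacks to a directory named hive-0.9.0-bin.

```shell
# Assumed locations: adjust both paths to wherever Scala and Hive
# actually live on your machine.
export SCALA_HOME="$HOME/scala-2.9.3"
export HIVE_HOME="$HOME/hive-0.9.0-bin"
```

Using absolute paths here avoids surprises when the Shark scripts are launched from a different working directory.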
Hive requires that /tmp and /user/hive/warehouse/src exist on your computer. Create them if they don't already exist.
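One way to create the directories is sketched below. ensure_dir is a hypothetical helper, not something shipped with Hive or Shark, and creating a top-level /user directory typically needs root privileges, so the loop reports a failure instead of aborting.

```shell
# ensure_dir is a hypothetical helper: create the directory (and any
# missing parents) unless it already exists, and say which case applied.
ensure_dir() {
  dir="$1"
  if [ -d "$dir" ]; then
    echo "exists: $dir"
  else
    mkdir -p "$dir" && echo "created: $dir"
  fi
}

for d in /tmp /user/hive/warehouse/src; do
  ensure_dir "$d" || echo "could not create $d (try again with sudo)"
done
```

If the second directory cannot be created as your normal user, run the equivalent mkdir -p command with sudo and make sure the user that runs Shark can write to it.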
Compile Shark:
$ sbt/sbt package
The build system uses Maven/Ivy to fetch its dependencies. If this is the first time you are building the project, it can take a while to download all the dependencies. Subsequent builds, however, will be much faster.
Once it is built, you can start the Shark CLI:
$ bin/shark-withinfo
bin/shark-withinfo is useful for development, since it outputs logging information to the Shark console. bin/shark provides a less verbose version of the CLI.
To verify that Shark is running, you can try the following example, which creates a table with sample data:
shark> CREATE TABLE src(key INT, value STRING);
shark> LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
shark> SELECT COUNT(1) FROM src;
shark> CREATE TABLE src_cached AS SELECT * FROM src;
shark> SELECT COUNT(1) FROM src_cached;
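Both COUNT queries should return the same number, and that number should match the row count of the sample file. A quick cross-check from a regular shell is sketched below; count_rows is a hypothetical helper, and the snippet assumes HIVE_HOME is exported in that shell and that kv1.txt holds one row per line.

```shell
# count_rows is a hypothetical helper: print the number of lines in a file.
count_rows() {
  wc -l < "$1" | tr -d ' '
}

data="${HIVE_HOME:-}/examples/files/kv1.txt"
if [ -f "$data" ]; then
  echo "kv1.txt rows: $(count_rows "$data")"
else
  echo "sample file not found: $data"
fi
```

If the number printed here differs from what the queries report, the LOAD DATA step is the first place to look.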