Running Shark Locally
This guide describes how to get Shark running locally. It creates a small Hive installation on one machine and allows you to execute simple queries. The only prerequisite for this guide is that you have Java and Scala 2.9.3 installed on your machine. If you don't have Scala 2.9.3, you can download it by running:
$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz
Download the binary distribution of Shark 0.7.0. The package contains two folders, shark-0.7.0 and hive-0.9.0-bin.
$ wget http://spark-project.org/download/shark-0.7.0-hadoop1-bin.tgz # Hadoop 1/CDH3 - or -
$ wget http://spark-project.org/download/shark-0.7.0-hadoop2-bin.tgz # Hadoop 2/CDH4
$ tar xvfz shark-0.7.0-*-bin.tgz
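If you script the download, the choice between the two builds can be captured in a small helper. This is only a sketch: the function name shark_tarball_url is hypothetical, and the URLs are the two listed above.

```shell
# Hypothetical helper (not part of Shark): print the Shark 0.7.0 download URL
# for a given Hadoop major version, using the two URLs from this guide.
shark_tarball_url() {
  case "$1" in
    1|2) echo "http://spark-project.org/download/shark-0.7.0-hadoop$1-bin.tgz" ;;
    *)   echo "usage: shark_tarball_url 1|2" >&2; return 1 ;;
  esac
}
```

For example, wget "$(shark_tarball_url 2)" would fetch the Hadoop 2/CDH4 build.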
The Shark code is in the shark-0.7.0/ directory. To set up your environment to run Shark locally, set the HIVE_HOME and SCALA_HOME environment variables in the file shark-0.7.0/conf/shark-env.sh to point to the folders you just downloaded. Shark comes with a template file, shark-env.sh.template, that you can copy and modify to get started:
$ cd shark-0.7.0/conf
$ cp shark-env.sh.template shark-env.sh
Now edit the following two lines in shark-env.sh:
export HIVE_HOME=/path/to/hive-0.9.0-bin
export SCALA_HOME=/path/to/scala-2.9.3
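A mistyped path in either variable is easy to miss until Shark fails to start, so it can help to check both before moving on. The following is a minimal sketch, not something Shark provides: check_home is a hypothetical helper that only tests whether the named variable is set and points to an existing directory.

```shell
# Hypothetical helper (not part of Shark): report whether the environment
# variable named by $1 is set and points to an existing directory.
check_home() {
  name="$1"
  # Indirectly read the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  elif [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value"
    return 1
  fi
  echo "$name OK: $value"
}
```

After sourcing your edited shark-env.sh in the current shell, running check_home HIVE_HOME and check_home SCALA_HOME should each print an OK line.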
Next, create the default Hive warehouse directory. This is where Hive will store table data for native tables.
$ sudo mkdir -p /user/hive/warehouse
$ sudo chmod 0777 /user/hive/warehouse # Or make your username the owner
You can now start the Shark CLI. Run it from the shark-0.7.0/ directory (one level up if you are still in conf/):
$ ./bin/shark
To verify that Shark is running, you can try the following example, which creates a table with sample data and then a cached copy of it (Shark keeps tables whose names end in _cached in memory):
CREATE TABLE src(key INT, value STRING);
LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
SELECT COUNT(1) FROM src;
CREATE TABLE src_cached AS SELECT * FROM src;
SELECT COUNT(1) FROM src_cached;
In addition to the Shark CLI, there are several executables in shark-0.7.0/bin:
- bin/shark-withdebug: Runs Shark CLI with DEBUG level logs printed to the console.
- bin/shark-withinfo: Runs Shark CLI with INFO level logs printed to the console.
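If you switch between these launchers often, a small wrapper can pick one by log level. This is a sketch under assumptions: shark_launcher is a hypothetical name, and it only maps a level to one of the launcher paths named in this guide.

```shell
# Hypothetical helper (not part of Shark): map an optional log level to the
# matching launcher path, relative to the shark-0.7.0/ directory.
shark_launcher() {
  case "${1:-}" in
    "")    echo "bin/shark" ;;
    info)  echo "bin/shark-withinfo" ;;
    debug) echo "bin/shark-withdebug" ;;
    *)     echo "unknown log level: $1" >&2; return 1 ;;
  esac
}
```

From the shark-0.7.0/ directory, "./$(shark_launcher debug)" would then start the CLI with DEBUG logging.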