Running Shark Locally

Reynold Xin edited this page Oct 17, 2013 · 60 revisions

This guide describes how to get Shark running locally. It creates a small Hive installation on one machine and allows you to execute simple queries. The only prerequisite for this guide is that you have Java and Scala 2.9.3 installed on your machine. If you don't have Scala 2.9.3, you can download it by running:

$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz

Download the binary distribution of Shark 0.8.0 that matches your Hadoop version. The package contains two folders, shark-0.8.0 and hive-0.9.0-shark-0.8.0-bin.

$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-hadoop1.tgz   # Hadoop 1/CDH3 - or -
$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-cdh4.tgz   # Hadoop 2/CDH4

$ tar xvfz shark-*-bin-*.tgz
$ cd shark-*-bin-*/   # The trailing slash matches only the extracted directory, not the .tgz file

The Shark code is in the shark-0.8.0/ directory. To set up your environment to run Shark locally, set the HIVE_HOME and SCALA_HOME environment variables in shark-0.8.0/conf/shark-env.sh so that they point to the Hive folder you just extracted and to your Scala installation. Shark comes with a template file, shark-env.sh.template, that you can copy and modify to get started:

$ cd shark-0.8.0/conf
$ cp shark-env.sh.template shark-env.sh

Now edit the following two lines in shark-env.sh:

export HIVE_HOME=/path/to/hive-0.9.0-shark-0.8.0-bin
export SCALA_HOME=/path/to/scala-2.9.3
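Before starting Shark, it can help to confirm that both variables resolve to real directories. The following is a minimal sketch, not part of the Shark distribution; the function name check_home is hypothetical.

```shell
# Hypothetical helper: succeed if the named variable holds an existing directory.
# Usage: check_home HIVE_HOME
check_home() {
  name="$1"
  # Read the value of the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value" >&2
    return 1
  fi
  echo "$name=$value"
}
```

For example, after sourcing the file with `. conf/shark-env.sh` from the shark-0.8.0/ directory, running `check_home HIVE_HOME && check_home SCALA_HOME` prints both settings or reports which one is wrong.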

Next, create the default Hive warehouse directory. This is where Hive will store table data for native tables.

$ sudo mkdir -p /user/hive/warehouse
$ sudo chmod 0777 /user/hive/warehouse  # Or make your username the owner
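A quick way to confirm that the warehouse directory is usable by your account is sketched below. The function name warehouse_ready is hypothetical; its default argument is the warehouse path used in this guide.

```shell
# Hypothetical helper: succeed if the directory exists and the current user
# can write to it. Defaults to the warehouse path used in this guide.
warehouse_ready() {
  dir="${1:-/user/hive/warehouse}"
  [ -d "$dir" ] && [ -w "$dir" ]
}
```

Running `warehouse_ready || echo "warehouse directory is missing or not writable"` gives a simple pass/fail check. If you prefer ownership over open permissions, `sudo chown "$USER" /user/hive/warehouse` is one way to make your username the owner.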

You can now start the Shark CLI. Run it from the shark-0.8.0/ directory (one level up if you are still in conf/):

$ ./bin/shark

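To verify that Shark is running, you can try the following example, which creates a table from the sample data shipped with Hive and then copies it into a second table. Shark keeps tables whose names end in _cached in memory: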

CREATE TABLE src(key INT, value STRING);
LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
SELECT COUNT(1) FROM src;
CREATE TABLE src_cached AS SELECT * FROM src;
SELECT COUNT(1) FROM src_cached;
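Both COUNT(1) queries should return the same number, and that number should match the number of lines in the sample file, since each line of a plain-text file loads as one row. A minimal sketch for cross-checking from the shell follows; the function name count_rows is hypothetical.

```shell
# Hypothetical helper: print the number of lines in a text data file.
# Each line corresponds to one row once the file is loaded into the table.
count_rows() {
  awk 'END { print NR }' "$1"
}
```

For example, `count_rows "$HIVE_HOME/examples/files/kv1.txt"` prints the row count to compare against the query results, assuming HIVE_HOME is set in your shell.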

In addition to the Shark CLI, there are other executables in shark-0.8.0/bin, including:

  • bin/shark-withdebug: Runs Shark CLI with DEBUG level logs printed to the console.
  • bin/shark-withinfo: Runs Shark CLI with INFO level logs printed to the console.