Commit
Changing README to point to Wiki
pwendell committed Oct 10, 2012
1 parent 62589ef commit 761b7b2
Showing 1 changed file with 1 addition and 73 deletions.
74 changes: 1 addition & 73 deletions README.md
@@ -6,76 +6,4 @@ modification to the existing data nor queries. Shark supports Hive's query langu
metastore, serialization formats, and user-defined functions.


## Build

Shark requires Hive 0.7.0 and Spark (0.4-SNAPSHOT).

Get Hive from Apache:

    $ export HIVE_HOME=/path/to/hive
    $ wget http://archive.apache.org/dist/hive/hive-0.7.0/hive-0.7.0-bin.tar.gz
    $ tar xvzf hive-0.7.0-bin.tar.gz
    $ mv hive-0.7.0-bin $HIVE_HOME

Get Spark from GitHub, compile it, and publish it to the local Ivy repository:

    $ git clone https://github.com/mesos/spark.git spark
    $ cd spark
    $ sbt/sbt publish-local

Get Shark from GitHub:

    $ git clone git://github.com/amplab/shark.git shark
    $ cd shark

Before building Shark, edit the config file:

    conf/shark-env.sh

Compile Shark (make sure `$HIVE_HOME` is set in `conf/shark-env.sh`):

    $ sbt/sbt products
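
Because the build depends on `$HIVE_HOME` being set in `conf/shark-env.sh`, a small pre-build check can catch a missing or mistyped path early. The following is a minimal sketch, not part of Shark: `check_hive_home` is a hypothetical helper, and it assumes `conf/shark-env.sh` is a plain shell script that can be sourced.

```shell
# Hypothetical pre-build check; not shipped with Shark.
# Assumes the env file is a plain shell script that can be sourced.
check_hive_home() {
  env_file="${1:-conf/shark-env.sh}"
  if [ ! -f "$env_file" ]; then
    echo "missing config file: $env_file" >&2
    return 1
  fi
  # Run in a subshell so the caller's environment is left untouched.
  (
    # Ignore any HIVE_HOME inherited from the caller; only the file counts.
    unset HIVE_HOME
    . "$env_file"
    if [ -z "${HIVE_HOME:-}" ]; then
      echo "HIVE_HOME is not set in $env_file" >&2
      exit 1
    fi
    if [ ! -d "$HIVE_HOME" ]; then
      echo "HIVE_HOME is not a directory: $HIVE_HOME" >&2
      exit 1
    fi
  )
}
```

With the function defined in the current shell, it could guard the build step, for example `check_hive_home && sbt/sbt products`.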


## Execution

There are several executables in `bin/`:

* `shark`: Runs the Shark CLI.
* `shark-withinfo`: Runs Shark with INFO-level logs printed to the console.
* `shark-withdebug`: Runs Shark with DEBUG-level logs printed to the console.
* `shark-shell`: Runs the Shark Scala console. This provides an experimental feature
  to convert HiveQL queries into `TableRDD`.
* `clear-buffer-cache.py`: Automatically clears OS buffer caches on Mesos EC2
  clusters. This is handy for performance studies.


## Runtime Configuration

Shark reuses Hive's configuration files, which are loaded from `$HIVE_HOME/conf`.

We also include a few Shark-specific configuration parameters that can be set
in the same way as you would set configuration parameters in Hive (e.g. from the
Shark CLI):

    shark> set shark.exec.mode = [hive | shark (default)]
    shark> set shark.explain.mode = [hive | shark (default)]


## Caching

Shark caches a table in memory if its name ends in `_cached`. For example,
if you have a table named `test`, you can create a cached version of it as follows:

    shark> CREATE TABLE test_cached AS SELECT * FROM test;
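
When several tables need cached copies, the naming convention is easy to apply mechanically. The sketch below only generates the statement text following the `_cached` suffix rule described above. `cached_ctas` is a hypothetical helper, not part of Shark, and the generated statement still has to be run from the Shark CLI.

```shell
# Hypothetical helper; not part of Shark.
# Prints a CREATE TABLE ... AS SELECT statement whose target name carries
# the "_cached" suffix that Shark uses to decide what to keep in memory.
cached_ctas() {
  table="$1"
  # Accept only simple identifiers (letters, digits, underscores).
  case "$table" in
    ''|*[!A-Za-z0-9_]*)
      echo "invalid table name: $table" >&2
      return 1
      ;;
  esac
  printf 'CREATE TABLE %s_cached AS SELECT * FROM %s;\n' "$table" "$table"
}

cached_ctas test
```

For the table `test`, this prints the same statement as the example above.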


References
----------
For information on setting up Hive or HiveQL, please read:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted

For information on Spark, please read:
https://github.com/mesos/spark


# For current documentation, see the [Shark Project Wiki](https://github.com/amplab/shark/wiki)