Implementation of a Spark application using Hive that implements some queries inspired by TPC-DS

GiovanniPaoloGibilisco/TPC-DS-Spark

How to install Spark with Hive support and run TPC-DS queries:

  1. Install Hadoop and configure HDFS

  2. Compile Spark with Hive support

git clone https://github.com/apache/spark (then, if needed, check out the desired tag)

./make-distribution.sh --name spark-with-hive --tgz -Psparkr -Phadoop-2.6 -Dhadoop.version=2.6.2 -Phive -Phive-thriftserver -Pyarn (check that the Hadoop version in this command matches the version you are running; the build takes a while)

Install Spark in a folder of your choice, e.g. by extracting the generated archive into /opt/spark
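
The note about matching Hadoop versions can be made concrete with a small helper. The sketch below derives both Hadoop build flags from a single version string so that they cannot drift apart. The names hadoop_profile and build_flags are hypothetical, and the sketch assumes a Maven profile exists for the major.minor version in use, which may not hold for every Hadoop release. It only prints the build command and does not run it.

```shell
# Sketch only: derive the Hadoop build flags from one version string.
# hadoop_profile and build_flags are illustrative helpers, not part of Spark.

# Strip the patch component: 2.6.2 -> hadoop-2.6
# Assumption: a Maven profile with this name exists in the Spark source tree.
hadoop_profile() {
  version="$1"
  printf 'hadoop-%s\n' "${version%.*}"
}

# Produce the profile flag and the matching -Dhadoop.version property.
build_flags() {
  version="$1"
  printf '%s\n' "-P$(hadoop_profile "$version") -Dhadoop.version=$version"
}

# Defaults to the version used in the command above; override via the environment.
HADOOP_VERSION="${HADOOP_VERSION:-2.6.2}"

# Print the build command for review instead of executing it.
echo "./make-distribution.sh --name spark-with-hive --tgz -Psparkr $(build_flags "$HADOOP_VERSION") -Phive -Phive-thriftserver -Pyarn"
```

Setting HADOOP_VERSION to the version reported by the cluster before running the snippet keeps the profile and the version property consistent.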

  3. Install Hive

  4. Generate the TPC-DS dataset (I'm using https://github.com/hortonworks/hive-testbench)

    • if it was not already generated on HDFS, put it there
  5. Load the tables into Hive to set up the metastore using the script reset-metastore.sh

    • e.g. ./reset-metastore.sh 2 /data
  6. Build the Spark application with the embedded queries

  7. Run a query by submitting the application to Spark

    • spark-submit --master spark://clusterino1:7077 --class it.polimi.spark.tpcds.Query target/uber-tcp-ds-0.0.1-SNAPSHOT.jar -i /data/2 -o /output -db tpcds_text_2 -id R1
    • the database is the one created in step 5 (tpcds_text_2 in the command above); its name is "tpcds_text_"+
    • custom queries can be executed using -q "query text" instead of the -id argument
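
The submit command above can be parameterized so that the input path and database name stay in sync. This is a minimal sketch: submit_cmd and its arguments are hypothetical, and it assumes that the database name is "tpcds_text_" followed by the same number used as the first argument of reset-metastore.sh, and that the data lives under a folder named after that number, as in the example (/data/2 and tpcds_text_2). It prints the command rather than running it.

```shell
# Sketch only: prints the spark-submit command from the last step instead of running it.
# submit_cmd and its arguments are illustrative, not part of the project.
submit_cmd() {
  scale="$1"               # number passed to reset-metastore.sh, e.g. 2
  query_id="$2"            # query identifier passed to -id, e.g. R1
  data_root="${3:-/data}"  # folder that holds the generated data
  # Assumption: input path and database name both end with the scale,
  # matching /data/2 and tpcds_text_2 in the example above.
  printf '%s\n' "spark-submit --master spark://clusterino1:7077 --class it.polimi.spark.tpcds.Query target/uber-tcp-ds-0.0.1-SNAPSHOT.jar -i ${data_root}/${scale} -o /output -db tpcds_text_${scale} -id ${query_id}"
}

# Reproduces the example command for scale 2 and query R1.
submit_cmd 2 R1
```

To run the command rather than print it, the output can be reviewed and pasted into a shell on a machine where spark-submit is on the PATH.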

To change the dataset size:

  1. Repeat Step 4
  2. Repeat Step 5 (optionally dropping the other database)
  3. Repeat Step 7 (as many times as needed with the required queries)
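
The repeat sequence can be sketched as a small planning helper that lists the commands for a new dataset size. plan_rerun is a hypothetical name, the scale of 10 is only an illustration, and R1 is the only query id taken from this README; any further ids would have to come from the application itself. The dataset generation step is left out because it depends on the generator in use. Nothing is executed; the commands are only printed.

```shell
# Sketch only: lists the commands to rerun at a new dataset size, without running them.
# plan_rerun is an illustrative helper, not part of the project.
plan_rerun() {
  scale="$1"
  shift
  # Reload the tables so the metastore points at the new data.
  echo "./reset-metastore.sh ${scale} /data"
  # One spark-submit line per requested query id.
  # Assumption: data folder and database name both end with the scale.
  for query_id in "$@"; do
    echo "spark-submit --master spark://clusterino1:7077 --class it.polimi.spark.tpcds.Query target/uber-tcp-ds-0.0.1-SNAPSHOT.jar -i /data/${scale} -o /output -db tpcds_text_${scale} -id ${query_id}"
  done
}

# Example: plan a rerun at an illustrative scale of 10 with query R1.
plan_rerun 10 R1
```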
