sparkioref

Benchmark the I/O performance of Apache Spark (Scala/Python). Currently supported formats: CSV/JSON, Parquet, and FITS.
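The benchmark times repeated reads of the same dataset in each format. A small Python harness can sketch that measurement. This is a hypothetical illustration, not the repository's implementation: the names time_read, benchmark_formats and ReadTiming are invented here, and a stand-in workload replaces a real Spark read so the sketch runs without a cluster.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ReadTiming:
    """Wall-clock statistics (seconds) for repeated reads of one dataset."""
    iterations: int
    mean_s: float
    stdev_s: float
    min_s: float
    max_s: float


def time_read(read_action: Callable[[], object], iterations: int = 100) -> ReadTiming:
    """Call `read_action` `iterations` times and time each call.

    Nothing is cached here: each call is expected to re-read the data from
    storage (e.g. a Spark read followed by an action such as count()).
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    durations: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        read_action()
        durations.append(time.perf_counter() - start)
    return ReadTiming(
        iterations=iterations,
        mean_s=statistics.mean(durations),
        # statistics.stdev needs at least two samples
        stdev_s=statistics.stdev(durations) if iterations > 1 else 0.0,
        min_s=min(durations),
        max_s=max(durations),
    )


def benchmark_formats(
    readers: Dict[str, Callable[[], object]], iterations: int = 100
) -> Dict[str, ReadTiming]:
    """Time each format's reader independently, keyed by format name."""
    return {fmt: time_read(reader, iterations) for fmt, reader in readers.items()}


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a Spark cluster.
    results = benchmark_formats({"dummy": lambda: sum(range(10_000))}, iterations=5)
    for fmt, t in results.items():
        print(f"{fmt}: mean={t.mean_s:.6f}s over {t.iterations} runs")
```

To time real reads, pass one callable per format that ends in a Spark action, for example lambda: spark.read.parquet(path).count() or lambda: spark.read.json(path).count(). Here spark is an active SparkSession and path is a placeholder for your dataset. Because Spark is lazy, the callable has to trigger an action. For a columnar format like Parquet, a bare count() may skip decoding column values, so it can understate the full read cost.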

Run the benchmark

Edit run_benchmark.sh to set your data paths and cluster configuration, then launch it with:

./run_benchmark.sh

Example

Configuration:

  • Spark 2.3.1
  • HDFS 2.8.4
  • Input dataset: 1,100,000,000 objects (x, y, z)
  • 153 cores (9 executors), 300 GB RAM total
  • No cache: 100 iterations (the data is read and distributed again at each iteration)
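A rough sizing check puts the example configuration in scale. It assumes that each of x, y and z is stored as a 64-bit float and ignores compression and file-format overhead. Neither assumption is stated in the configuration above. Under those assumptions, the raw payload and the per-core share are:

$$
1.1\times10^{9}\ \text{objects}\times 3\ \text{columns}\times 8\ \text{B} = 2.64\times10^{10}\ \text{B}\approx 26.4\ \text{GB}
$$

$$
\frac{1.1\times10^{9}\ \text{objects}}{153\ \text{cores}} \approx 7.19\times10^{6}\ \text{objects per core}
$$

The raw values alone would fit within the 300 GB of aggregate RAM. Because caching is disabled, each iteration still pays the full read cost rather than serving the data from memory.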