The original YCSB benchmark measures database performance using generated input. Geospatial workloads, however, need a fixed input so that geospatial operations are always measured on the same data. The benchmark is therefore adapted to read its data from an input file and to include spatial operations in the workload measurements.
- YCSB
- Couchbase fork of YCSB
- Blogpost about creating input from JSON file in YCSB
- GeoYCSB paper with spatial benchmark for MongoDB and Couchbase
- MongoDB fork of YCSB with updates for Mongo 6
- This repository: a modernized and partially rewritten GeoYCSB, with ArangoDB and PolyphenyDB support.
- To get here, use https://ycsb.site
- Our project docs
- The original announcement from Yahoo!
- Spatial Benchmark
- Download the latest release of YCSB:

      curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
      tar xfvz ycsb-0.17.0.tar.gz
      cd ycsb-0.17.0

- Set up a database to benchmark. There is a README file under each binding directory.
- Run the YCSB command.

  On Linux:

      bin/ycsb.sh load basic -P workloads/workloada
      bin/ycsb.sh run basic -P workloads/workloada

  On Windows:

      bin/ycsb.bat load basic -P workloads\workloada
      bin/ycsb.bat run basic -P workloads\workloada

  Running the ycsb command without any arguments prints the usage.

See https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload for detailed documentation on how to run a workload.
See https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties for the list of available workload properties.
GeoYCSB contains geospatial workloads that benchmark database performance on real input data.
For this purpose, GeoWorkload.java was created so that the benchmark can use real GeoJSON data and generate queries with geospatial operations.
In addition, GeoYCSB introduces new workload settings:
- geo_insert
- geo_update
- geo_scan
- geo_near
- geo_box
- geo_intersect
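
To give a feel for the kind of predicate a box query evaluates, here is a minimal sketch of a bounding-box containment test. It assumes that geo_box corresponds to a "point within an axis-aligned box" query, which the setting name suggests but this README does not confirm. The class and method names are hypothetical and are not part of GeoYCSB, and the sketch ignores boxes that cross the antimeridian.

```java
/** Illustrative only: not GeoYCSB code. */
public class BoxQuerySketch {

    /**
     * Returns true if the point lies inside (or on the edge of) the box
     * spanned by the lower-left and upper-right corners, all in degrees.
     * Assumption: the box does not cross the antimeridian.
     */
    static boolean inBox(double lon, double lat,
                         double minLon, double minLat,
                         double maxLon, double maxLat) {
        return lon >= minLon && lon <= maxLon
            && lat >= minLat && lat <= maxLat;
    }

    public static void main(String[] args) {
        // GeoJSON positions are ordered [longitude, latitude].
        double[] point = {8.54, 47.37};
        boolean hit = inBox(point[0], point[1], 8.0, 47.0, 9.0, 48.0);
        System.out.println("point in box: " + hit);
    }
}
```

In a real run the database evaluates this predicate itself, typically with the help of a geospatial index, which is the part the benchmark actually measures.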
GeoWorkload requires a Memcached instance. Its storage-based generator samples values by fetching pre-generated values/documents from this in-memory store instead of generating new random values on the fly. The instance is configured through two workload settings:
- geo_storage_host
- geo_storage_port
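
As a minimal sketch of how a client might read these two settings, the class below parses them from a java.util.Properties object. It assumes they are plain key/value workload properties. The class name and the fallback values are hypothetical: localhost and 11211 (Memcached's conventional port) are chosen for illustration and are not defaults confirmed by GeoYCSB.

```java
import java.util.Properties;

/** Illustrative only: not GeoYCSB code. */
public class GeoStorageSettings {
    // Hypothetical fallbacks; GeoYCSB's real defaults may differ.
    static final String DEFAULT_HOST = "localhost";
    static final int DEFAULT_PORT = 11211;

    final String host;
    final int port;

    GeoStorageSettings(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Reads geo_storage_host and geo_storage_port, falling back to the defaults above. */
    static GeoStorageSettings fromProperties(Properties props) {
        String host = props.getProperty("geo_storage_host", DEFAULT_HOST).trim();
        String portText = props.getProperty("geo_storage_port", String.valueOf(DEFAULT_PORT));
        // Throws NumberFormatException if the port is not a valid integer.
        int port = Integer.parseInt(portText.trim());
        return new GeoStorageSettings(host, port);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("geo_storage_host", "127.0.0.1");
        props.setProperty("geo_storage_port", "11212");
        GeoStorageSettings settings = fromProperties(props);
        System.out.println(settings.host + ":" + settings.port);
    }
}
```

Like any other YCSB property, the two values can be placed in the workload file or passed on the command line with -p.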
Each database client has to be extended with geo operations. Currently, the MongoDB, Couchbase, ArangoDB, and PolyphenyDB clients are extended. Other databases can be adapted in the same way.
For example, to run a geo benchmark on MongoDB:
- Create a MongoDB database and populate a collection with GeoJSON values. YCSB defines the collection name usertable, and GeoYCSB keeps this convention. Create a geospatial index on the GeoJSON field.
- Load the benchmark:

      bin/ycsb.sh load mongodb -P workloads/geo/workloadga -p mongodb.url="mongodb://localhost:27017/ycsb?w=1" -p mongodb.auth="true"

- Run the benchmark:

      bin/ycsb.sh run mongodb -P workloads/geo/workloadga -p mongodb.url="mongodb://localhost:27017/ycsb?w=1" -p mongodb.auth="true"

YCSB requires the use of Maven 3; if you use Maven 2, you may see errors such as these.
To build the full distribution, with all database bindings:
mvn clean package
To build a single database binding:
mvn -pl site.ycsb:mongodb-binding -am clean package