RockBench

Benchmark to measure ingest throughput of a realtime database.

A real-time database is one that can sustain a high write rate of new incoming data while allowing applications to make decisions based on fresh data. There can be a time lag between when data is written to the database and when it becomes visible to a query. This lag is called the data latency, or end-to-end latency, of the database. Data latency is different from query latency, which measures how long a query takes to return and is the metric typically used to describe database latency.

Data latency is one of the factors that distinguish one real-time database from another. It is an important measure for developers of low-latency applications, such as real-time personalization, IoT automation, and security analytics, where speed is critical.

RockBench measures the data latency of any real-time database. It continuously streams documents to a database in fixed-size batches, and it calculates and reports the data latency by querying the database at fixed intervals.

The goal of RockBench is to provide a representative way to generate real-life event data with the following characteristics:

  • Streaming Writes: records are written in streaming fashion
  • Nested Objects: records contain nested objects that hold arrays of nested objects
  • Mutable Records: both updates and inserts are supported

Usage

You can run RockBench directly or through a Docker container.

  • Clone the repository
git clone https://github.com/rockset/rockbench.git
  • To run directly
# Build
go build

# Send data to Rockset and report data latency
ROCKSET_API_KEY=xxxx ROCKSET_COLLECTION=yyyy WPS=1 BATCH_SIZE=50 DESTINATION=Rockset TRACK_LATENCY=true ./rockbench

# Send data to Elasticsearch and report data latency
ELASTIC_AUTH="ApiKey xxx" ELASTIC_URL=https://... ELASTIC_INDEX=index_name WPS=1 BATCH_SIZE=50 DESTINATION=Elastic TRACK_LATENCY=true ./rockbench

# Send data to Snowflake and report data latency
SNOWFLAKE_ACCOUNT=xxxx SNOWFLAKE_USER=xxxx SNOWFLAKE_PASSWORD=xxxx SNOWFLAKE_WAREHOUSE=xxxx SNOWFLAKE_DATABASE=xxxx SNOWFLAKE_STAGES3BUCKETNAME=xxxx AWS_REGION=xxxx WPS=1 BATCH_SIZE=50 TRACK_LATENCY=true DESTINATION=Snowflake ./rockbench
  • To run with a Docker container
docker build -t rockset/write_generator .
docker run -e [env variable as above] rockset/write_generator

Modes

RockBench can also measure the speed of patches.

mode            operation
add             Perform inserts only (using either id scheme)
patch           Perform patches on the id range [0, NUM_DOCS)
add_then_patch  Perform add mode, then patch mode

Setting NUM_DOCS to a non-negative value limits the number of writes made; patches are then performed against that document set. Patch mode must be enabled explicitly via MODE=patch or MODE=add_then_patch. The number of patches per second is controlled by PPS, which defaults to WPS when PPS is not specified. BATCH_SIZE is used for both patching and inserting. Each patch updates a timestamp field, used for latency detection, and one other field or array in the document.

Patches currently take one of two forms:

  • replace: replaces random fields with values of roughly equivalent type and similar size
  • add: adds new top-level fields and prepends entries to the top-level tags array

Set PATCH_MODE to either 'replace' or 'add'. The default is 'replace'.

You can also use ID_MODE to set the _id scheme for the Rockset destination to either uuid or sequential (increasing sequential numbers).

How to extend RockBench to measure your favourite realtime database

Implement the Destination interface and provide the configuration it requires. See the Rockset and Elastic implementations for reference. The interface has two methods:

  • SendDocument: sends a batch of documents to the destination
  • GetLatestTimestamp: fetches the latest timestamp from the database

Once the new destination is implemented, handle it in main.go.