A collection of database benchmarks and micro-benchmarks. This project will evolve into a test harness that defines a set of desired micro-benchmarks. Each benchmark is defined by its input and output. The test harness provides the input to each competitor, times the executions, and verifies the produced output.
make
./dbkeys <benchmark id> <params>
SELECT G, SUM(S)
FROM table
GROUP BY G;
This micro-benchmark executes the SQL query above, which groups all rows of a table based on the values of column G. For each group, we sum the values of column S for all rows that belong to said group. Example:
Relation: Students
---------------------------------------------
| Student Name | Major | # Enrolled Courses |
---------------------------------------------
| A            | CS    | 6                  |
| B            | EE    | 2                  |
| C            | EE    | 5                  |
| D            | EE    | 2                  |
| E            | CS    | 2                  |
| F            | EE    | 1                  |
| G            | CS    | 0                  |
---------------------------------------------
SELECT Major, SUM(# Enrolled Courses)
FROM Students
GROUP BY Major;
-----------------------------------
| Major | SUM(# Enrolled Courses) |
-----------------------------------
| CS    | 8                       |
| EE    | 10                      |
-----------------------------------
| Parameter      | Value           |
|----------------|-----------------|
| Input Size     | > 0             |
| #Unique Groups | [1, Input Size] |
Input Size: [10^4, 10^9] rows
Unique Groups: [10, Input Size / 2] rows
Note: Of course our initial experiments do not have to scale to inputs with 1B rows. We can start with small inputs (< 10M rows) and examine the cases of interest for each architecture. For example, for CAPE we considered a few cases: the entire input fits in the associative memory; the input fits in a CPU cache (the best-case cache for our competitor); the input is many times larger than the associative memory. We similarly varied the number of unique groups: the groups fit in the associative memory; they fit in the CPU cache; they are many times larger than both.
Both columns of the input, G and S, are 32-bit integers.
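For concreteness, a minimal input generator could look like the sketch below. The function name, seed, and value range are hypothetical; the harness is free to use a different scheme, and a uniform draw only guarantees *at most* the requested number of distinct groups.

```cpp
// Hypothetical input generator for the aggregation micro-benchmark:
// produces <input_size> (G, S) pairs of 32-bit integers, with group
// keys drawn uniformly from [0, no_unique_groups).
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct Row { int32_t g; int32_t s; };

std::vector<Row> generate_agg_input(std::size_t input_size,
                                    int32_t no_unique_groups) {
    std::mt19937 rng(42);  // fixed seed so runs are reproducible
    std::uniform_int_distribution<int32_t> group(0, no_unique_groups - 1);
    std::uniform_int_distribution<int32_t> value(0, 100);  // arbitrary range
    std::vector<Row> rows(input_size);
    for (Row& r : rows) {
        r.g = group(rng);
        r.s = value(rng);
    }
    return rows;
}
```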
Execute the aggregation microbenchmark:
./dbkeys agg <input_size> <no_unique_groups>
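As a point of reference, a scalar CPU competitor could implement the query as a single hash-aggregation pass. The sketch below is only a baseline illustration (names are hypothetical); it accumulates into 64 bits so that sums over up to 10^9 rows of 32-bit values cannot overflow:

```cpp
// Hash-based GROUP BY SUM: one pass over the input, one hash-table
// entry per unique group. Competitors may instead sort, partition,
// or use hardware-specific strategies; this is just a CPU baseline.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Row { int32_t g; int32_t s; };  // as in the generator sketch

std::unordered_map<int32_t, int64_t> aggregate(const std::vector<Row>& rows) {
    std::unordered_map<int32_t, int64_t> sums;
    for (const Row& r : rows)
        sums[r.g] += r.s;  // 64-bit accumulator: no overflow at 10^9 rows
    return sums;
}
```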
SELECT *
FROM fact, dimension
WHERE fact.FA = dimension.DA;
This micro-benchmark executes the SQL query above, which joins tables fact
and dimension based on the values of attributes fact.FA and dimension.DA.
Example:
Relation: fact
----------------
| FA | FB | FC |
----------------
| 2  | -  | -  |
| 3  | -  | -  |
| 5  | -  | -  |
| 2  | -  | -  |
----------------
Relation: dimension
-----------
| DA | DB |
-----------
| 1  | -  |
| 2  | -  |
| 3  | -  |
| 4  | -  |
| 5  | -  |
-----------
--------------------------
| FA | FB | FC | DA | DB |
--------------------------
| 2  | -  | -  | 2  | -  |
| 3  | -  | -  | 3  | -  |
| 5  | -  | -  | 5  | -  |
| 2  | -  | -  | 2  | -  |
--------------------------
Execute the join microbenchmark:
./dbkeys join <fact table size> <dimension table size>
The fact table is usually many times larger than the dimension table (e.g., a 1:12 dimension-to-fact ratio). Each row of the fact table matches exactly one row of the dimension table (a pk-fk join).
Fact Size: [10^5, 10^9] rows
Dimension Size: [10^4, 10^8] rows
Note: Of course our initial experiments do not have to scale to inputs with 1B rows. We can start with small inputs (< 10M rows) and examine the cases of interest for each accelerator.
Both the FA and DA columns are 32-bit integers.
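A standard CPU baseline for this query is a hash join that builds a table on the smaller dimension relation and probes it with the fact relation. A minimal sketch follows (hypothetical names; the payload columns FB, FC, and DB are omitted and only matching row indices are emitted):

```cpp
// Build-probe hash join on the 32-bit join keys. Because DA is a
// primary key, each fact row matches exactly one dimension row, so
// the output has exactly |fact| pairs under the pk-fk assumption.
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

std::vector<std::pair<std::size_t, std::size_t>>  // (fact row, dimension row)
hash_join(const std::vector<int32_t>& fa, const std::vector<int32_t>& da) {
    std::unordered_map<int32_t, std::size_t> build;  // DA value -> row index
    build.reserve(da.size());
    for (std::size_t i = 0; i < da.size(); ++i)
        build.emplace(da[i], i);

    std::vector<std::pair<std::size_t, std::size_t>> out;
    out.reserve(fa.size());  // pk-fk: one match per fact row
    for (std::size_t i = 0; i < fa.size(); ++i) {
        auto it = build.find(fa[i]);
        if (it != build.end())  // always true if every FA has a DA match
            out.emplace_back(i, it->second);
    }
    return out;
}
```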
- Discuss how to verify the produced results
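One possible approach (an assumption, not a settled design): both queries produce sets of rows with no defined order, so the harness could canonicalize the competitor's output and a trusted reference output by sorting before comparing. An order-independent checksum (e.g., a commutative sum or XOR of row hashes) would be a cheaper alternative for very large outputs.

```cpp
// Order-independent comparison: sort both outputs into a canonical
// order, then compare element-wise. Works for any row type that
// defines operator< and operator==.
#include <algorithm>
#include <vector>

template <typename RowT>
bool same_result(std::vector<RowT> produced, std::vector<RowT> expected) {
    // Taking the vectors by value lets us sort local copies.
    std::sort(produced.begin(), produced.end());
    std::sort(expected.begin(), expected.end());
    return produced == expected;
}
```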