TunaDB is a research in-memory database management system (DBMS) that aims to improve query compilation, job processing, and data prefetching. By integrating novel approaches and models, the project seeks to provide a high-performance, efficient database solution.
Note: This is a research project. While TunaDB demonstrates promising capabilities, it is unsuitable for production use.
TunaDB adopts our MxTask-based approach to manage query execution efficiently. By breaking large queries down into smaller tasks, it improves resource utilization and task parallelism, enabling fast data processing.
TunaDB's query compilation process is streamlined by using FlounderIR as a lightweight intermediate representation to compile queries into executable assembly code. By simplifying the query structure, FlounderIR eases the translation of high-level queries into low-level execution steps. The shipped implementation of FlounderIR adds several optimizations over the original implementation (e.g., better register assignment and branch relocation).
TunaDB provides powerful profiling features for in-depth performance investigation. It supports inlined perf-counter profiling, as well as perf sampling of memory addresses and instructions, offering useful insights into system behavior and identifying potential bottlenecks.
Furthermore, developers can benefit from the seamless integration of third-party applications like Intel® VTune™ and perf. TunaDB will make compiled query code available for such tools. This allows for a comprehensive inspection of compiled code, enabling detailed performance evaluations and optimizations to unlock the DBMS's full potential.
Please install the following dependencies:
- `cmake` >= 3.10
- `clang` >= 13 (`gcc` is not tested)
- `clang-tidy` >= 13
- `libnuma` or `libnuma-dev`
- `bison`
- `flex`
- `libgtest-dev` for tests in `test/` (optional)
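Before building, it can help to confirm that the command-line tools from the list are on the `PATH`. The following is a minimal sketch, not part of the repository: the helper name `check_tools` is hypothetical, and the check covers only executables (plus `git` for the clone step). Libraries such as `libnuma` and `libgtest-dev`, as well as minimum versions, are not verified, and versioned binaries such as `clang-15` would have to be listed under that name.

```shell
# Hypothetical helper: print every tool from the argument list that is not on PATH.
# Returns 0 if all tools were found, 1 otherwise.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Tools named in the dependency list above, plus git for the clone step.
if check_tools cmake clang clang-tidy bison flex git; then
    echo "all build tools found"
fi
```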
git clone https://github.com/jmuehlig/mxtasking-tunadb.git
cd mxtasking-tunadb
cmake . -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang-15 -DCMAKE_CXX_COMPILER=clang++-15
make tunadb -j4
The `tunadb` binary will be located in `bin/`.
Calling the binary `./bin/tunadb` will start both the server and a client in a single process.
You can now create tables, insert data, and execute queries, using the client's console.
When the `--web-client` switch is added (`./bin/tunadb --web-client`), TunaDB additionally starts a web client.
The web client lets you execute queries, show query plans (both logical and physical), inspect the generated FlounderIR and assembly code, and profile the execution.
After startup, the web console is available at `http://0.0.0.0:9100`.
TunaDB can execute one SQL file to initially load data before starting the server.
Use the `--load <file.sql>` option.
The given SQL file may
- create tables (`CREATE TABLE <table> (...)`),
- copy data from (CSV) files (`COPY <table> FROM '<file>'`),
- execute further SQL files (`.LOAD FILE '<file.sql>'`),
- and/or update statistics (`.UPDATE STATISTICS <table>`).
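As an illustration of the statement kinds listed above, the following sketch writes a small load script from the shell. The file name `sql/load_example.sql`, the table `example`, its columns and column types, the CSV path, and the terminating semicolons are all assumptions made for this example; src/db/README.md describes the data types and command syntax TunaDB actually supports.

```shell
# Write a hypothetical load script; names, column types, and the CSV path are assumptions.
mkdir -p sql
cat > sql/load_example.sql <<'EOF'
CREATE TABLE example (id INT, name CHAR(25));
COPY example FROM 'sql/data/example.csv';
.UPDATE STATISTICS example;
EOF
```

Such a script could then be passed to TunaDB with `./bin/tunadb --load sql/load_example.sql`.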
If you want to bring the data of the TPC-H benchmark into TunaDB:
- Create a folder `sql/data/tpch`
- Generate all `.tbl` files and move them into `sql/data/tpch`
- Load the SQL script `sql/load_tpch.sql`: `./bin/tunadb --load sql/load_tpch.sql`
See `./bin/tunadb --help` for further options and flags.
More information about the code structure and the implemented commands, data types, etc. is given in src/db/README.md.
- Jan Mühlig, Jens Teubner. Micro Partitioning: Friendly to the Hardware and the Developer. DaMoN 2023: 27-34.
- Henning Funke, Jan Mühlig, Jens Teubner. Low-latency query compilation. VLDB J. 31(6): 1171-1184 (2022).
- Jan Mühlig, Jens Teubner. MxTasks: How to Make Efficient Synchronization and Prefetching Easy. SIGMOD Conference 2021: 1331-1344.
- Henning Funke, Jan Mühlig, Jens Teubner. Efficient generation of machine code for query compilers. DaMoN 2020: 6:1-6:7.
The code is organized into five top-level folders:
- `src/application` contains MxTask-based applications (TunaDB is one of them). As a guideline, every application should be kept in a separate folder and produce at least one binary (stored in `bin/`).
- `src/db` contains database-related implementations, such as indices, types, and the execution engine.
- `src/mx` contains the task-based abstraction MxTasking.
- `src/flounder` contains the low-latency IR used for JIT-compiling operators.
- `src/perf` contains an implementation of in-source perf counters and sampling.
Besides TunaDB, this repository includes further task-based applications used for papers or development.
The folder `src/application/blinktree_benchmark` contains the benchmark code used in our paper "MxTasks: How to Make Efficient Synchronization and Prefetching Easy".
The folder `src/application/radix_join_benchmark` contains the benchmark code used in our paper "Micro Partitioning: Friendly to the Hardware and the Developer".
The folder `src/application/hello_world` contains a task-based example for creating and spawning a simple task.
TunaDB would not be possible without the help of various external libraries.
The used libraries will be downloaded automatically (using `git`) during the build process.
Special thanks to:
- `argparse` under MIT license
- `nlohmann json` under MIT license
- `linenoise` under BSD-2 license
- `cpp-httplib` under MIT license
- `asmjit` under Zlib license
- `{fmt}` under MIT license
- `spdlog` under MIT license
- `static_vector` under MIT license
- `robin-map` under MIT license
- `libcount` under Apache-2.0 license
- `xxhashct` published without license
- `ittapi` under GPLv2 and 3-Clause BSD licenses
If you have any questions or comments, feel free to get in touch by email: jan.muehlig@tu-dortmund.de.