Skip to content
This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

Everlag/naive-columstore

Repository files navigation

naive-columstore

This is a simple, toy column store implemented as a result of reading 'The Design and Implementation of Modern Column-Oriented Database Systems'[1]. To keep everything sane and typesafe, this is a bespoke implementation for a dataset collected from mtgprice.

I'm starting to benchmark early and releases will be tagged for when benchmarks can be stably compared.

Testing Bootstrap

As all queries written here have direct equivalents to SQL, it makes sense to bootstrap all test results against a known-good implementation.

As a result, queries implemented here are tested against the same query run in postgres.

The following are the commands to run in psql as the postgres user to import the data file. Open psql from the repo root, \copy is used relatively to import the testing data.

/*Create a database to test under*/
CREATE DATABASE pricedata_tests WITH
	OWNER     postgres
	ENCODING 'UTF8';

\connect pricedata_tests

/*Add a schema to work under*/
CREATE SCHEMA prices;

/*Add a table to work under*/
CREATE TABLE prices.mtgprice (

	name TEXT NOT NULL,
	set TEXT NOT NULL,
	time timestamp NOT NULL,

	price int NOT NULL,
	CONSTRAINT uniqueMTGpriceEntryKey UNIQUE (name, set, time)

);

/*Import data, will take a minute or two*/
\copy prices.mtgprice from 'prices.csv' DELIMITER ',' CSV HEADER;

/*Ensure data is present, should return true*/
select (count(*) = 1000000) as valid from prices.mtgprice;

Historical Benchmarks

When settling on a bitmap to use, I performed some benchmarks. Here they are formatted with github.com/cespare/prettybench

willf/bitset ~240MB
[]bool ~270MB
RoaringBitmap/roaring ~ 200MB

willf/bitset
BenchmarkUint32More-4                                100          16320800 ns/op         262208 B/op          4 allocs/op
BenchmarkUint32Less-4                                100          15615659 ns/op         131104 B/op          2 allocs/op
BenchmarkUint32Delta-4                               100          15973511 ns/op       22995204 B/op         39 allocs/op
BenchmarkUint32Sum-4                                2000            743859 ns/op              0 B/op          0 allocs/op
BenchmarkBoolAND-4                                 50000             31699 ns/op              0 B/op          0 allocs/op
BenchmarkSelectAllMoreThanDollar-4                   100          16020921 ns/op         262208 B/op          4 allocs/op
BenchmarkFiniteString32Within-4                       50          25190530 ns/op         262241 B/op          6 allocs/op
BenchmarkSelectAllMoreThanDollarMaterial-4            20          74621560 ns/op       47334222 B/op         45 allocs/op
BenchmarkSelectAllMoreDollarLessTen-4                 50          30496544 ns/op         393313 B/op          6 allocs/op
BenchmarkSelectAllMoreDollarLessTenMaterial-4         20          76079955 ns/op       37299048 B/op         45 allocs/op
BenchmarkSelectAfterTimeWiseMidPoint-4               100          17992860 ns/op         131104 B/op          2 allocs/op
BenchmarkLatestPriceMaterial-4                        50          25104048 ns/op         266057 B/op         17 allocs/op
ok      github.com/Everlag/test 100.352s

[]bool
PASS
BenchmarkUint32More-4                                100          12300487 ns/op       5863804 B/op          35 allocs/op
BenchmarkUint32Less-4                                100          10776367 ns/op       5863804 B/op          35 allocs/op
BenchmarkUint32Delta-4                               100          17705522 ns/op      22995210 B/op          39 allocs/op
BenchmarkUint32Sum-4                                2000            760287 ns/op             0 B/op           0 allocs/op
BenchmarkBoolAND-4                                   500           3306324 ns/op             0 B/op           0 allocs/op
BenchmarkSelectAllMoreThanDollar-4                   100          12233875 ns/op       5863804 B/op          35 allocs/op
BenchmarkFiniteString32Within-4                      100          23427404 ns/op      11727673 B/op          72 allocs/op
BenchmarkSelectAllMoreThanDollarMaterial-4            20          77372820 ns/op      52935826 B/op          76 allocs/op
BenchmarkSelectAllMoreDollarLessTen-4                 50          27022548 ns/op      11727608 B/op          70 allocs/op
BenchmarkSelectAllMoreDollarLessTenMaterial-4         20          73027185 ns/op      48633354 B/op         111 allocs/op
BenchmarkSelectAfterTimeWiseMidPoint-4               100          15291031 ns/op       5863803 B/op          35 allocs/op
BenchmarkLatestPriceMaterial-4                       100          24598447 ns/op      11731457 B/op          83 allocs/op
ok      github.com/Everlag/test 104.711s


RoaringBitmap/roaring
PASS(one off errors, I skipped...)
BenchmarkUint32More-4                                 50          27632560 ns/op        595473 B/op         283 allocs/op
BenchmarkUint32Less-4                                 50          27661220 ns/op        595473 B/op         283 allocs/op
BenchmarkUint32Delta-4                               100          18106963 ns/op      22995212 B/op          40 allocs/op
BenchmarkUint32Sum-4                                2000            751310 ns/op             0 B/op           0 allocs/op
BenchmarkBoolAND-4                                 20000             96504 ns/op             0 B/op           0 allocs/op
BenchmarkSelectAllMoreThanDollar-4                    50          26896896 ns/op        595473 B/op         283 allocs/op
BenchmarkFiniteString32Within-4                      100          13598676 ns/op          4288 B/op         141 allocs/op
BenchmarkSelectAllMoreThanDollarMaterial-4            20          98438055 ns/op      47667796 B/op         341 allocs/op
BenchmarkSelectAllMoreDollarLessTen-4                 20          58682050 ns/op       1199171 B/op         568 allocs/op
BenchmarkSelectAllMoreDollarLessTenMaterial-4         10         117794540 ns/op      38105236 B/op         625 allocs/op
BenchmarkSelectAfterTimeWiseMidPoint-4                50          23716366 ns/op        297760 B/op         144 allocs/op
BenchmarkLatestPriceMaterial-4                       100          14929403 ns/op         33240 B/op         230 allocs/op
ok      github.com/Everlag/test 98.223s

References

[1] Abadi D, Boncz P, Harizopoulos S, Idreos S, Madden S. The design and implementation of modern column-oriented database systems. Now; 2013 Nov 25.