TPC-DI: MS SQL Server Benchmark

The project was complete in 2021 as part of the Big Data Management and Analytics (BDMA) program for the Database Warehouse course at Universite Libre de Bruxelles (ULB) in Bruselles.

Are you a current BDMA student? Don't be shy! Reach out for insights and tips!

Project Team

The work in this repo was completed by:

Diogo Repas
Nicole Kovacs
Andres Espinal
Adam Broniewski

The files you need...

The report in the /Deliverables path provides an overview of how the benchmark was performed.
The /Helpers path has the scripts used to run the benchmark.

Note: The DimSecurity table has unresolved issues in this script the result in no data being returned when it is used. As such, use of the FactMarketHistory, FactWatches and a part of DimTrade was left out of the benchmark

Instructions for Replication

Generate files following TCP-DI instructions
Use python script to unpack FINWIRE files. This was done using Helpers/Scripts/ConvertFinwireFilesToCSV.py
Load files into MS SQL database using SSIS
Move raw files into schema named “source” in SQL table format. The schema was created using Helpers/Scripts/CreateDBTableSchema.sql
Transform and load all tables from “source” to “dbo” using main SQL script located here Helpers/Scripts/historical_load.sql

Even if you are not using SQL for the transformation, reading through the SQL script will provide you an overview of the transformations needed in whatever integration service you are using.

Have questions? Drop me a line at abroniewski@gmail.com

Benchmarking Methodology

The following tools were installed to complete the benchmark:

SQL Server 2019 Express
SQL Server Data Tools 2017 (Standalone along with Visual Studio)
Materials and programs provided by TPC-DI

The benchmark queries and logging were implemented using Microsoft SQL Server Integration Services (SSIS). The timing results were plotted in a live Tableau dashboard that collects the logging results automatically from the database. Data was generated using the TPC-DI data generator at 4 scale factors (SF):

SF 3
SF 10
SF 20
SF 30

There were two research papers used as a general reference for the TPC-DI ETL process that provided support in identifying data quality issues. These papers were:

Data Quality Problems in TPC-DI Based Data Integration Processes
TPC-DI: The First Industry Benchmark for Data Integration

A git repository was also used as a reference for the data warehouse table creation. The repository used was:

https://github.com/detobel36/tpc-di (reviewed and checked against current version of TPC-DI spec)

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Deliverables		Deliverables
Helpers		Helpers
Research Papers		Research Papers
SSIS/TPCDI ETL		SSIS/TPCDI ETL
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TPC-DI: MS SQL Server Benchmark

Project Team

The files you need...

Instructions for Replication

Benchmarking Methodology

About

Releases

Packages

Languages

abroniewski/TPC-DI-MS-SQL-Benchmark

Folders and files

Latest commit

History

Repository files navigation

TPC-DI: MS SQL Server Benchmark

Project Team

The files you need...

Instructions for Replication

Benchmarking Methodology

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages