Engineers at dbt wrote a community blog post on achieving near-real-time analytics with dbt. This project attempts to replicate their approach. The blog post is linked below:
https://discourse.getdbt.com/t/how-to-create-near-real-time-models-with-just-dbt-sql/1457
This project also compares that approach against Snowflake Streams and Dynamic Tables.
Reference articles:
Streams and Tasks: https://quickstarts.snowflake.com/guide/getting_started_with_streams_and_tasks/index.html
Dynamic Tables: https://quickstarts.snowflake.com/guide/getting_started_with_dynamic_tables/index.html
- Lambda View (implemented on SQL Server, but the approach should be database-agnostic)
- Snowflake Streams
- Dynamic Table in Snowflake (with dbt https://docs.getdbt.com/reference/resource-configs/snowflake-configs)
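As a rough illustration of the lambda view idea from the linked post, the sketch below unions the rows already materialized by the last batch run with the source rows that arrived after it. It uses Python's built-in sqlite3 as a stand-in database. The table names, columns, and sample rows are hypothetical and not taken from this repo, and taking the cutoff from the newest materialized timestamp is only one possible choice; the post may pick it differently.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a tiny in-memory database with a lambda view (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Raw source table: receives new rows continuously.
        CREATE TABLE raw_transactions (id INTEGER, amount REAL, loaded_at TEXT);
        -- Historical table: what the last batch run materialized.
        CREATE TABLE transactions_historical (id INTEGER, amount REAL, loaded_at TEXT);

        INSERT INTO raw_transactions VALUES
            (1, 10.0, '2024-01-01 09:00:00'),
            (2, 20.0, '2024-01-01 10:00:00'),
            (3, 30.0, '2024-01-01 11:00:00');

        -- Pretend the batch ran after row 2 arrived: rows 1 and 2 are materialized.
        INSERT INTO transactions_historical
            SELECT * FROM raw_transactions WHERE id <= 2;

        -- Lambda view: historical rows plus anything newer than the last batch.
        CREATE VIEW transactions_lambda AS
            SELECT id, amount, loaded_at FROM transactions_historical
            UNION ALL
            SELECT id, amount, loaded_at FROM raw_transactions
            WHERE loaded_at > (
                SELECT COALESCE(MAX(loaded_at), '') FROM transactions_historical
            );
    """)
    return conn


def lambda_ids(conn: sqlite3.Connection) -> list:
    """Return the ids currently visible through the lambda view."""
    rows = conn.execute("SELECT id FROM transactions_lambda ORDER BY id")
    return [row[0] for row in rows]
```

Rows inserted into the raw table after the batch show up in the view immediately, without re-running the batch; the next batch run then moves them into the historical table.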
- Create directory
./realtime-dbt
- Create virtual environment for Python
python -m venv env
- Install dbt and the dbt adapter for SQL Server
python -m pip install dbt-sqlserver
- Install the pyodbc and Faker libraries
pip install faker pyodbc
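The install commands above only land in the project's environment if it is activated first (env\Scripts\activate.bat, as used in the run steps later on). A small check like the following, offered as an optional sketch rather than part of the repo, reports whether the current interpreter is running inside a virtual environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```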
git init
git add ReadMe.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/aj-22/realtime-dbt.git
git push -u origin main
- Download SSMS and SQL Server Developer Edition 2022
- Install in Mixed Mode, or change the security settings later to enable "SQL Server and Windows Authentication mode"
- Run the following query to get the port number and IP address
The default port is 1433. Because I have multiple instances running, my port number is 1435.
USE master
GO
xp_readerrorlog 0, 1, N'Server is listening on'
GO
- Run this query to create the database
CREATE DATABASE analytics
- Initialize:
dbt init
- Get the location of profiles.yml:
dbt debug --config-dir
- Enter connection parameters
analytics:
  target: dev
  outputs:
    dev:
      type: sqlserver
      driver: ODBC Driver 18 for SQL Server
      server: localhost
      port: 1435
      database: analytics
      schema: dev
      user: sa
      password: ********
      encrypt: false
      trust_cert: false
- Test if connection works:
dbt debug
- Update the database_connect/connection.py file
- Execute dbt seed to load the source data and dbt run to create the incremental models and a fresh view
- Run loader.py to set up event-based data loading
- Run generator.py to create the transaction files
- Verify that data is being loaded successfully into the database
- Execute dbt run to start a batch job
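For the connection.py step above, the Python scripts need the same connection details that went into profiles.yml. The contents of connection.py are not shown here, so the helper below is only a sketch: the function name and its parameters are hypothetical, and it simply assembles an ODBC connection string that could be passed to pyodbc.connect.

```python
def build_connection_string(server, port, database, user, password,
                            driver="ODBC Driver 18 for SQL Server",
                            encrypt=False, trust_cert=False):
    """Assemble an ODBC connection string for SQL Server (hypothetical helper)."""

    def yes_no(flag):
        return "yes" if flag else "no"

    parts = [
        f"DRIVER={{{driver}}}",       # driver name is wrapped in braces
        f"SERVER={server},{port}",    # SQL Server uses host,port
        f"DATABASE={database}",
        f"UID={user}",
        f"PWD={password}",
        f"Encrypt={yes_no(encrypt)}",
        f"TrustServerCertificate={yes_no(trust_cert)}",
    ]
    return ";".join(parts)


# Example usage (requires pyodbc and a running SQL Server instance):
#   import pyodbc
#   conn = pyodbc.connect(
#       build_connection_string("localhost", 1435, "analytics", "sa", "<password>")
#   )
```

Keeping the values identical to the dbt profile (port 1435, database analytics, encryption off) means the loader writes to the same database that dbt reads from.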
- Create Snowflake free trial account
- Install DBeaver
- Install Git plugin for DBeaver
- Clone this repo locally (if not already done)
- Open the project from DBeaver's Git plugin
- Set up Snowflake Connection in DBeaver
Open three CMD terminals.
In Terminal 1, run the following commands:
env\Scripts\activate.bat
cd realtime-dbt\data
python loader.py
In Terminal 2, run the following commands:
env\Scripts\activate.bat
cd realtime-dbt\data
python generator.py
In Terminal 3, run the following commands:
env\Scripts\activate.bat
cd realtime-dbt\analytics
dbt run
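The loader and generator scripts started in Terminals 1 and 2 are not reproduced in this document. To show how the two fit together, here is a minimal stand-in using only the standard library: one function writes CSV files of random transactions, the other loads any file it has not seen yet. Everything in it is an assumption made for illustration (file layout, column names, sqlite3 in place of SQL Server, and polling a folder rather than reacting to file-system events); the real scripts use Faker and pyodbc and may work differently.

```python
import csv
import random
import sqlite3
import tempfile
import uuid
from pathlib import Path


def generate_file(folder: Path, n_rows: int = 5) -> Path:
    """Write one CSV file of random transactions (stand-in for generator.py)."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"transactions_{uuid.uuid4().hex}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "amount"])
        for _ in range(n_rows):
            writer.writerow([random.randint(1, 100),
                             round(random.uniform(1, 500), 2)])
    return path


def load_new_files(folder: Path, conn: sqlite3.Connection, seen: set) -> int:
    """Load every CSV not yet seen into the table (stand-in for loader.py)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions (customer_id INTEGER, amount REAL)"
    )
    loaded = 0
    for path in sorted(folder.glob("*.csv")):
        if path.name in seen:
            continue  # already loaded on an earlier pass
        with path.open(newline="") as f:
            rows = [(int(r["customer_id"]), float(r["amount"]))
                    for r in csv.DictReader(f)]
        conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
        seen.add(path.name)
        loaded += len(rows)
    conn.commit()
    return loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        incoming = Path(tmp) / "incoming"
        conn = sqlite3.connect(":memory:")
        seen = set()
        for _ in range(3):
            generate_file(incoming)
            print("rows loaded:", load_new_files(incoming, conn, seen))
        conn.close()
```

In the actual setup the two halves run as separate processes, which is why they get separate terminals, while dbt run in Terminal 3 periodically moves the newly loaded rows into the incremental models.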