Skip to content

mikeymo394/Banxware-Home-Assignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

  1. SETTING UP ENVIRONMENT

This procedure outlines the steps to complete the given home assignment. As the first step of setting up the environment was already given. A ‘www.snowflake.com’ account with username was created. A database named ‘home_assignment’ was also created on the Snowflake account.

Open a command prompt. Go to Search in Start menu and find Command Prompt.

Once the Command Prompt is opened, type ‘pip install dbt-snowflake’ to install dbt (Data Built Tool) and Press Enter.

Verify the installing of dbt by running or typing ‘dbt --version’ into the Command Prompt.

The configuration given was used to create the ‘profiles.yml’ using the text document or Notepad. The content of the ‘profile.yml’ file is as follows to fit the Snowflake system:

dbt_banxware_assignment:

outputs:

dev:

type: snowflake

threads: 4

account: <your_snowflake_account_details>

database: <snowflake_database_name> -in this project was 'home_assignment'.

user:

password:

schema: <your_snowflake_schema_name>

warehouse: COMPUTE_WH –Obtained from Snowflake system

role: ACCOUNTADMIN –Obtained from Snowflake system

target: dev

Using the ‘cd’ command, navigate to the 'dbt_banxware_assignment' folder and test the connection using the command ‘dbt debug’. Output is shown below with a successful connection and ‘All Check Passed!’ prompt.

2. DATA INGESTION

Load the two CSV Files (`sales.csv` and `customers.csv`) given in the data folder into Snowflake. The `sales.csv` and `customers.csv` files was placed inside the `seeds/` directory. A ‘dbt_project.yml’ file was created to define the seed files in the ‘dbt_banxware_assgnment’ folder. Details is given as follow:

name: dbt_banxware_assignment

version: '1.0.0'

config-version: 2

profile: dbt_banxware_assignment

model-paths: ["models"]

analysis-paths: ["analyses"]

test-paths: ["tests"]

seed-paths: ["seeds"]

macro-paths: ["macros"]

snapshot-paths: ["snapshots"]

target-path: "target"

clean-targets:

- "target"

- "dbt_packages"

Use the command `dbt seed` to Ingest CSV Files into the Snowflake account.

3. DATA TRANSFORMATION

The `transformed_sales_data.sql` file was created in the model folder with a text document. Details of the model used to create the file is given as:

with sales as (

select

order_id,

order_date,

customer_id,

product_id,

product_name,

quantity,

price,

order_status,

-- Cast order_date to DATE

cast(order_date as date) as order_date_casted,

-- Extract year, month, day from the casted date

extract(year from cast(order_date as date)) as order_year,

extract(month from cast(order_date as date)) as order_month,

extract(day from cast(order_date as date)) as order_day,

-- Calculate total sales amount

quantity * price as total_sales_amount

from {{ ref('sales') }}

)

select * from sales

Run the model with the Command Prompt ‘dbt run’.

4. DATA ANALYSIS

In answering the questions in the given task, the following sql scripts was saved in a queries folder in the dbt_banxware_assignment folder.

Question 1: Top 5 Products by Total Sales Amount in 2023

select

product_id,

sum(total_sales_amount) as total_sales

from RAW.transformed_sales_data

where order_year = 2023

group by product_id

order by total_sales desc

limit 5;

Saved as `Q1_top_5_products_total_sales_2023`.

Question 2: Top 5 Customers by Total Sales Amount in 2023

select

customer_id,

sum(total_sales_amount) as total_sales

from RAW.transformed_sales_data

where order_year = 2023

group by customer_id

order by total_sales desc

limit 5;

Saved as `Q2_names_top_5_customers_total_sales_2023`.

Question 3: Average Order Value for Each Month in 2023

select

order_month,

avg(total_sales_amount) as avg_order_value

from RAW.transformed_sales_data

where order_year = 2023

group by order_month

order by order_month;

Saved as `Q3_average_order_value_month_2023`.

Question 4: Customer with Highest Order Volume in October 2023

select

customer_id,

count(order_id) as order_count

from RAW.transformed_sales_data

where order_year = 2023

and order_month = 10

group by customer_id

order by order_count desc

limit 1;

Saved as ` Q4_customer_highest_order_volume_october_2023`.