-
Notifications
You must be signed in to change notification settings - Fork 0
/
section 5: Design-tables-fact-dimensions
160 lines (142 loc) · 10.9 KB
/
section 5: Design-tables-fact-dimensions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
Design Dimension Tables for Star schemas and Snowflake Schemas:
Foundational Concepts:
Dimensions: Provide context for measurements (facts) in data warehousing terminology.
Dimension Table vs. Dimension: Not interchangeable terms; a dimension refers to the concept, while a dimension table is a specific implementation in a relational database.
Primary Keys: Each row in a relational database needs a unique primary key. In data warehousing, surrogate keys are preferred over natural keys for primary keys.
Foreign Keys: Definition: Links to primary keys of other tables to establish relationships.
Star Schema: Structure: Single table with all necessary fields.
Snowflake Schema: Structure: Normalized into separate tables for each level of hierarchy.
Rules for Snowflake Schema:
Non-terminal Dimensions: Have a primary key (surrogate key) and foreign key (surrogate key from higher level).
Terminal Dimensions: Highest level in hierarchy, has only a primary key (surrogate key).
Understanding these schema types and key principles helps in designing effective dimensional tables for data warehousing.
Each schema type—star and snowflake—offers different advantages depending on the data structure and hierarchical relationships needed.
Foreign keys play a crucial role in maintaining these relationships across tables.
This foundational understanding sets the stage for designing robust data warehouse architectures tailored to specific business needs.
Four Main Type of Data Warehousing Fact Tables:
It's important to understand that a fact (the actual measurement) is not the same as a fact table (the structure where these measurements are stored).
Here are the four types of fact tables we will discuss:
Transaction Fact Table
Periodic Snapshot Fact Table
Accumulating Snapshot Fact Table
Factless Fact Table
Transaction Fact Table:
Used for recording facts or measurements from individual transactions that occur in source systems.
Managed at the appropriate level of detail within the data warehouse.
Periodic Snapshot Fact Table:
Tracks measurements at regular intervals, providing periodic readings of specific metrics.
Not focused on individual transactions but on periodic snapshots.
Accumulating Snapshot Fact Table:
Tracks the progress of a well-defined business process through its various stages.
Provides snapshots at different points in time to show the progression.
Factless Fact Table:
Recording the occurrence of an event or transaction without needing to record any actual measurements.
Documenting coverage or eligibility relationships, even if no event occurred.
Role of transaction Fact Tables:
It's known as the transaction grained fact table, referring to the level of detail present in any given transaction fact table.
Overview of Transaction Fact Tables
Core of Dimensional Models:
Transaction fact tables are central to our dimensional models.
Most of the fact tables you build will be of this type.
Function:
They store the facts or measurements from transactions occurring in the business world, governmental agencies, schools, or any organization for which the data warehouse is built.
Structure:
You can store more than one fact in a single fact table, but there are rules governing which facts should be stored together and which should be kept separate.
Context for Facts:
Facts alone are insufficient for meaningful data analysis.
Context is necessary to make informed decisions based on data.
Structuring a Fact Table
You might think the fact table should look like a report, but that's not the case. Instead, we use surrogate keys to provide context.
Surrogate Keys: Use surrogate keys, not natural keys, as primary and foreign keys.
Linking Fact and Dimension Tables
Foreign Key Relationships: The surrogate keys in the fact table act as foreign keys, linking to primary keys in the dimension tables.
Transaction fact tables are essential for capturing and storing detailed transaction data.
They use surrogate keys to link to dimension tables, providing the necessary context for the facts they contain.
This structure allows for efficient data management and meaningful analysis in a data warehousing environment.
Summary
Transaction Fact Tables:
Store facts related to transactions occurring in the business.
Must follow rules to determine if multiple facts can be stored together.
Key Rules:
Facts must be at the same grain (level of detail).
Facts must occur simultaneously.
Primary and Foreign keys for Fact tables:
Fact Tables: Primary keys are composed of foreign keys linking to dimension tables.
Dimension Tables: Use surrogate keys as primary keys, and natural keys from source systems.
Primary Key: Unique identifier, often a combination of surrogate keys in fact tables.
Foreign Key: Links fact tables to dimension tables.
Natural Key: Unique identifier from source systems, not used as the primary key in fact tables.
Role of periodic snapshot fact tables:
This type of fact table captures regular, periodic measurements of data, providing an aggregated, easy-to-access view of data over time.
There are two main types of periodic snapshot fact tables:
Periodic Snapshot Fact Table: Linked to transactional data but aggregated for periodic analysis.
Periodic Snapshot Fact Table (Non-Transactional): Measures levels that aren't directly linked to transactional data.
Key Characteristics
Periodic Measurement: Captures data at regular intervals (e.g., weekly, monthly).
Simplified Analysis: Facilitates easier and more efficient analysis of periodic trends.
Two Types:
Transactional Periodic Snapshot: Derived from transactional data.
Benefits:
Aggregated Data: Provides a summary view, facilitating easier trend analysis.
Simplified Queries: Reduces complexity in SQL queries for periodic analysis.
Clear Business Insights: Helps track and analyze regular balances or levels efficiently.
Non-Transactional Periodic Snapshot: Measures levels without underlying transactions.
Summary
Periodic Snapshot Fact Tables: Used for regular, periodic measurements.
Two Types: Transactional and Non-Transactional.
Simplified Analysis: Facilitates easier trend analysis and reduces query complexity.
Structure: Both types have similar structures, differing only in the source of their data.
periodic snapshots and semi-additive facts:
Semi-Additive Facts: Essential for certain types of analysis in periodic snapshot fact tables.
Operations: Cannot be summed over time but can be averaged or summed within a specific time period.
Use Case: Periodic snapshot fact tables are invaluable for measuring and analyzing regular levels over time,
such as end-of-week balances or periodic measurements in non-transactional scenarios.
Role of Accumulating Snapshot Fact Tables:
The accumulating snapshot fact table. This type is used to track the progress of a business process with formally defined stages.
Purpose of Accumulating Snapshot Fact Tables
These fact tables are designed to:
Measure elapsed time spent in each stage of a business process.
Track both completed and in-progress phases.
Provide a snapshot of where one or more business processes are within their life cycles.
Include multiple relationships from the fact table back to a single dimension table, unlike the one-to-one relationships in other fact tables.
Accumulating snapshot fact tables are powerful tools for tracking the progress and elapsed time of business processes with multiple stages.
They help in understanding the life cycle of processes and in making data-driven decisions based on the stages of these processes.
Accumulating Snapshot Fact Table Ex:
Accumulating snapshot fact tables are powerful tools for tracking the progress of complex business processes, allowing for detailed analysis and easy visualization of the data flow.
By examining a single row of data, you can analyze the entire process for one or more students, making it easier to identify patterns, averages, and total amounts at different stages.
This type of fact table simplifies the analysis of business processes with multiple stages and participants.
Factless Fact Tables:
Fact tables in a data warehouse typically house the measurements we want to track, but a fact is not synonymous with a fact table.
The factless fact table, however, serves two unique purposes:
Recording the occurrence of a transaction without any measurable metrics.
Documenting a formal record of coverage or eligibility relationships.
Factless fact tables are versatile tools in a data warehouse for tracking the occurrence of events and relationships without measurable metrics.
They simplify the analysis of data involving transactions or associations where the presence of records alone signifies meaningful events or relationships.
Fact Tables Comparison of Star schema vs snowflake Schema:
Hierarchical Data in Fact Tables: In a data warehouse, we often deal with hierarchical data.
Star Schema for Fact Tables: In a star schema, the hierarchical information is consolidated into a single dimension table.
Snowflake Schema for Fact Tables: In a snowflake schema, the hierarchical data is normalized into multiple dimension tables.
Efficient Design for Snowflake Schema Fact Tables:
Instead of including multiple foreign keys, we follow a similar approach to the star schema:
Include only the lowest level foreign key (Faculty_Key) in the fact table.
Use the relationships defined in the dimension tables to navigate the hierarchy.
Summary
Star Schema:
All hierarchical data in a single dimension table.
Fact table contains a single foreign key to the dimension table.
Snowflake Schema:
Hierarchical data normalized into multiple dimension tables.
Fact table contains only the lowest level foreign key.
Use joins to navigate the hierarchy in queries.
This approach ensures a streamlined and effective design for fact tables in both star and snowflake schemas.
SQL for Dimension and fact tables:
Star Schema Dimension Table: In a star schema, dimension tables typically contain denormalized data.
Snowflake Schema Non-Terminal Dimension Table: Non-terminal dimension tables in a snowflake schema have foreign keys pointing to higher-level dimension tables.
Snowflake Schema Terminal Dimension Table: Terminal dimension tables in a snowflake schema have no further levels in the hierarchy and thus no foreign keys.
Transaction-Grained Fact Table: Transaction-grained fact tables record individual events or transactions.
Periodic Snapshot Fact Table: Periodic snapshot fact tables capture data at regular intervals.
In summary, the SQL statements for creating tables in a data warehouse follow a consistent pattern:
Define columns with data types and constraints.
Specify primary keys for all tables.
Use foreign keys to establish relationships between fact tables and dimension tables.
Ensure that foreign keys reference the appropriate primary keys in their respective dimension tables.