section 5: Design-tables-fact-dimensions


Design Dimension Tables for Star schemas and Snowflake Schemas:
	Foundational Concepts:
		Dimensions: Provide context for measurements (facts) in data warehousing terminology.
		Dimension Table vs. Dimension: Not interchangeable terms; a dimension refers to the concept, while a dimension table is a specific implementation in a relational database.
		Primary Keys: Each row in a relational database needs a unique primary key. In data warehousing, surrogate keys are preferred over natural keys for primary keys.
		Foreign Keys: Definition: Links to primary keys of other tables to establish relationships.
	Star Schema: Structure: Single table with all necessary fields.
	Snowflake Schema: Structure: Normalized into separate tables for each level of hierarchy.
	Rules for Snowflake Schema:
		Non-terminal Dimensions: Have a primary key (surrogate key) and foreign key (surrogate key from higher level).
		Terminal Dimensions: Highest level in hierarchy, has only a primary key (surrogate key).
	Understanding these schema types and key principles helps in designing effective dimensional tables for data warehousing. 
	Each schema type—star and snowflake—offers different advantages depending on the data structure and hierarchical relationships needed. 
	Foreign keys play a crucial role in maintaining these relationships across tables. 
	This foundational understanding sets the stage for designing robust data warehouse architectures tailored to specific business needs.
	
Four Main Type of Data Warehousing Fact Tables:
	It's important to understand that a fact (the actual measurement) is not the same as a fact table (the structure where these measurements are stored).
	Here are the four types of fact tables we will discuss:
		Transaction Fact Table
		Periodic Snapshot Fact Table
		Accumulating Snapshot Fact Table
		Factless Fact Table
		
	Transaction Fact Table:
		Used for recording facts or measurements from individual transactions that occur in source systems.
		Managed at the appropriate level of detail within the data warehouse.
		
	Periodic Snapshot Fact Table:
		Tracks measurements at regular intervals, providing periodic readings of specific metrics.
		Not focused on individual transactions but on periodic snapshots.
	
	Accumulating Snapshot Fact Table:
		Tracks the progress of a well-defined business process through its various stages.
		Provides snapshots at different points in time to show the progression.
		
	Factless Fact Table:
		Recording the occurrence of an event or transaction without needing to record any actual measurements.
		Documenting coverage or eligibility relationships, even if no event occurred.

Role of transaction Fact Tables:
	It's known as the transaction grained fact table, referring to the level of detail present in any given transaction fact table.
	Overview of Transaction Fact Tables
		Core of Dimensional Models:
			Transaction fact tables are central to our dimensional models.
			Most of the fact tables you build will be of this type.
		Function:
			They store the facts or measurements from transactions occurring in the business world, governmental agencies, schools, or any organization for which the data warehouse is built.
		Structure:
			You can store more than one fact in a single fact table, but there are rules governing which facts should be stored together and which should be kept separate.
		Context for Facts:
			Facts alone are insufficient for meaningful data analysis.
			Context is necessary to make informed decisions based on data.
		Structuring a Fact Table
			You might think the fact table should look like a report, but that's not the case. Instead, we use surrogate keys to provide context.
			Surrogate Keys: Use surrogate keys, not natural keys, as primary and foreign keys.
		Linking Fact and Dimension Tables
			Foreign Key Relationships: The surrogate keys in the fact table act as foreign keys, linking to primary keys in the dimension tables.

	Transaction fact tables are essential for capturing and storing detailed transaction data. 
	They use surrogate keys to link to dimension tables, providing the necessary context for the facts they contain. 
	This structure allows for efficient data management and meaningful analysis in a data warehousing environment.

	Summary
		Transaction Fact Tables:
			Store facts related to transactions occurring in the business.
			Must follow rules to determine if multiple facts can be stored together.
		Key Rules:
			Facts must be at the same grain (level of detail).
			Facts must occur simultaneously.

Primary and Foreign keys for Fact tables:
	Fact Tables: Primary keys are composed of foreign keys linking to dimension tables.
	Dimension Tables: Use surrogate keys as primary keys, and natural keys from source systems.
	Primary Key: Unique identifier, often a combination of surrogate keys in fact tables.
	Foreign Key: Links fact tables to dimension tables.
	Natural Key: Unique identifier from source systems, not used as the primary key in fact tables.

Role of periodic snapshot fact tables:
	This type of fact table captures regular, periodic measurements of data, providing an aggregated, easy-to-access view of data over time.
	There are two main types of periodic snapshot fact tables:
	Periodic Snapshot Fact Table: Linked to transactional data but aggregated for periodic analysis.
	Periodic Snapshot Fact Table (Non-Transactional): Measures levels that aren't directly linked to transactional data.
	Key Characteristics
		Periodic Measurement: Captures data at regular intervals (e.g., weekly, monthly).
		Simplified Analysis: Facilitates easier and more efficient analysis of periodic trends.
		Two Types:
			Transactional Periodic Snapshot: Derived from transactional data.
				Benefits:
					Aggregated Data: Provides a summary view, facilitating easier trend analysis.
					Simplified Queries: Reduces complexity in SQL queries for periodic analysis.
					Clear Business Insights: Helps track and analyze regular balances or levels efficiently.
			Non-Transactional Periodic Snapshot: Measures levels without underlying transactions.
		Summary
			Periodic Snapshot Fact Tables: Used for regular, periodic measurements.
			Two Types: Transactional and Non-Transactional.
			Simplified Analysis: Facilitates easier trend analysis and reduces query complexity.
			Structure: Both types have similar structures, differing only in the source of their data.
		
periodic snapshots and semi-additive facts:
	Semi-Additive Facts: Essential for certain types of analysis in periodic snapshot fact tables.
	Operations: Cannot be summed over time but can be averaged or summed within a specific time period.
	Use Case: Periodic snapshot fact tables are invaluable for measuring and analyzing regular levels over time, 
		  such as end-of-week balances or periodic measurements in non-transactional scenarios.

Role of Accumulating Snapshot Fact Tables:
	The accumulating snapshot fact table. This type is used to track the progress of a business process with formally defined stages.
	Purpose of Accumulating Snapshot Fact Tables
		These fact tables are designed to:
		Measure elapsed time spent in each stage of a business process.
		Track both completed and in-progress phases.
		Provide a snapshot of where one or more business processes are within their life cycles.
		Include multiple relationships from the fact table back to a single dimension table, unlike the one-to-one relationships in other fact tables.
	Accumulating snapshot fact tables are powerful tools for tracking the progress and elapsed time of business processes with multiple stages. 
	They help in understanding the life cycle of processes and in making data-driven decisions based on the stages of these processes. 
	
Accumulating Snapshot Fact Table Ex:
	Accumulating snapshot fact tables are powerful tools for tracking the progress of complex business processes, allowing for detailed analysis and easy visualization of the data flow. 
	By examining a single row of data, you can analyze the entire process for one or more students, making it easier to identify patterns, averages, and total amounts at different stages. 
	This type of fact table simplifies the analysis of business processes with multiple stages and participants.

Factless Fact Tables:
	Fact tables in a data warehouse typically house the measurements we want to track, but a fact is not synonymous with a fact table.
	The factless fact table, however, serves two unique purposes:
		Recording the occurrence of a transaction without any measurable metrics.
		Documenting a formal record of coverage or eligibility relationships.
	Factless fact tables are versatile tools in a data warehouse for tracking the occurrence of events and relationships without measurable metrics. 
	They simplify the analysis of data involving transactions or associations where the presence of records alone signifies meaningful events or relationships.

Fact Tables Comparison of Star schema vs snowflake Schema:
	Hierarchical Data in Fact Tables: In a data warehouse, we often deal with hierarchical data.
	Star Schema for Fact Tables: In a star schema, the hierarchical information is consolidated into a single dimension table.
	Snowflake Schema for Fact Tables: In a snowflake schema, the hierarchical data is normalized into multiple dimension tables. 
	Efficient Design for Snowflake Schema Fact Tables:

		Instead of including multiple foreign keys, we follow a similar approach to the star schema:
			Include only the lowest level foreign key (Faculty_Key) in the fact table.
			Use the relationships defined in the dimension tables to navigate the hierarchy.
	Summary
		Star Schema:
			All hierarchical data in a single dimension table.
			Fact table contains a single foreign key to the dimension table.
		Snowflake Schema:
			Hierarchical data normalized into multiple dimension tables.
			Fact table contains only the lowest level foreign key.
			Use joins to navigate the hierarchy in queries.
		This approach ensures a streamlined and effective design for fact tables in both star and snowflake schemas.

SQL for Dimension and fact tables:
	Star Schema Dimension Table: In a star schema, dimension tables typically contain denormalized data.
	Snowflake Schema Non-Terminal Dimension Table: Non-terminal dimension tables in a snowflake schema have foreign keys pointing to higher-level dimension tables. 
	Snowflake Schema Terminal Dimension Table: Terminal dimension tables in a snowflake schema have no further levels in the hierarchy and thus no foreign keys.
	Transaction-Grained Fact Table: Transaction-grained fact tables record individual events or transactions.
	Periodic Snapshot Fact Table: Periodic snapshot fact tables capture data at regular intervals.
	In summary, the SQL statements for creating tables in a data warehouse follow a consistent pattern:
		Define columns with data types and constraints.
		Specify primary keys for all tables.
		Use foreign keys to establish relationships between fact tables and dimension tables.
		Ensure that foreign keys reference the appropriate primary keys in their respective dimension tables.