Implementing a Data Warehouse with SQL Server Jump Start
37 Slides449.08 KB
Implementing a Data Warehouse with SQL Server Jump Start Richard Currey Senior Technical Trainer–New Horizons United George Squillace Senior Technical Trainer–New Horizons Great Lakes
Meet Richard Currey Senior Technical Trainer – New Horizons United – Focused on database and development technologies – MCDBA, MCITP Dev / Admin / BI, MCSE Data Platform, BI – MCSD, MCPD Web Developer, ASP .NET Developer, Windows Developer 28 Years Industry Experience – Designed, developed and managed BI-centric projects at several Fortune 500 organizations – Extensive consulting and project management background
Meet George SQUILLACE Senior Technical Trainer – New Horizons Great Lakes (20 Years) – “SQL” is in his name! – Focused on database technologies – MCT Since 1997 28 Years Industry Experience – SQL Server 2012: certified as MCSA, MCSE: Data Platform, & MSCE: Business Intelligence – Certified in every version of SQL Server since SQL 2000 – Certified in every version of Windows Server from NT 4.0 through Windows Server 2008 – Certified in Exchange Server 5.5 through Exchange
Course Modules Implementing a Data Warehouse with SQL Server 01 Design and Implement Dimensions and Fact Tables 04 Control Flow 02 Data Flow - Extract Data 05 Configure and Deploy SSIS 03 Data Flow - Transform Data 06 Manage Enterprise Data
Setting Expectations Target Audience – Data warehousing specialists who want to expand their knowledge of SQL Server Integration Services (SSIS) – Database professionals who want to take exam 70-463 and get certified in data warehouse implementations Suggested Prerequisites/Supporting Material – SQL Server development experience and exposure to extract, transform, and load (ETL) processes – Course 10777, Implementing a Data Warehouse with Microsoft SQL Server 2012 – MS Press Book: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012
01 Design and Implement Dimensions and Fact Tables Click to edit Master subtitle style Richard Currey Senior Technical Trainer–New Horizons United George Squillace Senior Technical Trainer–New Horizons Great Lakes
Module 1 Overview Schema Design: Star vs. Snowflake Facts and Fact Tables Fact and Dimension Granularity Conformed and Non-Conformed Dimensions Time Dimensions
Topic: Schema Design: Star vs. Snowflake
Topic: Schema Design: Star versus Snowflake Star Schema Snowflake Schema Processing and Performance Considerations
Star Schema DimSalesPerson DimSalesPerson SalesPersonKey SalesPersonKey SalesPersonName SalesPersonName StoreName StoreName StoreCity StoreCity StoreRegion StoreRegion A star schema has a single table for each dimension Each table supports all attributes for that dimension Typically a denormalized solution DimCustomer DimCustomer CustomerKey CustomerKey CustomerName CustomerName City City Region Region DimProduct DimProduct ProductKey ProductKey ProductName ProductName ProductLine ProductLine SupplierName SupplierName FactOrders CustomerKey SalesPersonKey ProductKey ShippingAgentKey TimeKey OrderNo LineItemNo Quantity Revenue Cost Profit DimDate DimDate DateKey DateKey Year Year Quarter Quarter Month Month Day Day DimShippingAgent DimShippingAgent ShippingAgentKey ShippingAgentKey ShippingAgentName ShippingAgentName
DEMO Implementing a Star Schema
Snowflake Schema More normalized solution Typically contains multiple tables per dimension Each table contains dimension key, value, and the foreign key value for the parent DimStore DimStore StoreKey StoreKey StoreName StoreName GeographyKey GeographyKey DimGeography DimGeography GeographyKey GeographyKey City City Region Region DimCustomer DimCustomer CustomerKey CustomerKey CustomerName CustomerName GeographyKey GeographyKey DimProductLine DimProductLine ProductLineKey ProductLineKey ProductLineName ProductLineName DimSalesPerson DimSalesPerson SalesPersonKey SalesPersonKey SalesPersonName SalesPersonName StoreKey StoreKey FactOrders CustomerKey SalesPersonKey ProductKey ShippingAgentKey TimeKey OrderNo LineItemNo Quantity Revenue Cost Profit DimProduct DimProduct ProductKey ProductKey ProductName ProductName ProductLineKey ProductLineKey SupplierKey SupplierKey DimDate DimDate DateKey DateKey Year Year Quarter Quarter Month Month Day Day DimShippingAgent DimShippingAgent ShippingAgentKey ShippingAgentKey ShippingAgentName ShippingAgentName DimSupplier DimSupplier SupplierKey SupplierKey SupplierName SupplierName
DEMO Implementing a Snowflake Schema
Processing and Performance Considerations Star schema requires de-normalization during the load process – Can impact the ETL times Snowflake schema can increase dimension complexity – Can impact Analysis Services solutions, negatively affecting cube performance
Topic: Facts and Fact Tables
Topic: Facts and Fact Tables What Is a Fact? Grouping Facts What Is Granularity? Design Considerations
What Is a Fact? Facts are the key metrics used to measure business results: – Sales – Production – Inventory Can be additive, semi-additive, or non-additive
Grouping Facts Facts are grouped into fact tables Related facts should be in the same fact table Facts with different granularity should be in different tables
What Is Granularity? Granularity refers to the level of detail in which facts are recorded Facts can be at different levels of granularity
Design Considerations Fact tables should have all keys relating to dimensions Primary key should be composite of all dimension keys Separate additive, semi-additive, and non-additive facts
Topic: Fact and Dimension Granularity
Topic: Fact and Dimension Granularity How to Determine Fact Granularity Dimension Granularity
How to Determine Fact Granularity Granularity is determined based on business needs Should be the lowest level of detail that needs to be examined If data from transactional systems has more detail than needed for analysis, ETL should aggregate the details
Dimension Granularity Dimension granularity needs to be matched with fact granularity Each dimension has its own granularity Fact tables are keyed to the granularity of the dimensions
Topic: Conformed and Non-Conformed Dimensions
Topic: Conformed and Non-Conformed Dimensions What Are Conformed and Non-Conformed Dimensions? Shared and Degenerate Dimensions What Is a Slowly Changing Dimension?
What Are Conformed and Non-Conformed Dimensions? Conformed dimension – Shared by multiple fact tables – Used when all business users have the same definitions for the dimension Non-conformed dimension – Dimension table targeted to a single fact table – Used when dimensions have different definitions for different business units
DEMO Creating Conformed and Non-Conformed Dimensions
Shared and Degenerate Dimensions Shared dimension – Used by multiple facts – Dimension key is stored in the fact table – Dimension value is stored in the dimension table with other attributes of that dimension Degenerate dimension – Used by a single fact table – Dimension value is stored directly in the fact table – No corresponding dimension table
What Is a Slowly Changing Dimension? When the historical attribute values are retained if the attributes are updated Used when the organization does not want to lose track of what actually happened – Example: customer moves from Connecticut to Seattle Slowly changing dimension types: – Type 1: Attribute history is not retained – Type 2: Attribute change creates a new record – Type 3: Original attribute value recorded and latest value recorded with an effective date
DEMO Implementing a Slowly Changing Dimension
Topic: Time Dimensions
Topic: Time Dimensions Types of Time Dimensions Time Dimensions and Hierarchies
Types of Time Dimensions Based on standard calendar breakdowns – Year Month Day – Year Quarter Week Day Based on fiscal calendar – Year Fiscal Quarter Fiscal Month Fiscal Week Day Time dimension needs to contain all hierarchy elements to the lowest granularity for the fact tables
Time Dimensions and Hierarchies Establishes the “buckets” that the business uses Typically there are multiple hierarchies in the dimension – Calendar – Business Created using Microsoft Excel, scripts, or are autogenerated
DEMO Creating a Time Dimension
2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.