Colby Hurt Regan Large Laura Pyle Data Warehousing
16 Slides1.02 MB
Colby Hurt Regan Large Laura Pyle Data Warehousing
Goals: Understanding the path of different types of data to the data warehouse Pros and Cons of the data warehouse Overview of basic data warehouse concepts
What is a Data Warehouse? A data warehouse is a data management system that stores current and historical data from multiple sources in a business friendly manner for easier insights and reporting
Where does the data come from? EXTERNAL DATA SYSTEM ENTERED
Transactional databases prioritize gathering data
ETL Extract – pull the data from the transactional database Transform – rearrange the data Load – Load it into the DW.
ETL Example – Open Source – Cloud Based Programming Tools Python SQL Server Integration Services Java Hadoop
Example ETL is run every night, 6 days a week its incremental, Sunday it’s a full refresh.
Example
A Data Warehouse prioritizes the reporting of that data Data warehouses used for: Business intelligence (BI) Reporting Data analysis
Pros & Cons One stop shopping – centralized data repository Data formatted ready for analysis Quicker and better decisions Will not slow down transactional systems Costly to set up (money & time) Costly to maintain (money & time) It’s a big IT project and requires special skills
Overview
Conclusion Using a data warehouse keeps you from slowing down your transactional system. Companies will continue to leverage data. It's important to understand the sources of your data and the path it takes to get to your data warehouse. Creating a data warehouse is not cheap or easy.
Questions?
References: What Is a Data Warehouse? (2021, September 23). Vertabelo Data Modeler. https://vertabelo.com/blog/what-is-data-warehouse/ Wakefield, B. (2021, June 18). The Benefits Of Using Python And T-SQL Over SSIS For ETL. Medium. https://datadrivenperspectives.com/the-benefits-of-using-python-and-t-sql-over-ssis-for-etl-ca50c6e11819 Types of ETL Tools. (n.d.). Dremio. Retrieved October 29, 2023, from https://www.dremio.com/resources/guides/adv-types-etl-tools/#: :text Python%20can%20be%20used%20 instead AWS Glue - Managed ETL Service - Amazon Web Services. (n.d.). Amazon Web Services, Inc. https://aws.amazon.com/glue/