Data Analytics using SQL Server tools Week 1

88 Slides4.82 MB

Data Analytics using SQL Server tools Week 1

Personal Introduction Education MSSE Software Engineering Work Experience 9 Year’s of Experience

Course Instructors and TA

Installation of the Environment

Your turn to introduce!! Name Educational Background Professional Background Expectations from this course

SETTING EXPECTATIONS This is a High level Intro to SQL Server Tools and Data Analytics. We shall cover all major SQL Server tools on-premise and in Azure cloud to help you kick start your journey in the Data Analytics domain This course will be your first step on the journey to become a Data Analyst We encourage the participants to complete this course and take our advanced courses on Data Warehousing and Big Data Our goal is to get you excited about Data Analytics We encourage you to go through the entire data analysis development cycle using the industry best practices and jump start your journey to becoming a world class Data Analyst

COURSE OUTLINE 1 SQL basics to advanced 2 SSIS Basics 3 Data Modeling 4 SSIS Advanced 5 Azure Data Factory 6 Power BI Step-by-step guide to learn SQL from basics to advanced level for Data Analytics Connecting to variety of data source, ETL operations like data cleaning, shaping, transforming, appending, merging etc What is data modeling and how to build a data warehouse using SQL and SSIS SSIS configuration settings, variables, and deployment of SSIS packages parameters Using Azure data Factory to perform everything we learned onpremise to perform in the cloud Learning DAX to perform Data Visualization and Reporting using Power BI

Time spent in class 1 SQL basics to advanced Week 1 and 2 Week 3 to 6 2 SSIS Basics Week 3 to 6 3 Data Modeling Week 3 to 6 4 SSIS Advanced Week 3 to 6 5 Azure Data Factory Week 7 and 8 6 Power BI

What is the Data Analytics Ecosystem?

Data is the new Oil?

Oil Ecosystem Source Oil Refinery

Oil Ecosystem Source Oil Refinery

Oil Ecosystem Extract Ingest Source Process Oil Refinery Enrich Serve

Data Ecosystem Extract Ingest Source Process Data Factory Enrich Serve

Data Ecosystem Extract Ingest Source Process Data Factory Enrich Serve

Data Ecosystem Extract Ingest Process Data Factory Source Scheduled / Event Triggered Automated Enrich Serve

Data Ecosystem Extract Process Ingest Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformation Ad-hoc querying Data Modeling Data warehousing Enrich Serve

Data Ecosystem Extract Process Ingest Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformation Ad-hoc querying Data Modeling Data warehousing Machine Learning Data Science Artificial Intelligence Deep Learning Serve

Data Ecosystem Extract Process Ingest Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformation Ad-hoc querying Data Modeling Data warehousing Machine Learning Data Science Artificial Intelligence Deep Learning Serve

Data Ecosystem Extract Process Ingest Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformation Ad-hoc querying Data Modeling Data warehousing Security Data Governance Machine Learning Data Science Artificial Intelligence Deep Learning Serve

Data Ecosystem Extract Process Ingest Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformation Ad-hoc querying Data Modeling Data warehousing Machine Learning Data Science Artificial Intelligence Deep Learning Serve

Data Ecosystem Extract Process Ingest Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformation Ad-hoc querying Data Modeling Data warehousing Machine Learning Data Science Artificial Intelligence Deep Learning Serve

Data Ecosystem Extract Process Ingest DWH Engineer DWH Specialist Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformatio Ad-hoc querying Data Modeling Data warehousing Machine Learning n Data Science Artificial Intelligence Deep Learning Serve

Data Ecosystem Extract Process Ingest BI Developer BI Engineer Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformatio Ad-hoc querying Data Modeling Data warehousing Machine Learning n Data Science Artificial Intelligence Deep Learning Serve

ML Engineer Deep Learning Engineer Data Ecosystem Extract Process Ingest Enrich Data Factory Source Scheduled / Event Triggered Automated Data Cleaning Data Transformation Ad-hoc querying Data Modeling Data warehousing Machine Learning Data Science Artificial Intelligence Deep Learning Serve

Data Analytics Ecosystem

Data Analytics Life Cycle Implementation

Data Life Cycle in AWS

Data Life Cycle in Azure

Data Life Cycle in Google Cloud

Data Analytics Eco-system on Azure

ET L Extract: Extraction refers to pulling the source data from the original database or data source. With ETL, the data goes into a temporary staging area. Transform: Transformation refers to the process of changing the structure of the information, so it integrates with the target data system and the rest of the data in that system. Load: Loading refers to the process of depositing the information into a data storage system (mostly a data warehouse). ETL first extracts data from homogeneous or heterogeneous data sources. Next, it deposits the data into a staging area. From there, the data goes through a cleansing process, gets enriched and transformed, and is finally stored in a data warehouse.

ET L

EL T ELT is a data integration process that transfers data from a source system into a target system without business logic-driven transformations on the data. Extraction: Raw data is extracted from various sources, such as applications, SaaS, or databases. Loading: Data is delivered directly to the target system (normally a data lake) – typically with schema and data type migration factored into the process. Transformation: The target platform can then transform data for reporting purposes.

EL T

Data Analysis Development Methodology

What is a Database? It is a container of data It could represent one or more business applications It could be a line of business application or a reporting system Usually accessed on-premise or over the web

What is a Database?

Table s

Table s

Data Types

Data Types

Data Types

Data Types

Data Types

Data Types

Some database related terminologies Table, Entity, Relation, (similar to an Excel Worksheet) Row, Record, Instance Column, Field, Attribute Primary Key – unique and mandatory Foreign Key – a cross- reference between tables because it references the primary key of another table Relationship – created though foreign keys

RDBM S A relational “database” management system (RDBMS) organizes data The logical structure of the database is based upon the information needs of an organization Entities (“things” of interest to the organization), AND Relationships (how the Entities are associated with each other)

Advantages of RDBMS Establish a centralized, logical view of data Minimizes data duplication (i.e. “redundancy”) Promote data accuracy and integrity Capacity of database Superior multi-user or concurrent access Security Retrieve information quickly Inter-operability

What is SQL? Single Table Queries Introduction to SQL Server Databases Module 2 The Big 6 SELECT statement FROM statement Expressions Functions Nulls TOP and DISTINCT Examples Assignment

What is SQL? SQL stands for Structured Query Language SQL is pronounced S-Q-L or sequel SQL is a standard language for managing, manipulating and querying databases Developed at IBM in the early 1970’s In 1986, ANSI and ISO standard groups officially adopted the standard “Database Language SQL” definition Most SQL databases have their own proprietary extensions in addition to the SQL standard SQL is the language used to ask questions (query) of a database which will return answers (results)

Simple Accessible Applicable Powerful Pervasive Valuable Universal Why SQL?

Components of SQL? SQL consists of three components which offer everything required to manage, maintain and use a database 1. Data Definition Language 2. Data Manipulation Language 3. Data Control Language

Microsoft Access Flavors of SQL? https://products.office.com/en- ca/access Microsoft SQL Server https://www.microsoft.com/en- us/sql-server/sql-server-2016 MariaDB, MySQL https://mariadb.org/ https://www.mysql.com/ Postgresql https://www.postgresql.org/ Oracle https://www.oracle.com/database/ Hadoop, Spark, Hive, Pig https://hadoop.apache.org

Data Manipulation Language (DML)is used to manipulate data within a table This component There are four main commands: SELECT To select rows of data from a table INSERT To insert rows of data into a table UPDATE To change rows of data in a table DELETE To remove rows of data from a table

The Big 6 SELECT SELECT ColumnName FROM FROM TableName WHERE WHERE LogicalCondition GROUP BY GROUP BY ColumnName HAVING HAVING LogicalCondition ORDER BY ORDER BY ColumnName

The SELECT statement

FROM clause

SELECT Examples SELECT * FROM FactInternetSales; SELECT ProductKey, ProductAlternateKey, Color FROM DimProduct; SELECT EmployeeKey, FirstName, MiddleName, LastName, Title, EmailAddress, Phone FROM DimEmployee

SELECT : Aliasing SELECT ProductKey, ProductAlternateKey, EnglishProductName AS ProductName, Color FROM DimProduct; SELECT EmployeeKey AS ID, FirstName, MiddleName, LastName, Title, EmailAddress, Phone FROM DimEmployee

SSMS Tricks

SELECT : Top SELECT Top (10) ProductKey, ProductAlternateKey, EnglishProductName AS ProductName, Color FROM DimProduct; SELECT Top (15) EmployeeKey AS ID, FirstName, MiddleName, LastName, Title, EmailAddress, Phone FROM DimEmployee

Concatenation SELECT : Expressions SELECT col1 col2 FROM TableName Mathematical Expressions SELECT intCol1 * intCol2 FROM Table Name

Concatenation SELECT : Expressions SELECT EmployeeKey, FirstName ' ' LastName AS Name, Title FROM DimEmployee; Mathematical Expressions SELECT EmployeeKey, FirstName ' ' LastName AS Name, Title, VacationHours SickLeaveHours AS TimeOff, BaseRate * 40 AS WeeklyPay FROM DimEmployee;

WHERE statement

Operator s

LIKE Operator

Wildcard Examples

Using NOT Complex WHERE Statements Using NOT with IN Using NOT with LIKE Using NOT with AND & OR Handling NULL Using ISNULL() Using IS NULL

GROUP BY clause

HAVING clause

ORDER BY clause

Recap Big 6

SQL Order of Operations FROM WHERE GROUP BY HAVING SELECT ORDER BY

Data Wrangling / Data Cleaning Cleaning Strings LEFT / RIGHT LEN REPLACE REPLICATE SUBSTRING TRIM CHARINDEX Handling Date / Time Removing Nulls Finding duplicates Removing duplicates

String Functions SELECT : Functions LEFT() The LEFT() function extracts a number of characters from a string (starting from left). LOWER() The LOWER() function converts a string to lower-case. Replace() The REPLACE() function replaces all occurrences of a substring within a string, with a new substring. REPLACE(string, o l d s t r i n g , new string) Replicate() The REPLICATE() function repeats a string a specified number of times. REPLICATE(string, integer) Len() The LEN() function returns the length of a string.

String Functions CHARINDEX() SELECT : Functions The CHARINDEX() function searches for a substring in a string, and returns the position. CHARINDEX(substring, string, start) TRIM() The TRIM() function removes the space character OR other specified characters from the start or end of a string. By default, the TRIM() function removes leading and trailing spaces from a string. TRIM([characters FROM ]string) SUBSTRING() The SUBSTRING() function extracts some characters from a string. SUBSTRING(string, start, length) CONCAT() The CONCAT() function adds two or more strings together. CONCAT(string1, string2, ., string n)

Date Functions GETDATE() SELECT : Functions The GETDATE() function returns the current database system date and time, in a 'YYYY-MM-DD hh:mm:ss.mmm' format. DATEPART() The DATEPART() function returns a specified part of a date. DATEPART(interval, date ) MONTH() The MONTH() function returns the month part for a specified date (a number from 1 to 12). DATEADD() The DATEADD() function adds a time/date interval to a date and then returns the date. DATEADD(interval, number, date ) DATEDIFF() The DATEDIFF() function returns the difference between two dates. DATEDIFF(interval, date1, date2)

Datatype Functions CAST() SELECT : Functions The CAST() function converts a value (of any type) into a specified datatype. CAST(expression AS d a t a ty p e ( l e n g t h ) ) CONVERT() The CONVERT() function converts a value (of any type) into a specified datatype. CONVERT(data type(length), expression, s t y l e ) https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transactsql?view sql-server-ver15 FORMAT() The FORMAT() function formats a value with the specified format https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view sqlserver-ver15

NUL L NULL means Unknown Use ISNULL() to change to something else Any operation performed with NULL returns NULL ISNULL() The ISNULL() function returns a specified value if the expression is NULL. ISNULL(expression, value) COALESCE() The COALESCE() function returns the first non-null value in a list. COALESCE(val1, v a l 2 , . . . . , val n)

SELECT : DISTINCT SELECT EnglishProductName, Color FROM DimProduct; SELECT DISTINCT EnglishProductName, Color FROM DimProduct;

Types of Joins

Types of Joins

Types of Joins

Types of Joins

Types of Joins

Back to top button