Predictive Analysis with SQL Server 2008
25 Slides2.51 MB
Predictive Analysis with SQL Server 2008
Agenda Data Mining enabling Predictive Analysis The Value of Predictive Analysis SQL Server 2008 Predictive Analysis Complete Predictive Analysis Integrated Predictive Analysis Extensible Predictive Analysis
Data Mining enabling Predictive Analysis Role of Software Data Mining Proactive Predictive Analysis Interactive OLAP Ad-Hoc Reporting Passive Canned Reporting Business Insight Presentation Exploration Discovery
The Value of Predictive Analysis Inform Common Business Decisions with Actionable Insight Estimate Survey Results Funnel Marketing Campaigns Seek Profitable Customers Predictive Analysis Predict Sales & Inventory Anticipate Customer Churn Understand Customer Needs
SQL Server 2008 Predictive Analysis Part of SQL Server 2008 Analysis Services Pervasiv e Delivery through Microso ft Office Compre hensive Develop ment Environ ment Enterpri se Grade Capabili ties Rich and Innovati ve Algorith ms Com plet e Native Repor ting Integra tion InFlight Mining during Data Integra tion Insigh tful Analysi s Predic tive KPIs Inte grat ed Pred ictiv e Prog ram ming Cust om Algo rith ms and Visu aliza tions Exte nsibl e
Complete Predictive Analysis Pervasive Delivery through Microsoft Office Comprehensive Empower all users with predictive analysis capabilities Enable advanced users with more validation and control Collaborative Share analysis through interactive graphical visualizations Share insight with clear and prompt publishing capabilities Intuitive Enable complex data mining through simple, automated tasks Reduce the learningcurve with a familiar environment Deliver actionable insight with clear graphical visualizations
Data Mining Add-Ins for Microsoft Office 2007 DIG for Insight at your Desktop Define Data Identify Task Get Results “What Microsoft has done is to make data mining available on the desktop to everyone” - David Norris, Associate Analyst, Bloor Research
Data Mining Add-In for Microsoft Office 2007 Full Development Lifecycle within Excel Data Preparation – Explore, clean and set up your data for data mining Data Modeling – Build patterns and trends from data to make predictions Accuracy and Validation – Test and validate your model Model Usage & Management – Browse, modify, and manage existing mining models that are stored on an instance of Analysis Services Documentation – Trace your actions as Data Mining Extensions (DMX) statements or as Analysis Services Scripting Language (ASSL).
Complete Predictive Analysis Comprehensive Development Environment Intuitive Data Mining Wizard Graphic Data Mining Designer Visual & Statistical Validation – Cross-validation – Lift charts – Profit charts Easy and Efficient Access to Source Data – Caching – Filtering – Aliasing
Complete Predictive Analysis Enterprise Grade Capabilities Superior Performance and Security Robust High Availability Scalability Features Rapid Development Enhanced Manageability Analysis Services
Complete Predictive Analysis Rich and Innovative Algorithms Algorithms to solve common business problems Innovative Innovative Algorithm s from Microsoft Research Tradition al Algorith ms such as ARIMA Broad Range of Choices to Build Optimal Models Market Basket Analysis Churn Analysis Market Segment Analysis Forecasting Data Exploration Unsupervised Learning Web Site Analysis Campaign Analysis Information Quality Text Analysis
Integrated Predictive Analysis Native Reporting Integration Create reports that include prediction Build reports using data mining queries as your data source Access visual prediction Query Builder directly within Report Designer Generate parameter-driven reports based on predictive probability – For example, present high-risk customers Probability to churn is over 65%
Integrated Predictive Analysis In-Flight Data Mining During Data Integration Enhance ETL: – Flag anomalous data – Classify business entities – Identify missing values – Perform text mining Extend SQL Server Integration Services: – Score rows with Data Mining Query transformations – Train mining models with Data Mining Training destinations
Integrated Predictive Analysis Insightful Analysis Use the OLAP cube for data mining – Include data mining results as dimensions in OLAP cubes – Include prediction functions in calculations and KPIs
Integrated Predictive Analysis Predictive KPIs Integration with Microsoft Office PerformancePoint Server 2007 Combine predictive and retrospective KPIs for more insightful dashboards – Forecast future performance against targets to anticipate potential challenges – Discover and monitor trends in key influencers
Extensible Predictive Analysis Predictive Programming Autom Autom atic atic Data Data Mining Mining Patter Patter n n Explor Explor ation ation Incorporate predictive analysis into your business applications through comprehensive APIs Pred ictio n ? Create a built-in recommendation engine Update models based on most recent data Warn for flawed data Display leading on-the-fly indicators for factors/metrics Identify profile for churning/high Recommend relevant value customers products Anticipate customer risk/churn Focus promotions on customers with a high expected lifetime value
Extensible Predictive Analysis Data Mining APIs Plug-in Algorithms Visualizations PMML XMLA Industry standard metadata Data mining Extensions (DMX) ADOMD.NET and OLE DB AMO Add custom data mining algorithms D N E T EX Redistributable Viewer - embed standard visualizations in your application Plug-in Viewer APIs - embed custom visualizations in your application Exchange models with other software vendors D E B M E SQL-like query language Access and query models from clients or stored procedures Management interfaces
ABS-CBN Interactive (ABSi) Subsidiary of the largest integrated media and entertainment company in the Philippines Wireless Services Firm Doubles Response Rates with SQL Server 2005 Data Mining Challenge Solution Benefit Selling custom ring More accurate and ABSi tones and other personalized downloadable deployed service content for mobile Microsoft recommendations phone users to customers SQL Server requires staying in Doubling response tune with the 2005 to use rates from market. Searching its data marketing transactional data campaigns mining for hints on what to Ad hoc reporting in feature to offer users in crossminutes, not days selling value-added Eight times faster determine mobile services took mining productthat we could double data “Our is very impressed our response days and management didn’t process provide rate customerthrough our SQL Server recommenda 2005 data mining managers of other Faster data mining specific services ask us to provide the same magic for them—which is what we prediction tions. recommendations. will do with the full project rollout” - Grace Cunanan, Technical Specialist, ABS-CBN Interactive
More Data Mining Customers .8 TB SS2005 DW for Ring-Tone Marketing Uses Relational, OLAP and Data Mining 5 TB DW, serving the 2nd largest global HMO with over 3000 OLAP users. Developed data mining solution to identify members who would most benefit from proactive intervention to prevent health deterioration. 3 TB end-to-end BI decision support system Oracle competitive win End-to end DW on SQL Server, including OLAP Extensive use of Data Mining Decision Trees 1.2 TB, 20 billion records Large Brazilian Grocery Chain .88 TB DW at main TV network in Italy Increased viewership by understanding trends .5 TB DW at US Cable company End to end BI, Analysis and Reporting
Summary Complete Pervasive Delivery through Microsoft Office empowers all users with predictive insight Comprehensive Development Environment delivers an intuitive and rich environment Enterprise Grade Capabilities provide enhanced server advantages Rich and Innovative Algorithms support common business problems effectively Integrated Native Reporting Integration seamlessly infuses prediction into reports In-Flight Mining during Data Integration dynamically enhances data quality & relevance Insightful Analysis enables to slice data by the hidden patterns within Predictive KPIs extend monitoring with insights to future performance Extensible Predictive Programming embeds prediction within the application Custom Algorithms & Visualizations provide the flexibility to meet uncommon needs
2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
What’s New in SQL Server 2008? Enhanced Mining Structures – – – – – Split data into training and testing partitions more effectively Query against structure data to present complete information beyond the scope of the model Build models over filtered data Create incompatible models within the same structure Use cross-validation to: Test multiple models simultaneously Confirm the stability of results given more or less data Better Time Series Support – – – Accuracy & Stability Combine best of both worlds blending ARTXP for optimized near-term predictions and ARIMA for stable long term predictions Prediction Flexibility Build a forecasting model on one series and apply the patterns to data from another series. What If Anticipate the impact of changes in near-term future values, on long-term forecasts More Data Mining Add-Ins for Office 2007 – – New Analysis Tools Generate interactive forms for scoring new cases with Prediction Calculator Discover the relationship between items, which are frequently purchased together with Shopping Basket Analysis New Query and Validation Tools Choose training and test sets from mining structures Render richly-formatted cross validation and accuracy reports in Excel Leverage model documentation for reference and collaboratio n
Data Mining Algorithms Algorithm Description Decision Trees Calculates the odds of an outcome based on values in a training set Association Rules Helps identify relationships between various elements. Naïve Bayes Clearly shows the differences in a particular variable for various data elements Sequence Clustering Groups or clusters data based on a sequence of previous events Time Series Analyzes and forecasts time-based data combining the power of ARIMA for long-term prediction and the power of ARTXP (developed by Microsoft Research) for short-term prediction. Together optimizing prediction accuracy Neural Nets Seeks to uncover non-intuitive relationships in data Text Mining Analyzes unstructured text data Linear Regression Determines the relationship between columns in order to predict an outcome Logistic Regression Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
Data Mining Architecture Data Mining Structures – Define the data columns used for analysis Data Mining Models – Apply data mining algorithms to the data structures to: Predict values Identify clusters Find patterns and associations
Clalit Health Services Data Mining Helps Clalit Preserve Health and Save Provides health care for 3.7 million insured members, representing about 60 percent of Israel’s population Lives Challenge Solution Benefit Use sociodemographic and medical records to generate a predictive score, A chance to Identify which identifying elder preserve life and members would members with enhance life most benefit from highest risk for quality health proactive Reduced health deterioration intervention to care costs Once identified, prevent health Tightly integrated physicians can try deterioration solution to involve these in patients that the data mining model “Providing physicians withpatients a list of proactive predicts are at risk of health deterioration over the next year, gives treatment plans to them the opportunity to intervene, and prevent what has been prevent health predicted.” deterioration - Mazal Tuchler, Data Warehouse Manager , Clalit Health Services