studymafia
Seminar On Data Mining
Submitted To: studymafia



Content
Data Mining
Data Mining Definition
Data Mining – Two Main Components
Data Mining vs. Data Analysis
What is (not) Data Mining?
Related Fields
Data Mining Process
Major Data Mining Tasks
Uses of Data Mining
Sources of Data for Mining
Challenges of Data Mining
Advantages
Conclusion
Reference

Data Mining
New buzzword, old idea.
Inferring new information from already collected data.
Traditionally the job of data analysts; computers have changed this.
It is far more efficient to comb through data with a machine than to eyeball statistical data by hand.

Data Mining Definition
Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

Data Mining vs. Data Analysis
In terms of software and the marketing thereof: Data Mining ≠ Data Analysis.
Data Mining implies that the software applies some intelligence beyond simple grouping and partitioning of data in order to infer new information.
Data Analysis is more in line with standard statistical software (e.g. web stats). Such tools usually present information about subsets and relations within the recorded data set (e.g. browser/search engine usage, average visit time).

What is (not) Data Mining?
What is not Data Mining?
  o Looking up a phone number in a phone directory
  o Querying a Web search engine for information about “Amazon”
What is Data Mining?
  o Discovering that certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly in the Boston area)
  o Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)

Data Mining Techniques
Classification
Clustering
Regression
Association Rules
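
To make one of these techniques concrete, here is a minimal sketch of clustering using plain k-means. The function name `k_means`, the sample points, and the starting centroids are all illustrative assumptions, not part of the original slides; a real project would more likely rely on an established library.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, initial_centroids, iterations=10):
    """Toy k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The starting centroids are fixed here so the run is repeatable; in practice they are usually chosen at random, which can change the result.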

Why Mine Data? Scientific Viewpoint
Data collected and stored at enormous speeds (GB/hour)
  o remote sensors on a satellite
  o telescopes scanning the skies
  o microarrays generating gene expression data
  o scientific simulations generating terabytes of data
Traditional techniques infeasible for raw data
Data mining may help scientists
  o in classifying and segmenting data
  o in hypothesis formation

Data Mining Architecture

Related Fields
[Figure: Data Mining and Knowledge Discovery and its related fields: Machine Learning, Visualization, Statistics, Databases]

Data Mining Process
[Figure: Raw Data -> Integration -> Data Warehouse -> Selection & Cleaning -> Target Data -> Transformation -> Transformed Data -> Data Mining -> Patterns and Rules -> Interpretation & Evaluation -> Knowledge -> Understanding]
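
The stages named on this slide (selection and cleaning, transformation, data mining, interpretation and evaluation) can be sketched as a chain of small functions. Everything below is a hypothetical toy: the record fields, the function names, and the threshold rule standing in for the mining step are assumptions made only for illustration.

```python
def select_and_clean(raw_rows):
    """Selection & cleaning: keep the fields of interest, drop incomplete rows."""
    return [
        {"customer": r["customer"], "amount": r["amount"]}
        for r in raw_rows
        if r.get("customer") and r.get("amount") is not None
    ]


def transform(rows):
    """Transformation: aggregate total spend per customer."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


def mine(totals, threshold):
    """Data mining (toy stand-in): flag customers whose total exceeds a threshold."""
    return {c: t for c, t in totals.items() if t > threshold}


def interpret(patterns):
    """Interpretation & evaluation: turn patterns into readable statements."""
    return [f"{c} is a high spender (total {t:.2f})" for c, t in sorted(patterns.items())]


# Hypothetical raw data, including two incomplete rows that cleaning removes.
raw_data = [
    {"customer": "ann", "amount": 120.0, "store": "A"},
    {"customer": "bob", "amount": 15.0},
    {"customer": None, "amount": 50.0},
    {"customer": "ann", "amount": 95.5},
    {"customer": "bob", "amount": None},
]

target_data = select_and_clean(raw_data)
transformed = transform(target_data)
patterns = mine(transformed, threshold=100.0)
knowledge = interpret(patterns)
print(knowledge)
```

Each function consumes the previous stage's output, mirroring the left-to-right flow of the process diagram.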

Major Data Mining Tasks
Classification: predicting an item class
Associations: e.g. A & B & C occur frequently
Visualization: to facilitate human discovery
Estimation: predicting a continuous value
Deviation Detection: finding changes
Link Analysis: finding relationships
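
As an illustration of the estimation task (predicting a continuous value), the sketch below fits a straight line by ordinary least squares. The helper names `fit_line` and `predict` and the sample numbers are assumptions chosen for the example; the slides do not prescribe a particular estimation method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate a continuous value for a new input."""
    return slope * x + intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
estimate = predict(5.0, slope, intercept)
print(slope, intercept, estimate)
```

With noisy real data the fitted line would only approximate the observations, and the quality of the estimate would need to be evaluated separately.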

Uses of Data Mining
AI/Machine Learning
  o Combinatorial/Game Data Mining: good for analyzing winning strategies in games, and thus for developing intelligent AI opponents (e.g. chess).
Business Strategies
  o Market Basket Analysis: identify customer demographics, preferences, and purchasing patterns.
Risk Analysis
  o Product Defect Analysis: analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.
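
Market basket analysis is commonly expressed as association rules scored by support and confidence. The following is a minimal sketch for item pairs only; the function name `pair_rules`, the thresholds, and the baskets are hypothetical, and production systems typically use algorithms such as Apriori to handle larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Find rules lhs -> rhs between item pairs, scored by support and confidence."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the baskets containing lhs, the share that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with butter also contains bread, so the rule butter -> bread has confidence 1.0 even though its support is only 0.5.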

Uses of Data Mining (Cont.)
User Behavior Validation
  o Fraud Detection: in the realm of cell phones, comparing phone activity to calling records can help detect calls made on cloned phones. Similarly, with credit cards, comparing purchases with historical purchases can detect activity on stolen cards.
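
One simple way to compare a purchase with historical purchases is a z-score check: flag an amount that lies many standard deviations from the cardholder's usual spending. This is only a sketch under that assumption; the function name `is_suspicious`, the threshold of 3, and the sample history are hypothetical, and real fraud systems use far richer features.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, z_threshold=3.0):
    """Flag a purchase whose amount deviates strongly from past purchases."""
    if len(history) < 2:
        return False  # not enough history to estimate normal behavior
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        return amount != mu  # all past purchases identical: any change stands out
    z_score = abs(amount - mu) / sigma
    return z_score > z_threshold


# Hypothetical purchase history for one card.
history = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0]
print(is_suspicious(history, 26.0))
print(is_suspicious(history, 900.0))
```

A purchase close to the usual range passes, while one far outside it is flagged for review rather than rejected outright, since unusual spending is not always fraud.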

Uses of Data Mining (Cont.)
Health and Science
  o Protein Folding: predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds").
  o Extra-Terrestrial Intelligence: scanning radio telescope recordings for possible transmissions from other planets.
For more information see Stanford’s Folding@home and UC Berkeley’s SETI@home projects. Both involve participation in a widely distributed computer application.

Sources of Data for Mining
Databases (most obvious)
Text Documents
Computer Simulations
Social Networks

Advantages of Data Mining
Marketing / Retail
Finance / Banking
Manufacturing
Governments

Challenges of Data Mining
Scalability
Dimensionality
Complex and Heterogeneous Data
Data Quality
Data Ownership and Distribution
Privacy Preservation
Streaming Data

Conclusion
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain.


Thanks

Queries?
