Virtual University of Pakistan Data Warehousing Lecture-31
Supervised vs. Unsupervised Learning
Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research (www.nu.edu.pk/cairindex.asp), National University of Computers & Emerging Sciences, Islamabad. Email: [email protected]
Data Structures in Data Mining
Data matrix
– Table or database
– n records and m attributes (an n x m matrix)
    C1,1  C1,2  C1,3  ...  C1,m
    C2,1  C2,2  C2,3  ...  C2,m
    C3,1  C3,2  C3,3  ...  C3,m
    ...
    Cn,1  Cn,2  Cn,3  ...  Cn,m
Similarity matrix
– Symmetric square matrix
– n x n or m x m
    1     S1,2  S1,3  ...  S1,n
    S2,1  1     S2,3  ...  S2,n
    S3,1  S3,2  1     ...  S3,n
    ...
    Sn,1  Sn,2  Sn,3  ...  1
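As a minimal sketch of how the two structures relate, the following builds a small data matrix and derives a record-by-record similarity matrix from it. The similarity measure, 1 / (1 + Euclidean distance), the function name and the sample values are assumptions chosen for illustration; the slide does not specify a measure.

```python
import math


def similarity_matrix(data):
    """Return the n x n similarity matrix for an n x m data matrix.

    Assumption: similarity = 1 / (1 + Euclidean distance), so identical
    records score 1 (the diagonal) and the matrix is symmetric.
    """
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.dist(data[i], data[j])
            sim[i][j] = 1.0 / (1.0 + dist)
    return sim


# Hypothetical data matrix: n = 4 records, m = 2 attributes.
data = [
    [1.0, 2.0],
    [1.0, 2.0],
    [4.0, 6.0],
    [9.0, 9.0],
]

sim = similarity_matrix(data)
for row in sim:
    print(" ".join(f"{value:.3f}" for value in row))
```

Comparing attributes instead of records (the m x m case) would apply the same function to the transposed data matrix.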
Main types of DATA MINING
Supervised
– Bayesian Modeling
– Decision Trees
– Neural Networks
– Etc.
Type and number of classes are known in advance
Unsupervised
– One-way Clustering
– Two-way Clustering
Type and number of classes are NOT known in advance
Clustering: Min-Max Distance
Intra-cluster distances are minimized
Inter-cluster distances are maximized
[Figure: scatter plot of Salary against Age showing clusters and an outlier]
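The min-max idea can be checked numerically. The sketch below assumes Euclidean distance, measures intra-cluster distance as the mean pairwise distance inside a cluster, and measures inter-cluster distance between cluster centroids. The (age, salary) points, the cluster names and the helper names are hypothetical, not taken from the slide.

```python
import math
from itertools import combinations


def mean_intra_cluster_distance(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


# Hypothetical (age, salary) points, already grouped into two clusters.
young = [(22, 20), (25, 22), (24, 19)]
senior = [(55, 41), (58, 40), (57, 43)]

intra_young = mean_intra_cluster_distance(young)
intra_senior = mean_intra_cluster_distance(senior)
inter = math.dist(centroid(young), centroid(senior))

print(f"intra-cluster (young):  {intra_young:.2f}")
print(f"intra-cluster (senior): {intra_senior:.2f}")
print(f"inter-cluster:          {inter:.2f}")
```

A good clustering keeps the first two numbers small relative to the third; a point far from every centroid would be a candidate outlier.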
How does clustering work?
One-way clustering example
[Figure: INPUT and OUTPUT panels. Black spots are noise; white spots are missing data.]
Data Mining Agriculture data clusters
[Figure: INPUT and clustered OUTPUT]
Classification
Unseen Data -> Classifier (model) -> Which class?
How does classification work?
[Figure labels: Inputs, Output, Confidence Level]
Classification Process (1): Model Construction
Relationship between shopping time and items bought
Training Data (observations, measurements, etc.) -> Classification Algorithms -> Classifier (Model)

Training Data
    NAME    Time  Items  Gender
    Moin    10    2      M
    Munir   16    3      M
    Meher   15    1      F
    Javed   5     1      M
    Mahin   20    1      F
    Akram   20    4      M

Classifier (Model)
    IF time/items >= 6 THEN gender = 'F'
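The rule on this slide can be written as a one-line classifier and checked against the training data. This is a sketch under two assumptions: the comparison is taken to be "greater than or equal to 6", and any record that fails the test is labelled 'M', since the slide states only the 'F' branch. The function name classify is hypothetical.

```python
# Training data from the slide: (name, time, items, gender).
training_data = [
    ("Moin", 10, 2, "M"),
    ("Munir", 16, 3, "M"),
    ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"),
    ("Mahin", 20, 1, "F"),
    ("Akram", 20, 4, "M"),
]


def classify(time, items, threshold=6):
    """Rule from the slide: IF time/items >= 6 THEN gender = 'F'.

    Assumptions: the comparison is >=, and every other record is
    labelled 'M' (the slide gives only the 'F' branch).
    """
    return "F" if time / items >= threshold else "M"


for name, time, items, gender in training_data:
    ratio = time / items
    predicted = classify(time, items)
    print(f"{name:6} time/items = {ratio:5.2f}  predicted {predicted}  actual {gender}")

correct = sum(classify(t, i) == g for _, t, i, g in training_data)
print(f"{correct} of {len(training_data)} training records match the rule")
```

Every male shopper in the training data has a ratio below 6 and both female shoppers have a ratio of 15 or more, so any threshold between those values would fit this data equally well.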
Classification Process (2): Use the Model in Prediction
Testing Data -> Classifier

Testing Data
    NAME    Time  Items  Gender
    Tahir   20    1      M
    Younas  11    2      M
    Yasin   3     1      M

Unseen Data
    (Firdous, Time 15, Items 1) -> Gender?
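To illustrate the prediction step, the sketch below scores a rule-based classifier on the testing data and then applies it to the unseen record. It assumes the rule "IF time/items >= 6 THEN gender = 'F'" with 'M' as the fallback label; the comparison operator, the fallback and the function name are assumptions made for illustration.

```python
def classify(time, items, threshold=6):
    """Assumed rule: 'F' when time/items >= threshold, otherwise 'M'."""
    return "F" if time / items >= threshold else "M"


# Testing data from the slide: (name, time, items, gender).
testing_data = [
    ("Tahir", 20, 1, "M"),
    ("Younas", 11, 2, "M"),
    ("Yasin", 3, 1, "M"),
]

# Accuracy: the share of testing records whose known label is reproduced.
hits = sum(classify(t, i) == g for _, t, i, g in testing_data)
accuracy = hits / len(testing_data)
print(f"accuracy on testing data: {hits}/{len(testing_data)} = {accuracy:.2f}")

# The unseen record has no label, so the model supplies one.
unseen_name, unseen_time, unseen_items = ("Firdous", 15, 1)
prediction = classify(unseen_time, unseen_items)
print(f"{unseen_name}: predicted gender {prediction}")
```

Under these assumptions the rule misclassifies Tahir (ratio 20, labelled M), gets the other two testing records right, and predicts 'F' for Firdous. Measuring accuracy on data that was held back from model construction is what makes the estimate meaningful.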
Clustering vs. Cluster Detection
Clustering vs. Cluster Detection: Example
[Figure: panels A and B]
The K-Means Clustering
The K-Means Clustering: Example
[Figure: four scatter-plot panels labelled A, B, C and D, each with both axes running from 0 to 10]
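Since the panels of the example survive only as a figure, here is a minimal sketch of the standard K-means iteration (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until no assignment changes. The sample points, the function name and the choice to seed the centroids with the first k points are assumptions for illustration and are not taken from the slides.

```python
import math


def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update.

    Assumption: the first k points seed the centroids, which keeps the
    sketch deterministic; practical implementations usually randomise.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the result is stable.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignment


# Hypothetical points on a 0-10 grid, forming two visible groups.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, assignment = k_means(points, k=2)

for point, cluster in zip(points, assignment):
    print(f"{point} -> cluster {cluster}")
print("centroids:", [tuple(round(v, 2) for v in c) for c in centroids])
```

The number of clusters k has to be supplied up front, and the result can depend on the starting centroids, so the algorithm is often run several times with different seeds.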
The K-Means Clustering: Comment