Generalizing Linear Discriminant Analysis
Linear Discriminant Analysis
Objective:
-Project a feature space (a dataset of n-dimensional samples) onto a smaller subspace
-Maintain the class separation
Reason:
-Reduce computational costs
-Minimize overfitting
Linear Discriminant Analysis
We want to reduce dimensionality while preserving the ability to discriminate between classes. Figures from [1]
Linear Discriminant Analysis
A first idea: look only at the class means, and find the direction that separates the projected means the most. Equations from [1] (reconstructed below).
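The equations on this slide were images in the original; a reconstruction in the standard notation of [1]: projecting samples as $y = w^T x$ sends each class mean $\mu_i$ to

\[
\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i ,
\]

so the naive objective is to choose $w$ to maximize the distance between the projected means:

\[
J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T (\mu_1 - \mu_2)| .
\]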
Linear Discriminant Analysis
Separating the projected means alone is not enough: the distance between means ignores the scatter within each class, so the classes can still overlap after projection. Figure from [1]
Linear Discriminant Analysis
Fisher's solution: normalize the distance between the projected means by the scatter within each class. The scatter of each projected class, and the criterion to maximize, are reconstructed below. Equations from [1]
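In the notation of [1], the scatter of projected class $i$ and Fisher's criterion are

\[
\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2 ,
\qquad
J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2} ,
\]

so a good projection separates the class means while keeping each class tightly clustered.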
Linear Discriminant Analysis Fisher’s solution Figure from [1]
Linear Discriminant Analysis
How do we get the optimum w*? First, express J(w) explicitly as a function of w, using scatter matrices defined in the original feature space (reconstructed below). Equation from [1]
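Reconstructing the referenced equations in the notation of [1]: defining the per-class, within-class, and between-class scatter matrices as

\[
S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T ,
\qquad
S_W = S_1 + S_2 ,
\qquad
S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T ,
\]

the criterion becomes a generalized Rayleigh quotient:

\[
J(w) = \frac{w^T S_B w}{w^T S_W w} .
\]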
Linear Discriminant Analysis
How to get the optimum w*: maximize J(w) by setting its derivative with respect to w to zero, which leads to a generalized eigenvalue problem with a closed-form solution in the two-class case (reconstructed below). Equations modified from [1]
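Following [1], differentiating the Rayleigh quotient and setting the result to zero gives

\[
\frac{d}{dw}\!\left[ \frac{w^T S_B w}{w^T S_W w} \right] = 0
\quad\Rightarrow\quad
S_W^{-1} S_B \, w = J(w) \, w ,
\]

a generalized eigenvalue problem. For two classes, $S_B w = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T w$ always points along $\mu_1 - \mu_2$, so the solution can be written directly as

\[
w^* = S_W^{-1} (\mu_1 - \mu_2) .
\]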
Linear Discriminant Analysis
How to generalize to C > 2 classes:
-Instead of a single projection vector w, we compute a projection matrix W whose columns are up to C − 1 projection vectors.
-Within-class scatter becomes the sum of the per-class scatter matrices.
-Between-class scatter becomes the scatter of the class means around the overall mean.
Equations from [1] (reconstructed below).
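In the notation of [1], with $N_i$ samples in class $i$ and overall mean $\mu$,

\[
S_W = \sum_{i=1}^{C} S_i ,
\qquad
S_B = \sum_{i=1}^{C} N_i \, (\mu_i - \mu)(\mu_i - \mu)^T ,
\qquad
J(W) = \frac{|W^T S_B W|}{|W^T S_W W|} ,
\]

and the columns of the optimal $W^*$ are the leading eigenvectors of $S_W^{-1} S_B$. Because $S_B$ is a sum of $C$ rank-one matrices whose deviations from the overall mean are linearly dependent, at most $C - 1$ of its eigenvalues are nonzero, which is why LDA produces at most $C - 1$ projections.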
Linear Discriminant Analysis
Limitations of LDA:
-Parametric method (assumes Gaussian class densities)
-Produces at most (C − 1) projections
Benefits of LDA:
-Linear decision boundaries, which are easy for humans to interpret and easy to implement
-Good classification results
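As a concrete illustration of LDA as a dimensionality-reduction step, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis; the library and the iris dataset are our choices, not part of the original slides.

# Minimal sketch: LDA as supervised dimensionality reduction with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 4-D features, C = 3 classes

# With C = 3 classes, LDA yields at most C - 1 = 2 discriminant directions.
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)           # project onto the 2 discriminant axes

print(X_proj.shape)                        # (150, 2)
print(lda.explained_variance_ratio_)       # between-class variance per axis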
Flexible Discriminant Analysis
Flexible Discriminant Analysis
-Turns the LDA problem into a linear regression problem (via optimal scoring, in the formulation of [2]).
-"Differences between LDA and FDA and what criteria can be used to pick one for a given task?" (Tavish)
-Linear regression can then be generalized into more flexible, nonparametric forms of regression. (LDA is parametric: each class is summarized by a mean and a shared covariance.)
-FDA expands the set of predictors via basis expansions, yielding decision boundaries that are nonlinear in the original features. A rough illustration follows.
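FDA proper fits regressions to optimal scores; as a rough sketch of just the basis-expansion idea (not the full FDA algorithm), one can compare plain LDA with LDA run on polynomially expanded predictors. The tooling and dataset here are our assumptions for illustration.

# Sketch of the basis-expansion idea behind FDA: expanding the predictors
# lets a linear method draw boundaries that are nonlinear in the original
# feature space.
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

plain = LinearDiscriminantAnalysis().fit(X, y)
expanded = make_pipeline(PolynomialFeatures(degree=3),
                         LinearDiscriminantAnalysis()).fit(X, y)

print("linear LDA accuracy:   %.2f" % plain.score(X, y))
print("expanded LDA accuracy: %.2f" % expanded.score(X, y))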
Flexible Discriminant Analysis Figure from [2]
Penalized Discriminant Analysis
Penalized Discriminant Analysis
-Fit an LDA model, but 'penalize' the coefficients to be smooth, directly curbing the overfitting problem.
-Positively correlated predictors lead to noisy, negatively correlated coefficient estimates, and this noise results in unwanted sampling variance. Example: neighboring pixels in images. A sketch of the stabilizing effect follows.
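PDA penalizes the within-class covariance with an explicit smoothness penalty. scikit-learn has no PDA implementation, but its ridge-like covariance shrinkage for LDA is a related regularizer, swapped in here purely to illustrate how penalization tames noisy coefficients on correlated predictors.

# Illustrative sketch only: covariance shrinkage (Ledoit-Wolf) stands in for
# PDA's smoothness penalty to show the stabilizing effect of regularization
# when predictors are strongly correlated.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1))
X = signal + 0.05 * rng.normal(size=(200, 30))   # 30 correlated predictors
y = (signal[:, 0] > 0).astype(int)

plain = LinearDiscriminantAnalysis(solver="lsqr").fit(X, y)
shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)

print("max |coef| without penalty:", np.abs(plain.coef_).max())
print("max |coef| with shrinkage: ", np.abs(shrunk.coef_).max())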
Penalized Discriminant Analysis Images from [2]
Mixture Discriminant Analysis
Mixture Discriminant Analysis
-Instead of enlarging the set of predictors (FDA), or smoothing the coefficients of the predictors (PDA), and instead of modeling each class with a single Gaussian:
-Model each class as a mixture of two or more Gaussian components,
-with all components, across all classes, sharing the same covariance matrix.
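In the notation of [2], the density of class $k$ becomes a Gaussian mixture with shared covariance $\Sigma$:

\[
P(X \mid G = k) = \sum_{r=1}^{R_k} \pi_{kr} \, \phi(X; \mu_{kr}, \Sigma) ,
\qquad
\sum_{r=1}^{R_k} \pi_{kr} = 1 ,
\]

where $\phi$ is the Gaussian density and the mixing proportions $\pi_{kr}$, centers $\mu_{kr}$, and $\Sigma$ are estimated by maximum likelihood via the EM algorithm.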
Mixture Discriminant Analysis Image from [2]
Sources
1. Gutierrez-Osuna, Ricardo. "CSCE 666 Pattern Analysis – Lecture 10." http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf
2. Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
3. Raschka, Sebastian. "Linear Discriminant Analysis bit by bit." http://sebastianraschka.com/Articles/2014_python_lda.html
END.