Introductory Demo  caffeinatedrook/MovieRec/MovieRecServlet 1

21 Slides7.37 MB

Introductory Demo http://www.caffeinatedrook.com/MovieRec/MovieRecServlet 1

Problem Statement Data: User-Movie Ratings Input: User number and Movie number Output: Predicted Rating Goal: Predict Ratings with the smallest RMSD possible. (Make Customer happy.) -0.07283353 -0.07283353 0.11694841 0.11694841 -0.0622078 -0.0622078 0.081832446 0.081832446 0.049034953 0.049034953 -0.008441236 -0.008441236 0.004925302 0.004925302 0.001412398 0.001412398 0.05334269 0.05334269 2 0.001291469 0.06918105 -0.08339876 0.12175138 0.17088805 -0.1085485 -0.07531176 -0.083747916 0.12860337 4.312 / 5

Motivation – Netflix “The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. The Netflix Prize improves our ability to connect people to the movies they love.” www.netflix.com 3

Impact to Field Better Recommendations Happy Customers Happy Customers More Money Larger Market Share 4

Motivation - Personal One million of them Feature Extraction My uber-competitive nature (aka Justin’s Wife) 5

A problem with the problem statement Tackling the Netflix Challenge requires many hundreds (thousands more?) of hours of computation. Ultimately, it will require the solution to many sub-problems. Sparcity Noise Memory Requirements Movie Similarity User Similarity 6 (more on these a little later)

The Problem Statement Redefined Data: User-Movie Ratings Goal: Discover the relationships between. Movies to other movies Users to other users Movies to users 7

Related Work Netflix Prize forum http://www.netflixprize.com//community/ Lots of info on strategies people are trying. www.netflix.com www.amazon.com www.blockbuster.com www.spout.com Singular value decomposition and least squares solutions, 8 Numerische Mathematik Springer Berlin / Heidelberg Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing) Predicting User Preference for Movies using NetFlix database, Dhiraj Goel and Dhruv Batra, Carnegie Mellon University The Netflix Prize, James Bennett Stan Lanning Use of KNN for the Netflix Prize, Ted Hong, Dimitris Tsamis, Stanford University How To Break Anonymity of the Netflix Prize Dataset, Arvind Narayanan, Vitaly Shmatikov

Domain Understanding The success or failure of retailers rely on matching the customer to the product. In the case of online retailers, like Netflix, recommender systems can be built to utilize the vast sums of data generated online. Netflix keeps a record for each user, containing the rating (1-5) for each film the user has rated. 9

Data Selection First, what the data does not contain. It does not contain Movie titles, directors, actors, studio, year Customer age, sex, income, favorite color Some contestants have written web-crawlers to mine this information from the web. 10

Data Selection 11 6: 2031561,1,2004-07-26 1176140,1,2004-02-16 2336133,2,2004-09-05 1521836,1,2004-08-11 117277,3,2004-10-12 326587,3,2004-09-06 1961542,3,2004-04-20 1041552,3,2004-10-19 1678346,3,2005-04-11 643182,2,2004-07-18 2182301,5,2004-08-04 2502669,2,2004-02-10 2211030,4,2004-05-26 603277,3,2004-12-13 214166,2,2005-10-09 . . 17,770 Movies 480,189 users 100,480,507 Ratings 17,770 * 480,189 8,532,958,530 100,480,507 / 8,532,958,530 0.01177 (%98.8 sparse!)

Cleaning and Preprocessing Transformed files from movie-view to user-view. 2) Normalized user ratings via Z-Score normalization. 1) 12

Discovering Patterns Which Software to use? SPSS, SAS, Weka? 8,532,958,530 ratings * 4 bytes / rating 34,131,834,120 bytes 33,331,869 kilobytes 32,550 megabytes 31 gigabytes Too big to hold the entire matrix Too big to hold condensed matrix Too “stupid” to manage memory without paging. 13

Discovering Patterns Which feature selection method to use? Principle Component Analysis Singular Value Decomposition Multifactor Dimensionality Reduction Latent Semantic Analysis 14

Discovering Patterns M 17,770 * 25 D 17,770 * 480,189 444,250 12,004,725 12,448,975 8,532,958,530 Movie: a User: b vab i(Uai x Mbi) U 25 * 480,189 15 1000 .001 1c/5h 25c / 5

A little board work to explain the algorithm 16

Interpretation: Feature 1-movie view Trailer Park Boys: Season 3 Trailer Park Boys: Season 4 The Lord of the Rings: The Fellowship of the Ring: Extended Edition Lord of the Rings: The Return of the King: Extended Edition Lord of the Rings: The Two Towers: Extended Edition Lost: Season 1 Veronica Mars: Season 1 House 4 As Time Goes By: Series 9 3 Gilmore Girls: Season 4 Sweet Potato Pie Legion of the Dead Dark Town Comedy Only in Da Hood Predator Island Bad Bizness Vampiyaz My Big Phat Hip Hop Family Jack O'Lantern Desperate Souls 2 1 0 -1 -2 17

Interpretation: Feature 2-movie view Lost in Translation National Lampoon's Mr. Wong Without You I'm Nothing Punch-Drunk Love Dogville The Royal Tenenbaums Whiteboyz Pornografia Spooks & Creeps Kaaterskill Falls 1 0.8 0.6 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 18 -1 Dragon Ball Z: World Tournament Dragon Ball: Piccolo Jr. Saga: Part 2 Dragon Ball: Tien Shinhan Saga Dragon Ball Z: Fusion Dragon Ball: Red Ribbon Army Saga Dragon Ball Z: Garlic Jr. Dragon Ball: Piccolo Jr. Saga: Part 1 Armageddon Dragon Ball: The Path to Power Pearl Harbor

Interpretation: Feature 3-movie view Nostradamus: A Voice from the Past 1.5 Absolution 1 Ozzy Osbourne: Double O: Unauthorized Monster-a-Go-Go! 0.5 Dark Harvest 2: The Maize 0 Jessica: A Ghost Story Vanilla Sky -0.5 Ivan Vasilievich: Back to the Future -1 American Beauty Still Bout It -1.5 WWE: Rebellion 2002 Battle Athletes: Vol. 3: Go Sailor Moon: Vol. 10: The Trouble With Rini Battle Athletes Victory: Vol. 7: The Last Dance Battle Athletes Victory: Vol. 1: Training Battle Athletes Victory: Vol. 8: The Human Race! ECW: Extreme Evolution: Extreme Championship Wrestling Battle Athletes Victory: Vol. 6: Willpower Fushigi Yugi: The Mysterious Play: Eikoden 19 Lupin the 3rd: Dead or Alive

Interpretation: Nearest Neighbors 18 components Find the nearest neighbors using Euclidean (q 2) distance 20 q 1 1. American Beauty (1999) 2. Fight Club (1999) 3. Reservoir Dogs (1992) 4. Mystic River (2003) q 3 1. American Beauty (1999) 2. Mystic River (2003) 3. Fight Club (1999) 4. Traffic (2000) q 2 1. American Beauty (1999) 2. Fight Club (1999) 3. Mystic River (2003) 4. Reservoir Dogs (1992) q 4 1. American Beauty (1999) 2. Mystic River (2003) 3. Fight Club (1999) 4. Traffic (2000)

Demo – Name a movie! http://www.caffeinatedrook.com/MovieRec/MovieRecServlet 21

Back to top button