Knowledge Transfer via Multiple Model Local Structure Mapping
Knowledge Transfer via Multiple Model Local Structure Mapping. Jing Gao†, Wei Fan‡, Jing Jiang†, Jiawei Han†. †University of Illinois at Urbana-Champaign, ‡IBM T. J. Watson Research Center
Standard Supervised Learning: training (labeled): New York Times; test (unlabeled): New York Times; classifier accuracy: 85.5% 2/17
In Reality: labeled New York Times data not available! training (labeled): Reuters; test (unlabeled): New York Times; classifier accuracy: 64.1% 3/17
Domain Difference → Performance Drop: ideal setting: train on NYT, test on NYT, classifier accuracy 85.5%; realistic setting: train on Reuters, test on NYT, classifier accuracy 64.1% 4/17
Other Examples: Spam filtering – Public email collection → personal inboxes. Intrusion detection – Existing types of intrusions → unknown types of intrusions. Sentiment analysis – Expert review articles → blog review articles. The aim – To design learning methods that are aware of the difference between the training and test domains. Transfer learning – Adapt the classifiers learned from the source domain to the new domain 5/17
All Sources of Labeled Information: training (labeled): Reuters, Newsgroup; test (completely unlabeled): New York Times; classifier: ? 6/17
A Synthetic Example: Training (have conflicting concepts); Test; partially overlapping 7/17
Goal: To unify knowledge that is consistent with the test domain from multiple source domains. [Figure: three source domains and one target domain] 8/17
Summary of Contributions: Transfer from multiple source domains – Target domain has no labeled examples. Do not need to re-train – Rely on base models trained from each domain – The base models are not necessarily developed for transfer learning applications 9/17
Locally Weighted Ensemble: Training sets 1, ..., k yield base models $C_1, \dots, C_k$ (x: feature value, y: class label). Each model outputs $f_i(x, y) = P(Y = y \mid x, C_i)$ and receives a local weight $w_i(x)$ at test example x. Ensemble output: $f^E(x, y) = \sum_{i=1}^{k} w_i(x)\, f_i(x, y)$, with $\sum_{i=1}^{k} w_i(x) = 1$. Prediction: $y \mid x = \arg\max_y f^E(x, y)$ 10/17
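The combination rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `lwe_posterior`, `lwe_predict`, `Model` and `WeightFn` are hypothetical, base models are assumed to return a dictionary of class probabilities, and the fallback to uniform weights when all weights are zero is a choice made here, not something the slides specify.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types for this sketch (not from the slides):
# a base model C_i maps x to {class label: P(Y = label | x, C_i)},
# and a weight function maps x to the unnormalized local weight w_i(x).
Model = Callable[[Sequence[float]], Dict[str, float]]
WeightFn = Callable[[Sequence[float]], float]


def lwe_posterior(x: Sequence[float], models: List[Model],
                  weight_fns: List[WeightFn]) -> Dict[str, float]:
    """Return f^E(x, y) = sum_i w_i(x) * f_i(x, y) for every label y."""
    if not models or len(models) != len(weight_fns):
        raise ValueError("need one weight function per model")
    raw = [max(0.0, w(x)) for w in weight_fns]
    total = sum(raw)
    # Normalize so the weights sum to 1; fall back to uniform weights
    # (simple averaging) if every model gets zero weight at x.
    if total > 0:
        weights = [r / total for r in raw]
    else:
        weights = [1.0 / len(models)] * len(models)
    combined: Dict[str, float] = {}
    for w, model in zip(weights, models):
        for label, prob in model(x).items():
            combined[label] = combined.get(label, 0.0) + w * prob
    return combined


def lwe_predict(x: Sequence[float], models: List[Model],
                weight_fns: List[WeightFn]) -> str:
    """Return the label that maximizes the ensemble output f^E(x, y)."""
    posterior = lwe_posterior(x, models, weight_fns)
    return max(posterior, key=posterior.get)


# Toy usage with two constant models and fixed weights (made-up values).
c1 = lambda x: {"pos": 0.9, "neg": 0.1}
c2 = lambda x: {"pos": 0.4, "neg": 0.6}
print(lwe_predict([0.0], [c1, c2], [lambda x: 0.8, lambda x: 0.2]))
```

Passing constant weight functions reduces the sketch to a globally weighted ensemble, and equal constants give simple averaging; the local behaviour comes entirely from letting each `w_i` depend on x.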
Optimal Local Weights: Example at test example x – C1 outputs (0.9, 0.1) and C2 outputs (0.4, 0.6); the true f is (0.8, 0.2), so C1 gets the higher weight. Optimal weights – Solution to a regression problem – Impossible to get since f is unknown! 11/17
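To make the "regression problem" concrete, here is a small sketch for the two-model case using the numbers on this slide. It assumes that (0.8, 0.2) is the true conditional f at x, that the weights must sum to 1, and that least squares is the fitting criterion; the function name `optimal_two_model_weights` and the clipping to [0, 1] are choices of this sketch rather than details taken from the slides.

```python
from typing import Sequence, Tuple


def optimal_two_model_weights(f1: Sequence[float], f2: Sequence[float],
                              f_true: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (w1, w2) with w1 + w2 = 1 for two models at one x."""
    # Substituting w2 = 1 - w1 gives a one-dimensional regression:
    # minimize sum_y (w1 * (f1[y] - f2[y]) - (f_true[y] - f2[y]))**2.
    diff = [a - b for a, b in zip(f1, f2)]
    target = [t - b for t, b in zip(f_true, f2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5, 0.5  # identical models: any split fits equally well
    w1 = sum(d * t for d, t in zip(diff, target)) / denom
    w1 = min(1.0, max(0.0, w1))  # clipping to [0, 1] is a choice of this sketch
    return w1, 1.0 - w1


# Slide values: C1 outputs (0.9, 0.1), C2 outputs (0.4, 0.6), f is (0.8, 0.2).
w1, w2 = optimal_two_model_weights([0.9, 0.1], [0.4, 0.6], [0.8, 0.2])
print(round(w1, 3), round(w2, 3))
```

With these inputs the weights come out at about 0.8 for C1 and 0.2 for C2, and 0.8 · (0.9, 0.1) + 0.2 · (0.4, 0.6) reproduces (0.8, 0.2). The calculation needs `f_true`, which is exactly what is unavailable on an unlabeled test domain.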
Graph-based Heuristics: Graph-based weights approximation – Map the structures of a model onto the structures of the test domain – Weight of a model is proportional to the similarity between its neighborhood graph and the clustering structure around x. [Figure label: Higher Weight] 12/17
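One way to read the heuristic on this slide is sketched below. The slide does not say how the similarity is measured, so this sketch assumes a Jaccard overlap between two neighbor sets of x: the test points in the same cluster as x, and the test points to which the model assigns the same label as x. The function names, the precomputed `cluster_ids`, and the uniform fallback are all hypothetical.

```python
from typing import List, Sequence


def local_similarity(idx: int, cluster_ids: Sequence[int],
                     predicted_labels: Sequence[str]) -> float:
    """Jaccard overlap between x's cluster neighbors and its same-label neighbors."""
    same_cluster = {j for j, c in enumerate(cluster_ids)
                    if j != idx and c == cluster_ids[idx]}
    same_label = {j for j, lab in enumerate(predicted_labels)
                  if j != idx and lab == predicted_labels[idx]}
    union = same_cluster | same_label
    if not union:
        return 0.0  # x has no neighbors under either structure
    return len(same_cluster & same_label) / len(union)


def graph_based_weights(idx: int, cluster_ids: Sequence[int],
                        predictions_per_model: Sequence[Sequence[str]]) -> List[float]:
    """Weights proportional to each model's local similarity at test point idx."""
    sims = [local_similarity(idx, cluster_ids, preds)
            for preds in predictions_per_model]
    total = sum(sims)
    if total == 0.0:
        # No model agrees with the clustering around x: use uniform weights.
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


# Toy test domain: six points in two clusters (made-up data).
cluster_ids = [0, 0, 0, 1, 1, 1]
model_a = ["a", "a", "a", "b", "b", "b"]  # agrees with the clustering
model_b = ["a", "b", "a", "b", "a", "b"]  # cuts across the clusters
print(graph_based_weights(0, cluster_ids, [model_a, model_b]))
```

In this toy case the model whose predictions follow the cluster boundaries receives the larger weight at point 0, which is the behaviour the slide describes. No test labels are used anywhere, only the clustering of the unlabeled test points and the models' own predictions.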
Experiments Setup: Data Sets – Synthetic data sets – Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006) – Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters) – Intrusion detection data: different types of intrusions in training and test sets. Baseline Methods – One source domain: single models (WNN, LR, SVM) – Multiple source domains: SVM on each of the domains – Merge all source domains into one: ALL – Simple averaging ensemble: SMA – Locally weighted ensemble: LWE 13/17
Experiments on Synthetic Data 14/17
Experiments on Real Data: [Figure: two charts, vertical axis from 0.5 to 1. Chart 1 compares WNN, LR, SVM, SMA and LWE on Spam, Newsgroup and Reuters. Chart 2 compares Set 1, Set 2, ALL, SMA and LWE on DOS, Probing and R2L.] 15/17
Conclusions: Locally weighted ensemble framework – transfer useful knowledge from multiple source domains. Graph-based heuristics to compute weights – Make the framework practical and effective 16/17
Thanks! Any questions? Welcome to our poster tonight! http://www.ews.uiuc.edu/ jinggao3/kdd08transfer.htm [email protected] 17/17