Knowledge Transfer via Multiple Model Local Structure Mapping

Jing Gao†, Wei Fan‡, Jing Jiang†, Jiawei Han†
†University of Illinois at Urbana-Champaign, ‡IBM T. J. Watson Research Center

Standard Supervised Learning
– Training (labeled): New York Times
– Test (unlabeled): New York Times
– Classifier: 85.5%

In Reality
– Training (labeled): Reuters (labeled New York Times data not available!)
– Test (unlabeled): New York Times
– Classifier: 64.1%

Domain Difference → Performance Drop
                   train            test             Classifier
ideal setting      New York Times   New York Times   85.5%
realistic setting  Reuters          New York Times   64.1%

Other Examples
Spam filtering
– Public email collection → personal inboxes
Intrusion detection
– Existing types of intrusions → unknown types of intrusions
Sentiment analysis
– Expert review articles → blog review articles
The aim
– To design learning methods that are aware of the difference between the training and test domains
Transfer learning
– Adapt the classifiers learnt from the source domain to the new domain

All Sources of Labeled Information
– Training (labeled): Reuters, Newsgroup
– Test (completely unlabeled): New York Times
– Classifier: ?

A Synthetic Example
[Figure: training sets (have conflicting concepts) and test set; partially overlapping]

Goal
[Figure: several source domains around one target domain]
– To unify the knowledge from multiple source domains that is consistent with the test domain

Summary of Contributions
Transfer from multiple source domains
– Target domain has no labeled examples
Do not need to re-train
– Rely on base models trained from each domain
– The base models are not necessarily developed for transfer learning applications

Locally Weighted Ensemble
– Base models C_1, ..., C_k are trained on training sets 1, ..., k; x is the feature value, y is the class label
– Output of model C_i: $f^i(x, y) = P(Y = y \mid x, C_i)$
– Ensemble output for a test example x: $f^E(x, y) = \sum_{i=1}^{k} w_i(x) f^i(x, y)$
– Local weights sum to one: $\sum_{i=1}^{k} w_i(x) = 1$
– Prediction: $y \mid x = \arg\max_y f^E(x, y)$
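The combination rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `lwe_predict`, `models`, and `weight_fns` are hypothetical, each base model is assumed to return a dictionary of class probabilities, and the uniform fallback for all-zero weights is a choice made for this sketch, not something the slides specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for this sketch (not from the slides):
# a base model maps x to {label: P(Y = label | x, C_i)},
# a weight function maps x to a non-negative local weight w_i(x).
Model = Callable[[Sequence[float]], Dict[str, float]]
WeightFn = Callable[[Sequence[float]], float]


def lwe_predict(
    x: Sequence[float], models: List[Model], weight_fns: List[WeightFn]
) -> Tuple[str, Dict[str, float]]:
    """Return (argmax label, f^E(x, .)) for one test example x."""
    raw = [w(x) for w in weight_fns]
    total = sum(raw)
    if total <= 0:
        # ASSUMPTION: fall back to uniform weights when every local weight is zero.
        weights = [1.0 / len(models)] * len(models)
    else:
        # Normalize so that the local weights sum to one.
        weights = [r / total for r in raw]

    # f^E(x, y) = sum_i w_i(x) * f^i(x, y)
    combined: Dict[str, float] = {}
    for w, model in zip(weights, models):
        for label, p in model(x).items():
            combined[label] = combined.get(label, 0.0) + w * p

    # Predict the label with the largest ensemble output.
    best = max(combined, key=combined.get)
    return best, combined
```

Because the weights are functions of x, two test examples can trust different base models, which is what distinguishes this from a single global weight per model.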

Optimal Local Weights
[Figure: at test example x, C1 outputs (0.9, 0.1) and C2 outputs (0.4, 0.6); the figure also shows the pair (0.8, 0.2); C1 gets the higher weight]
Optimal weights
– Solution to a regression problem
– Impossible to get since f is unknown!
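To make the "regression problem" concrete, here is a small worked sketch using the numbers on this slide. It rests on an assumption: the slide does not label the pair (0.8, 0.2), and the sketch treats it as the true distribution f(x, y) at x. With two models and two classes, matching the weighted outputs to f is an exactly solvable 2x2 linear system; with more models or classes one would presumably use least squares instead. The helper `solve_2x2` is a hypothetical name.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ [w1, w2] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("model outputs are linearly dependent; weights are not unique")
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a21 * b1) / det
    return w1, w2


# Model outputs at x, taken from the slide: C1 -> (0.9, 0.1), C2 -> (0.4, 0.6).
# ASSUMPTION: (0.8, 0.2) is the true f(x, y); the slide does not say so explicitly.
# Row 1 matches the first class:  0.9 * w1 + 0.4 * w2 = 0.8
# Row 2 matches the second class: 0.1 * w1 + 0.6 * w2 = 0.2
w1, w2 = solve_2x2(0.9, 0.4, 0.1, 0.6, 0.8, 0.2)
print(w1, w2)
```

Under that assumption the solution works out to approximately w1 = 0.8 and w2 = 0.2, so C1 gets the higher weight. The computation needs f itself, which is unavailable on an unlabeled test domain; that is the obstacle the slide points out.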

Graph-based Heuristics
Graph-based weights approximation
– Map the structures of a model onto the structures of the test domain
– Weight of a model is proportional to the similarity between its neighborhood graph and the clustering structure around x
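One way to picture the heuristic is the sketch below. It is a simplification under stated assumptions, not the authors' exact procedure: the slide only says "similarity", and this sketch uses Jaccard overlap between two neighbor sets of x, namely the test points a model labels the same as x and the test points a clustering puts in the same cluster as x. All function names are hypothetical, and the cluster ids are assumed to come from any clustering of the unlabeled test set.

```python
def neighborhood(x_idx, groups):
    """Indices of other test points that share x's group (predicted label or cluster id)."""
    return {j for j, g in enumerate(groups) if j != x_idx and g == groups[x_idx]}


def graph_weight(x_idx, model_labels, cluster_ids):
    """Unnormalized weight of one model at x: overlap of its neighbors with the cluster's."""
    v_model = neighborhood(x_idx, model_labels)
    v_cluster = neighborhood(x_idx, cluster_ids)
    union = v_model | v_cluster
    if not union:
        return 0.0
    # ASSUMPTION: Jaccard similarity stands in for the slide's unspecified "similarity".
    return len(v_model & v_cluster) / len(union)


def local_weights(x_idx, per_model_labels, cluster_ids):
    """Normalize the per-model similarities at x so they sum to one."""
    raw = [graph_weight(x_idx, labels, cluster_ids) for labels in per_model_labels]
    total = sum(raw)
    if total == 0:
        # ASSUMPTION: uniform weights when no model agrees with the clustering at x.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


# Six test points in two clusters; model A agrees with the clusters, model B does not.
cluster_ids = [0, 0, 0, 1, 1, 1]
model_a = ["a", "a", "a", "b", "b", "b"]
model_b = ["a", "b", "a", "b", "a", "b"]
weights_at_x0 = local_weights(0, [model_a, model_b], cluster_ids)
```

In this toy case the model whose predictions follow the cluster structure around x receives the larger weight, and no test labels are needed to compute it.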

Experiments Setup
Data Sets
– Synthetic data sets
– Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
– Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters)
– Intrusion detection data: different types of intrusions in training and test sets
Baseline Methods
– One source domain: single models (WNN, LR, SVM)
– Multiple source domains: SVM on each of the domains
– Merge all source domains into one: ALL
– Simple averaging ensemble: SMA
– Locally weighted ensemble: LWE

Experiments on Synthetic Data

Experiments on Real Data
[Figure: two bar charts, vertical axis from 0.5 to 1. First chart: WNN, LR, SVM, SMA, LWE on Spam, Newsgroup, Reuters. Second chart: Set 1, Set 2, ALL, SMA, LWE on DOS, Probing, R2L.]

Conclusions
Locally weighted ensemble framework
– Transfer useful knowledge from multiple source domains
Graph-based heuristics to compute weights
– Make the framework practical and effective

Thanks! Any questions?
Welcome to our poster tonight!
jinggao3/kdd08transfer.htm

