July 23, 2017
Recent Advances in Machine Learning from Weak Supervision
Masashi Sugiyama
Director, RIKEN Center for Advanced Intelligence Project (AIP); Professor, The University of Tokyo
CCAI2017

Textbooks:
- Sugiyama, Suzuki & Kanamori, Density Ratio Estimation in Machine Learning, Cambridge University Press, 2012
- Sugiyama & Kawanabe, Machine Learning in Non-Stationary Environments, MIT Press, 2012
- Sugiyama, Statistical Reinforcement Learning, Chapman and Hall/CRC, 2015
- Sugiyama, Introduction to Statistical Machine Learning, Morgan Kaufmann, 2015
- Quiñonero-Candela, Sugiyama, Schwaighofer & Lawrence, Dataset Shift in Machine Learning, MIT Press, 2009
(Some titles are also available in Japanese, Chinese, and Korean.)

What Is My Talk About?
Machine learning from big data is successful, with great work on large-scale parallel implementation. However, there are various applications where massive labeled data is not available: medicine, manufacturing, disaster response, infrastructure. In this talk, I will introduce our recent advances in classification from limited information.
Supervised Classification
Binary classification from labeled samples: a large number of labeled samples yields better classification performance, and the optimal convergence rate O(n^{-1/2}) in the number n of labeled samples is achieved.
[Figure: positive and negative samples separated by a decision boundary.]

Unsupervised Classification
Since collecting labeled samples is costly, let's learn a classifier from unlabeled data only. This is equivalent to clustering. To justify it, we need the assumption that each cluster corresponds to one class, which is rarely satisfied in practice.

Semi-Supervised Classification
Use a large number of unlabeled samples and a small number of labeled samples: find a decision boundary along the cluster structure induced by the unlabeled samples. Sometimes very useful, but it has the same weakness as unsupervised classification.
[Figure: positive, negative, and unlabeled samples.]
Zhou, Bousquet, Lal, Weston & Schölkopf (NIPS2003) and many others.

Classification of Classification
Achieving high classification accuracy at low labeling cost is always a big challenge!
[Diagram: accuracy vs. labeling cost; supervised learning is high-accuracy but high-cost, unsupervised learning low-cost but low-accuracy, semi-supervised learning in between; the goal is high accuracy at low labeling cost.]
Relation to Deep Learning
Learning methods (supervised, unsupervised, semi-supervised, reinforcement — the subject of my talk) and models (linear, kernel, additive, deep) are orthogonal axes: any learning method and any model can be combined!

Organization
1. Classification of classification
2. Classification from UU data
3. Classification from PU data
4. Classification from PNU data
5. Classification from complementary labels
6. Introduction of RIKEN Center for AIP

UU Classification: Setup
Given: two sets of unlabeled data.
Assumption: only the class-priors are different.
Goal: obtain a classifier.
du Plessis, Niu & Sugiyama (TAAI2013)
Optimal UU Classifier
Write the two unlabeled densities as mixtures with different class priors θ ≠ θ′:
p(x) = θ p₊(x) + (1 − θ) p₋(x),  p′(x) = θ′ p₊(x) + (1 − θ′) p₋(x),
so that p(x) − p′(x) = (θ − θ′)(p₊(x) − p₋(x)). Under an equal test class-prior of 1/2, the Bayes-optimal classifier is the sign of the difference of class-posteriors, which agrees with sign(p(x) − p′(x)) up to a global flip. The sign of θ − θ′ is unknown, but just knowing the boundary {x : p(x) = p′(x)} still allows optimal separation!
du Plessis, Niu & Sugiyama (TAAI2013)

UU Classifier Training
- Difference of kernel density estimators: estimate p(x) and p′(x) separately and subtract. Simple, but suffers from systematic under-estimation of the difference. [Anderson, Hall & Titterington (J. Multivariate Analysis 1994); Kim & Scott (IEEE-TPAMI2010)]
- Direct estimation of the density-difference: fit a model to p(x) − p′(x) directly without estimating the densities. A linear least-squares formulation yields a global analytic solution! [Sugiyama, Suzuki, Kanamori, du Plessis, Liu & Takeuchi (NIPS2012, NeCo2013)]
- Direct estimation of the sign of the density-difference: the most direct approach (following Vapnik's principle!), but non-convex optimization is involved (use, e.g., CCCP). [du Plessis, Niu & Sugiyama (TAAI2013)]
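As an illustration of the second approach, here is a minimal sketch of least-squares density-difference estimation with Gaussian kernel basis functions; the bandwidth sigma, the regularization lam, and the use of all sample points as kernel centers are simplifying assumptions (the paper selects them by cross-validation).

```python
import numpy as np

def lsdd_uu_classifier(X1, X2, sigma=1.0, lam=1e-3):
    """Fit g(x) ~ p1(x) - p2(x) by least squares with Gaussian kernels.

    Minimizing  int g(x)^2 dx - 2 int g(x)(p1(x) - p2(x)) dx + lam*||theta||^2
    over g(x) = sum_j theta_j k(x, c_j),  k(x, c) = exp(-||x-c||^2 / (2 sigma^2)),
    gives the analytic solution  theta = (U + lam*I)^{-1} (h1 - h2).
    """
    C = np.vstack([X1, X2])   # kernel centers: all samples
    d = C.shape[1]

    def K(A, B):              # Gaussian kernel matrix between sample sets
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    # Closed form: int k(x,c_j) k(x,c_k) dx = (pi sigma^2)^{d/2} exp(-||c_j-c_k||^2 / (4 sigma^2))
    sq = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    U = (np.pi * sigma ** 2) ** (d / 2) * np.exp(-sq / (4 * sigma ** 2))

    h = K(C, X1).mean(axis=1) - K(C, X2).mean(axis=1)   # empirical h1 - h2
    theta = np.linalg.solve(U + lam * np.eye(len(C)), h)

    # Classify by the sign of the estimated density difference (labels known up to a flip).
    return lambda X: np.sign(K(X, C) @ theta)

# Toy usage: two unlabeled sets whose class priors differ (70% vs. 30% positive).
rng = np.random.default_rng(0)
pos = lambda n: rng.normal(+2, 1, size=(n, 1))
neg = lambda n: rng.normal(-2, 1, size=(n, 1))
X1 = np.vstack([pos(70), neg(30)])
X2 = np.vstack([pos(30), neg(70)])
f = lsdd_uu_classifier(X1, X2)
print(f(np.array([[2.0], [-2.0]])))   # opposite signs: the boundary is found
```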
Experiments
UU classification with direct estimation of the (sign of the) density difference works well!
[Table: misclassification error rates, average (std), with 5% t-test; compared methods: k-means clustering, spectral clustering (Ng et al., NIPS2001), information-maximization clustering (Sugiyama et al., ICML2011), and UU classification.]

UU Classification: Summary
Given two unlabeled datasets with different class-priors, we estimate the sign of the difference of class-posteriors. The same convergence rate as the fully supervised case can be achieved! Unlike classification from label proportions [Quadrianto, Smola, Caetano & Le (JMLR2009)], we do not have to know the class priors.
PU Classification: Setup
Given: positive and unlabeled samples, where the unlabeled set is a mixture of positives and negatives.
Goal: obtain an (ordinary) PN classifier.
Examples: click vs. non-click, friend vs. non-friend.

Classification Risk
The risk of a classifier f is
R(f) = π E₊[ℓ(f(X))] + (1 − π) E₋[ℓ(−f(X))],
where E₊/E₋ denote expectations over the positive/negative class-conditional densities, ℓ is a loss, and π is the class-prior probability (assumed known; it can be estimated). Since we do not have N data in the PU setting, the second term (the risk for N data) cannot be directly estimated.
Scott & Blanchard (AISTATS2009); Blanchard, Lee & Scott (JMLR2010); du Plessis, Niu & Sugiyama (IEICE2014, MLJ2017)

PU Unbiased Risk Estimation
The U-density is a mixture of the P- and N-densities: p(x) = π p₊(x) + (1 − π) p₋(x). Eliminating the N-density yields
R(f) = π E₊[ℓ(f(X))] − π E₊[ℓ(−f(X))] + E_U[ℓ(−f(X))],
so unbiased risk estimation is possible only from PU data!
Natarajan, Dhillon, Ravikumar & Tewari (NIPS2013); du Plessis, Niu & Sugiyama (ICML2015)
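A minimal numpy sketch of this unbiased PU risk estimator; the logistic surrogate loss and the function names are illustrative choices, not prescribed by the talk.

```python
import numpy as np

def logistic_loss(z):
    # l(z) = log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -z)

def pu_risk(f, X_p, X_u, prior):
    """Unbiased PU estimate of R(f) = pi*E_P[l(f)] - pi*E_P[l(-f)] + E_U[l(-f)]."""
    fp, fu = f(X_p), f(X_u)
    risk_p = prior * logistic_loss(fp).mean()
    # Estimates the unavailable (1-pi)*E_N[l(-f)] purely from P and U samples:
    risk_n = logistic_loss(-fu).mean() - prior * logistic_loss(-fp).mean()
    return risk_p + risk_n
```

Note that risk_n estimates a non-negative quantity by a difference of two sample averages; with flexible models it can go negative, which is exactly what the correction below addresses.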
Theoretical Analysis
Estimation error bounds show that PU (and PN) learning achieve the optimal convergence rate in the numbers n_P, n_N, n_U of positive, negative and unlabeled samples. Comparing the bounds, the PU bound is smaller than the PN bound when the positive and unlabeled sample sizes are large enough relative to the negative sample size: PU can be better than PN provided a large number of PU data!
Niu, du Plessis, Sakai, Ma & Sugiyama (NIPS2016)

Further Correction
PN formulation: R_pn(f) = π E₊[ℓ(f(X))] + (1 − π) E₋[ℓ(−f(X))].
PU formulation: R_pu(f) = π E₊[ℓ(f(X))] + ( E_U[ℓ(−f(X))] − π E₊[ℓ(−f(X))] ).
The risk for N data (the bracketed term) is non-negative by definition, but its approximation from PU samples can be negative due to a "difference of approximations", in particular for flexible models such as deep nets.
Kiryo, Niu, du Plessis & Sugiyama (arXiv2017)
Non-Negative PU Classification
We constrain the sample approximation of the N-risk term to be non-negative during back-prop training:
R̃_pu(f) = π Ê₊[ℓ(f(X))] + max( 0, Ê_U[ℓ(−f(X))] − π Ê₊[ℓ(−f(X))] ).
Now the risk estimator is biased. Is it really good?
[Figure: error vs. stochastic gradient iterations for a conv net; the plain PU empirical training error goes negative and the PU test error overfits, while PN train/test errors behave normally.]

Theoretical Analysis
- The non-negative estimator is still consistent and its bias decreases exponentially: in practice, we can ignore the bias.
- Its mean-squared error is not more than that of the original unbiased estimator: in practice, it is more reliable.
- For linear models, the risk of the learned function converges at the optimal parametric order in the numbers of positive and unlabeled samples: the learned function is optimal.
Kiryo, Niu, du Plessis & Sugiyama (arXiv2017)
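A sketch of the non-negative correction above as a training loss; the sigmoid surrogate loss is an illustrative choice, and the plain max(0, ·) clamp is a simplification (the authors' implementation additionally takes a gradient step on the negated N-risk term when it goes negative).

```python
import torch

def sigmoid_loss(z):
    # l(z) = sigmoid(-z), a common smooth surrogate in the PU literature
    return torch.sigmoid(-z)

def nnpu_loss(f_p, f_u, prior):
    """Non-negative PU risk: pi*E_P[l(f)] + max(0, E_U[l(-f)] - pi*E_P[l(-f)])."""
    risk_p = prior * sigmoid_loss(f_p).mean()
    risk_n = sigmoid_loss(-f_u).mean() - prior * sigmoid_loss(-f_p).mean()
    return risk_p + torch.clamp(risk_n, min=0.0)

# Usage inside a training loop (model outputs raw scores f(x)):
#   loss = nnpu_loss(model(x_p), model(x_u), prior=0.4)
#   loss.backward(); optimizer.step()
```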
Experiments
With a large number of unlabeled data, non-negative PU can even outperform PN!
[Figure: train/test error vs. stochastic gradient iterations for plain PU, non-negative PU, and PN.]
Setup: binary CIFAR-10 with positive = {airplane, automobile, ship, truck} and negative = {bird, cat, deer, dog, frog, horse}; 13-layer CNN with ReLU.

PU Classification: Summary
- Just separating P and U is biased. To be unbiased, use a composite loss for P data; the optimal convergence rate is achieved. [du Plessis, Niu & Sugiyama (NIPS2014, ICML2015); Natarajan, Dhillon, Ravikumar & Tewari (NIPS2013)]
- If the loss is symmetric, ℓ(z) + ℓ(−z) = 1 (e.g., ramp loss), the same loss can be used for P and U data.
- If the loss satisfies ℓ(z) − ℓ(−z) = −z (e.g., squared, logistic, double hinge), optimization becomes convex; this is checked numerically below.
- For deep nets, round up the empirical false-negative risk at zero. [Kiryo, Niu, du Plessis & Sugiyama (arXiv2017); Niu, du Plessis, Sakai, Ma & Sugiyama (NIPS2016)]
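A quick numerical check of the linear-odd condition for the double hinge loss; the exact form ℓ(z) = max(−z, max(0, (1 − z)/2)) is taken from du Plessis et al. (ICML2015), and this snippet only verifies the identity rather than reproducing the talk's loss table.

```python
import numpy as np

def double_hinge(z):
    # Double hinge loss from du Plessis et al. (ICML2015)
    return np.maximum(-z, np.maximum(0.0, 0.5 * (1.0 - z)))

z = np.linspace(-3, 3, 13)
# Linear-odd condition l(z) - l(-z) = -z, under which the PU objective is convex
assert np.allclose(double_hinge(z) - double_hinge(-z), -z)
print("double hinge satisfies l(z) - l(-z) = -z")
```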
PNU Classification
PNU classification is semi-supervised learning. Let's decompose it into PU, PN, and NU classification: each can be solved easily, and we combine two of them!
Sakai, du Plessis, Niu & Sugiyama (ICML2017)

PU+NU Classification
A natural choice is to combine PU and NU (symmetric). However, the theoretical risk analysis shows: when PU ≻ NU, the ordering of the error bounds is PU ≻ PN ≻ NU or PN ≻ PU ≻ NU; when NU ≻ PU, it is NU ≻ PN ≻ PU or PN ≻ NU ≻ PU. Hence PU+NU is not the best possible combination: PU+PN and NU+PN are the best combinations.
Niu, du Plessis, Sakai, Ma & Sugiyama (NIPS2016)

PN+PU & PN+NU Classification
Proposed method: combine the best pairs with a weight γ ∈ [0, 1], as sketched after the experiments below.
PN+PU classification: R̂_γ(f) = γ R̂_pn(f) + (1 − γ) R̂_pu(f).
PN+NU classification: R̂_γ(f) = γ R̂_pn(f) + (1 − γ) R̂_nu(f).
Sakai, du Plessis, Niu & Sugiyama (ICML2017)

Theoretical Analysis
Generalization error bounds (in terms of the numbers of positive, negative and unlabeled samples) show that unlabeled data always helps, without cluster assumptions! We use unlabeled data for loss evaluation, not for regularization (as in manifold smoothing): label information is extracted from unlabeled data!

Experiments
The proposed PN+PU and PN+NU methods work well!
[Table: misclassification error rates, average (std), with 5% t-test; baselines include entropy regularization (Grandvalet & Bengio, NIPS2004), (Belkin et al., JMLR2006), (Niu et al., ICML2013), and (Li et al., JMLR2013).]
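Before moving on, a minimal sketch of the PN+PU combination above, reusing the unbiased PU risk from earlier; gamma and the logistic surrogate are illustrative choices (the paper selects the combination weight by cross-validation).

```python
import numpy as np

def logistic_loss(z):
    return np.logaddexp(0.0, -z)

def pn_risk(f, X_p, X_n, prior):
    # Fully supervised risk: pi*E_P[l(f)] + (1-pi)*E_N[l(-f)]
    return (prior * logistic_loss(f(X_p)).mean()
            + (1 - prior) * logistic_loss(-f(X_n)).mean())

def pu_risk(f, X_p, X_u, prior):
    # Unbiased PU risk from the previous section
    return (prior * logistic_loss(f(X_p)).mean()
            + logistic_loss(-f(X_u)).mean()
            - prior * logistic_loss(-f(X_p)).mean())

def pnu_risk(f, X_p, X_n, X_u, prior, gamma=0.5):
    """PN+PU combination: gamma*R_pn + (1-gamma)*R_pu."""
    return (gamma * pn_risk(f, X_p, X_n, prior)
            + (1 - gamma) * pu_risk(f, X_p, X_u, prior))
```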
Classification from Complementary Labels
A complementary label indicates that a pattern does not belong to class ȳ. Choosing a complementary class is less laborious than choosing an ordinary class label when the number of classes c is large.
Goal: learn a classifier from complementary labels.
[Figure: three classes separated by a decision boundary.]
Ishida, Niu & Sugiyama (arXiv2017)

Possible Approaches
- Approach 1: classification from partial labels, where multiple candidate classes are provided for each pattern [Cour, Sapp & Taskar (JMLR2011)]. Complementary labels are the extreme case in which the candidate set contains all classes other than ȳ.
- Approach 2: multi-label classification, where each pattern can belong to multiple classes: a negative label for ȳ and positive labels for the rest.
We want a more direct approach!

Unbiased Risk Estimation with Complementary Labels
For a c-class classifier built from one-versus-rest classifiers g_y and a pairwise symmetric loss, the classification risk R(f) = E_{p(x,y)}[ℓ(f(x), y)] can be rewritten as an expectation over the complementary-label distribution p̄(x, ȳ): unbiased risk estimation is possible from complementarily labeled data alone!
Ishida, Niu & Sugiyama (arXiv2017)

Theoretical Analysis
Estimation error bounds show that the optimal parametric convergence rate is achieved.
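One simple way to see why such an estimator exists: assuming the complementary label is drawn uniformly from the c − 1 non-true classes, p̄(ȳ|x) = (1 − p(ȳ|x))/(c − 1), so the risk rewrites as R(f) = E_p̄[ Σ_y ℓ(f(x), y) − (c − 1) ℓ(f(x), ȳ) ]. The sketch below implements this illustrative rewrite with a one-versus-rest logistic loss; it is a variant for intuition, not the paper's exact pairwise-symmetric-loss estimator.

```python
import numpy as np

def complementary_risk(scores, y_bar, c):
    """Unbiased risk estimate from complementary labels, assuming each
    complementary label is uniform over the c-1 non-true classes:
      R(f) = E_pbar[ sum_y l(x, y) - (c-1) * l(x, y_bar) ],
    with the one-versus-rest logistic loss l(x, y) = log(1 + exp(-g_y(x))).
    scores: (n, c) array of one-versus-rest outputs g_y(x); y_bar: (n,) labels.
    """
    loss = np.logaddexp(0.0, -scores)              # (n, c): loss for every class
    total = loss.sum(axis=1)                       # sum_y l(x, y)
    picked = loss[np.arange(len(y_bar)), y_bar]    # l(x, y_bar)
    return (total - (c - 1) * picked).mean()
```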
Experiments
The proposed method works well!
[Table: correct classification rates, average (std), with 5% t-test; compared methods: proposed, partial-label, multi-label, and ordinary-label. The ordinary-label baseline uses only 1/(c − 1) as many samples, since one ordinary label corresponds to (c − 1) complementary labels in labeling effort.]

Summary
We need continuous effort to achieve high classification accuracy at low labeling cost: UU classification, PU classification, PNU classification, complementary labels, and more!
[Diagram: accuracy vs. labeling cost, as before, with supervised, unsupervised, and semi-supervised methods and the goal of high accuracy at low labeling cost.]
Introduction of RIKEN Center for AIP
RIKEN founded the Center for Advanced Intelligence Project (AIP) in 2016. Our missions:
1. Development of next-generation AI technology (understand deep learning and go beyond).
2. Acceleration of scientific research (iPS cells, manufacturing, materials).
3. Contribution to solving socially critical problems (healthcare for the super-aged society, disaster resilience, infrastructure management).
4. Study of ethical, legal and social issues of AI.
5. Human resource development (academia & industry).

Organization of AIP Center
- Goal-Oriented Technology Research Group: abstracts complex real-world problems into solvable forms (22 teams), working with various application domains (companies, universities, research institutes, etc.).
- Generic Technology Research Group: develops fundamental theory and algorithms for abstracted problems (18 teams).
- Artificial Intelligence in Society Research Group: analyzes the influence of AI spreading in society (8 teams).
Over 200 researchers! NEC/Fujitsu/Toshiba Collaboration Centers opened June 1st, 2017.

International Partners
- US: Toyota Technological Institute at Chicago; University of Pennsylvania
- Germany: Berlin Big Data Center; Technische Universitaet Darmstadt
- UK: Edinburgh Centre for Robotics
- Finland: Aalto University
- China: Peking University; Nanjing University; Shanghai University; Hong Kong University of Science and Technology
- Korea: KAIST; POSTECH; Artificial Intelligence Research Institute
- Singapore: National University of Singapore
More coming soon!

Computational Resources
24 x NVIDIA DGX-1 (half-precision 4 PFLOPS): the largest customer installation of DGX-1 systems as of March 2017. Ranked 4th in the Green500 List (June 2017) at 10.602 GFLOPS/W.
[Photo: with Dr. Bill Dally (NVIDIA SVP), Feb. 27, 2017.]
https://blogs.nvidia.co.jp/2017/03/06/fujitsu-ai-supercomputer/

Our Office in the Heart of Tokyo!
Directly connected to Nihonbashi Station and within walking distance of Tokyo Station, on the 15th floor, with an open discussion space. Visit us!