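Cost-Sensitive Hypergraph Learning and Its Applications
Wang Nan, Lecturer

Contents
01 Hypergraph Learning
02 Cost Sensitive Hypergraph Learning
03 Applications

01 Hypergraph Learning

[Figure: a graph on vertices v1-v7 next to a hypergraph on the same vertices with hyperedges e1, e2, e3]

Objective function
The final hypergraph learning objective combines two terms [1]: a regularizer that smooths the relations among the data, and an empirical loss.
Empirical loss: computed on the labeled data. In a classification task it involves the test data that still need to be labeled and the label information of the labeled training data.
Smoothness: the more hyperedges connect two vertices of the hypergraph, the closer their labels should be.

[1] D. Zhou, J. Huang, and B. Schölkopf, "Learning with hypergraphs: Clustering, classification, and embedding," Advances in Neural Information Processing Systems, 2006, pp. 1601-1608.

Vertex Re-Weighting
Anomaly and attack detection: normal behavior versus anomalous behavior.

Vertex-weighted Hypergraph Learning
Nan Wang, Zizhao Zhang, Xibin Zhao, Quan Miao, Rongrong Ji, Yue Gao: Exploring High-Order Correlations for Industry Anomaly Detection. IEEE Trans. Ind. Electron. 66(12): 9682-9691 (2019)

Data weight initialization: the sample weights are computed by combining isolation values and similarity values [1].

[1] Y. Zhang, L. Li, J. Zhou, X. Li, and Z. Zhou, "Anomaly detection with partially observed anomalies," in Proc. Web Conf., 2018, pp. 639-646.

Similarity score: cluster analysis is run on the labeled anomalous data, and the similarity of each unlabeled sample to the anomaly clusters is computed.
Isolation score: isolation forest analysis is run on the unlabeled data. Anomalous data tend to be isolated earliest.
Sample weight: computed from data quality, that is, from the similarity and isolation scores.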
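The weighting pipeline above can be sketched as follows. This is a minimal illustration assuming NumPy, not the formula used in the cited paper: the convex combination with a mixing parameter `theta`, the Gaussian-style similarity to the nearest anomaly-cluster center, and the names `similarity_scores` and `sample_weights` are all assumptions made for this example. Isolation scores are taken as given in [0, 1] (for instance from an isolation forest) and are not computed here.

```python
import numpy as np

def similarity_scores(X, anomaly_centers):
    """Similarity of each row of X to its nearest anomaly-cluster center, in (0, 1]."""
    # Squared Euclidean distance from every sample to every center
    d2 = ((X[:, None, :] - anomaly_centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2).max(axis=1)

def sample_weights(isolation, similarity, theta=0.5):
    """Hypothetical weight: convex combination of both scores, rescaled to [0, 1]."""
    total = theta * isolation + (1.0 - theta) * similarity
    span = total.max() - total.min()
    if span == 0:
        # All samples look alike; give them equal weight
        return np.ones_like(total)
    return (total - total.min()) / span

# Toy data: three samples and one anomaly-cluster center at the origin
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0]])
isolation = np.array([0.9, 0.5, 0.2])   # assumed precomputed isolation scores

sim = similarity_scores(X, centers)
weights = sample_weights(isolation, sim)
```

A sample that is both easy to isolate and close to a known anomaly cluster ends up with the largest weight; how such weights enter the hypergraph model is left to the cited work.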
Vertex-weighted Hypergraph Learning
The hypergraph model is built and optimized with the data weights.

Compared Methods
(1) Anomaly detection with partially observed anomalies (ADOA) [1]
(2) Partial differential equation continuum limits (PDEs) [2]
(3) Isolation forest (iForest) [3]
(4) Oversampling principal component analysis (osPCA) [4]
(5) Nonnegative sparse graph based label propagation (NSGLP) [5]
(6) Vertex-weighted hypergraph learning (V-HL) [6]

Evaluation
(1) Accuracy (2) AUC (3) Precision (4) PD (5) PF (6) F1-measure

Testing Data
(1) Anomaly detection datasets [7]
(2) ODDS dataset [8]

[Figures: results on the industry anomaly detection dataset and on the ODDS dataset]

[1] Y. Zhang, L. Li, J. Zhou, X. Li, and Z. Zhou, "Anomaly detection with partially observed anomalies," in Proc. Web Conf., 2018, pp. 639-646.
[2] B. Abbasi, J. Calder, and A. M. Oberman, "Anomaly detection and classification for streaming data using PDEs," SIAM J. Appl. Math., vol. 78, no. 2, pp. 921-941, 2017.
[3] F. T. Liu, K. M. Ting, and Z. Zhou, "Isolation-based anomaly detection," ACM Trans. Knowl. Discovery From Data, vol. 6, no. 1, pp. 1-39, 2012.
[4] Y.-R. Yeh, Z.-Y. Lee, and Y.-J. Lee, Anomaly Detection via Over-Sampling Principal Component Analysis. Berlin, Germany: Springer, 2009, pp. 449-458.
[5] Z. W. Zhang, X. Y. Jing, and T. J. Wang, "Label propagation based semisupervised learning for software defect prediction," Automated Softw. Eng., vol. 24, no. 1, pp. 1-23, 2016.
[6] L. Su, Y. Gao, X. Zhao, H. Wan, M. Gu, and J. Sun, "Vertex-weighted hypergraph learning for multi-view object classification," in Proc. Int. Joint Conf. Artif. Intell., 2017, pp. 2779-2785.
[7] D. Dheeru and E. K. Taniskidou, "UCI machine learning repository," 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[8] "Odds library," 2016. [Online]. Available: http://odds.cs.stonybrook.edu

02 Cost Sensitive Hypergraph Learning

Misclassification costs differ clearly between classes.
Software defect prediction: misclassifying defective software versus misclassifying defect-free software.
Network attack detection: misclassifying normal data versus misclassifying attack data.
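To make the point about unequal misclassification costs concrete, here is a small sketch assuming NumPy. The cost values (1 for a false alarm, 5 for a missed defect), the class encoding, and the function name `total_cost` are hypothetical and chosen only for illustration.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_matrix):
    """Sum cost_matrix[true class, predicted class] over all samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(cost_matrix[y_true, y_pred].sum())

# Hypothetical cost matrix: class 0 = defect-free, class 1 = defective.
# Rows are true classes, columns are predicted classes.
cost_matrix = np.array([[0.0, 1.0],    # false alarm costs 1
                        [5.0, 0.0]])   # missed defect costs 5

y_true = [1, 1, 0, 0, 0]
pred_a = [0, 1, 0, 0, 0]   # misses one defective module
pred_b = [1, 1, 1, 0, 0]   # raises one false alarm

cost_a = total_cost(y_true, pred_a, cost_matrix)
cost_b = total_cost(y_true, pred_b, cost_matrix)
```

Both predictions make exactly one error and so have the same accuracy, yet under this assumed cost matrix the first is five times as expensive as the second. Accuracy alone cannot express that difference.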
Cost Sensitive Hypergraph Learning

Feature selection: cost-sensitive Laplacian score [1].

[1] Mingxia Liu, Linsong Miao, Daoqiang Zhang: Two-Stage Cost-Sensitive Learning for Software Defect Prediction. IEEE TR 63(2): 676-686 (2014)

Training data selection: the test data are used to select the training data and to construct the training data space.

Cost-sensitive hypergraph learning: the objective includes the misclassification cost of the i-th sample. The slide also shows the cost matrix.

[Figure: AUC on the NASA Software Defect Dataset (990 defected / 4674 non-defected) and on the CK Metrics Dataset (1011 defected / 1611 non-defected) for NSGLP, HL, and CSHL. Values printed in the charts: 0.488, 0.910, 0.573, 0.870, 0.834, 0.820, 0.694 (HL), 0.723 (HL). NSGLP: Label Propagation based Semi-supervised Learning for Software Defect Prediction, Automated Software Engineering, 2017.]

Evaluation on Cost-Sensitive Learning
Dataset: NASA Software Defect Dataset, 990 defected / 4674 non-defected
Methods: HL: Hypergraph Learning; CSHL: Cost-Sensitive HL
Criteria: cost ratio compared with HL

Compared Methods
(1) Non-negative Sparse Graph-Based Label Propagation (NSGLP) [1]
(2) Cost-sensitive Discriminative Dictionary Learning (CDDL) [2]
(3) Multiple Kernel Ensemble Learning (MKEL) [3]
(4) Cost-sensitive Transfer Kernel Canonical Correlation Analysis (CTKCCA) [4]
(5) Anomaly Detection with Partially Observed Anomalies (ADOA) [5]
(6) Vertex-weighted hypergraph learning (VWHL) [6]
(7) Search-based Hypergraph Learning (SHL) [7]

Evaluation
(1) Accuracy (2) AUC (3) Precision (4) F1-measure

Testing Data
(1) Anomaly detection datasets [8]
(2) ODDS dataset [9]

[Figures: results on the industry anomaly detection dataset and on the ODDS dataset]

[1] Z.-W. Zhang, X.-Y. Jing, T.-J. Wang, Label propagation based semi-supervised learning for software defect prediction, Autom. Softw. Eng. 24(1) (2017) 47-69.
[2] X.-Y. Jing, S. Ying, Z.-W. Zhang, S.-S. Wu, J. Liu, Dictionary learning based software defect prediction, in: Proceedings of ICSE, 2014, pp. 414-423.
[3] T. Wang, Z. Zhang, X. Jing, L. Zhang, Multiple kernel ensemble learning for software defect prediction, Autom. Softw. Eng. 23(4) (2016) 569-590.
[4] Z. Li, X. Jing, F. Wu, X. Zhu, B. Xu, S. Ying, Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction, Autom. Softw. Eng. 25(2) (2018) 201-245.
[5] Y. Zhang, L. Li, J. Zhou, X. Li, Z. Zhou, Anomaly detection with partially observed anomalies, in: Proceedings of WWW, 2018, pp. 639-646.
[6] N. Wang, Z. Zhang, X. Zhao, Q. Miao, R. Ji, Y. Gao, Exploring high-order correlations for industry anomaly detection, IEEE Trans. Ind. Electron. 66(12) (2019) 9682-9691.
[7] D. Zhou, J. Huang, B. Schölkopf, Learning with hypergraphs: Clustering, classification, and embedding, in: Proceedings of NeurIPS, 2007, pp. 1601-1608.
[8] D. Dheeru and E. K. Taniskidou, "UCI machine learning repository," 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[9] "Odds library," 2016. [Online]. Available: http://odds.cs.stonybrook.edu

Hypergraph Learning with Cost Interval Optimization

What are the costs? In software defect prediction (Software Defect Prediction), the question is the cost of misclassifying a defective module versus a defect-free module. In network attack detection (Network Attack Detection), it is the cost of misclassifying normal data versus attack data. Is the ratio 1:2? 1:5?

An exact cost value is hard to obtain. A cost interval, bounded by a lower cost and an upper cost, is comparatively easier to obtain.

True cost, true risk L(h, ·), truly optimal classifier: the true cost is unknown and is only known to lie within the misclassification cost interval.
Surrogate cost, surrogate risk L(h, ·), surrogate-optimal classifier: the risk must be sufficiently small for every cost value in the interval, which yields infinitely many constraints.

The problem is therefore optimized with a finite number of constraints, as in CISVM [1] and cisLDM [2], by minimizing the worst-case risk together with the average risk.
Worst-case risk: ensures that every cost value in the interval satisfies the constraints of the original formulation.
Average risk: reduces the overall distortion.

[1] Liu, X., and Zhou, Z. 2010. Learning with cost intervals. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 403-412.
[2] Zhou, Y., and Zhou, Z. 2016. Large margin distribution learning with cost interval and unlabeled data. IEEE Transactions on Knowledge and Data Engineering 28(7): 1749-1763.
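Why a finite number of constraints can stand in for infinitely many is easiest to see under a simple assumed form of the risk. The notation below (FN and FP for the numbers of false negatives and false positives of a classifier h, the false-positive cost normalized to 1, and the interval endpoints c_min < c_max) is introduced here for illustration and does not appear on the slides.

```latex
% Assumed risk: a false negative costs c, a false positive costs 1.
L(h, c) = c \,\mathrm{FN}(h) + \mathrm{FP}(h), \qquad c \in [c_{\min}, c_{\max}]

% L is linear and non-decreasing in c because FN(h) >= 0, so the
% worst case over the whole interval is attained at the upper endpoint:
\max_{c \in [c_{\min}, c_{\max}]} L(h, c) = L(h, c_{\max})

% The average over a uniformly distributed c equals the risk at the midpoint:
\frac{1}{c_{\max} - c_{\min}} \int_{c_{\min}}^{c_{\max}} L(h, c)\, dc
  = L\!\left(h, \tfrac{c_{\min} + c_{\max}}{2}\right)
```

Under this assumption, bounding the risk at the upper cost bounds it for every cost in the interval, and the average risk reduces to a single evaluation at the mean cost.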
[Figure: framework of hypergraph learning with cost interval optimization. A hypergraph is constructed (vertices v1-v8, hyperedges e1-e5). Hypergraph learning is then run with different parameters, each run with its own cost matrix, mapping vector, and hyperedge weight W. Every run produces a cost result and a total cost (Total-Cost 1, Total-Cost 2, ..., Total-Cost n), and the classifier with the smallest total cost is selected.]

Xibin Zhao, Nan Wang, Heyuan Shi, Hai Wan, Jin Huang, Yue Gao: Hypergraph Learning With Cost Interval Optimization. AAAI 2018: 4522-4529

Step 1. Reduce the worst-case risk: train hypergraphs with several groups of different parameter values. Each run performs cost-sensitive learning over the hyperedge weight W, the mapping vector, and the cost function, and yields one classifier (Classifier 1, Classifier 2, ..., Classifier n).

Step 2. Reduce the average risk: among the hypergraphs trained with different parameter values, select the one with the smallest cost as the classifier. Each classifier has a cost result and a total cost (the slide also lists ND), and the classifier with the smallest total cost is chosen.
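The two steps can be sketched on a toy hypergraph as follows, assuming NumPy. This is a simplified stand-in rather than the method of the cited paper: it keeps the hyperedge weights fixed, learns transductive scores instead of a mapping vector, and scores each candidate by its total cost at the mean of the interval. The names `train`, `validation_cost`, and `select_classifier`, the parameter `lam`, and the toy data are hypothetical.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    dv = H @ w                     # weighted vertex degrees
    de = H.sum(axis=0)             # hyperedge degrees
    s = 1.0 / np.sqrt(dv)
    theta = (s[:, None] * H) @ np.diag(w / de) @ (H.T * s[None, :])
    return np.eye(H.shape[0]) - theta

def train(L, y, cost_pos, lam=1.0):
    """Minimize f^T L f + lam * sum_i gamma_i * (f_i - y_i)^2.

    y holds +1 / -1 for labeled vertices and 0 for unlabeled ones.
    gamma_i is cost_pos for positives, 1 for negatives, 0 if unlabeled.
    """
    gamma = np.where(y > 0, cost_pos, np.where(y < 0, 1.0, 0.0))
    G = np.diag(gamma)
    # Setting the gradient to zero gives (L + lam*G) f = lam*G y
    return np.linalg.solve(L + lam * G, lam * G @ y)

def validation_cost(f, y_val, val_idx, cost_pos):
    """Total cost on held-out vertices: cost_pos per miss, 1 per false alarm."""
    pred = np.where(f[val_idx] > 0, 1, -1)
    fn = np.sum((y_val == 1) & (pred == -1))
    fp = np.sum((y_val == -1) & (pred == 1))
    return float(cost_pos * fn + fp)

def select_classifier(L, y, y_val, val_idx, c_min, c_max, n=5):
    c_mean = 0.5 * (c_min + c_max)
    best = None
    for c in np.linspace(c_min, c_max, n):
        f = train(L, y, c)                                  # Step 1: one classifier per cost
        cost = validation_cost(f, y_val, val_idx, c_mean)   # Step 2: total cost (assumed: at mean cost)
        if best is None or cost < best[0]:
            best = (cost, float(c), f)
    return best

# Toy hypergraph: 6 vertices, hyperedges {0,1,2}, {2,3}, {3,4,5}
H = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 0],
              [0, 1, 1], [0, 0, 1], [0, 0, 1]], dtype=float)
L = hypergraph_laplacian(H, np.ones(3))
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, -1.0])   # vertex 0 positive, vertex 5 negative
val_idx = np.array([1, 4])
y_val = np.array([1, -1])

best_cost, best_c, f = select_classifier(L, y, y_val, val_idx, c_min=1.0, c_max=5.0)
```

Evaluating every candidate at the mean cost is one way to tie the selection step to the average risk; other scoring rules over the interval would fit the same loop.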
[Figure: AUC on the NASA Software Defect Dataset (990 defected / 4674 non-defected) and on the CK Metrics Dataset (1011 defected / 1611 non-defected) for NSGLP, HL, CSHL, and CIHL. Values printed in the charts: 0.488, 0.910, 0.573, 0.870, 0.834, 0.820, 0.694 (HL), 0.723 (HL). NSGLP: Label Propagation based Semi-supervised Learning for Software Defect Prediction, Automated Software Engineering, 2017.]

Evaluation on Cost-Sensitive Learning
Dataset: NASA Software Defect Dataset, 990 defected / 4674 non-defected
Methods: HL: Hypergraph Learning; CSHL: Cost-Sensitive HL; CIHL: CSHL with Cost Interval Optimization
Criteria: cost ratio compared with HL

Compared Methods
(1) Non-negative Sparse Graph-Based Label Propagation (NSGLP) [1]
(2) Cost-sensitive Discriminative Dictionary Learning (CDDL) [2]
(3) CISVM [3]
(4) cisLDM [4]

Evaluation
(1) Cost

Testing Data
(1) NASA datasets [5]
(2) UCI dataset [6]

[1] Z.-W. Zhang, X.-Y. Jing, T.-J. Wang, Label propagation based semi-supervised learning for software defect prediction, Autom. Softw. Eng. 24(1) (2017) 47-69.
[2] X.-Y. Jing, S. Ying, Z.-W. Zhang, S.-S. Wu, J. Liu, Dictionary learning based software defect prediction, in: Proceedings of ICSE, 2014, pp. 414-423.
[3] Liu, X., and Zhou, Z. 2010. Learning with cost intervals. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 403-412.
[4] Zhou, Y., and Zhou, Z. 2016. Large margin distribution learning with cost interval and unlabeled data. IEEE Transactions on Knowledge and Data Engineering 28(7): 1749-1763.
[5] Menzies, T.; Greenwald, J.; and Frank, A. 2007. Data mining static code attributes to learn defect predictors. IEEE Transactions on Software Engineering 33(1): 2-13.
[6] Lichman, M. 2013. UCI machine learning repository.

Cost-Sensitive Hypergraph Learning with F-measure Optimization

N. Wang, R. Liang, X. Zhao and Y. Gao, Cost-Sensitive Hypergraph Learning With F-Measure Optimization, in IEEE Transactions on Cybernetics, doi:10.1109/TCYB.2021.3126756.

The method covers binary classification and multi-class classification.

Compared Methods (binary-class datasets)
(1) Large Margin Graph Quality Judgment (LEAD) [1]
(2) Non-Negative Sparse Graph-Based Label Propagation (NSGLP) [2]
(3) Cost-Sensitive Feature Selection (CSFS) [3]
(4) Biconcave Programming for Macro F-Measure Optimization (BEAM-F) [4]
(5) Adaptive-Surrogates [5]
(6) Evolutionary Cost-Sensitive Deep Belief Network (ECS-DBN) [6]

Evaluation
(1) Accuracy (2) AUC (3) G-mean (4) F1-measure

Testing Data
(1) UCI machine-learning repository [7]
(2) NASA dataset [8]
(3) CK metric dataset [8]

[1] Y.-F. Li, S.-B. Wang, and Z.-H. Zhou, "Graph quality judgement: A large margin expedition," in Proc. IJCAI, 2016, pp. 1725-1731.
[2] Z.-W. Zhang, X.-Y. Jing, and T.-J. Wang, "Label propagation based semi-supervised learning for software defect prediction," Autom. Softw. Eng., vol. 24, no. 1, pp. 47-69, 2016.
[3] Liu, C. Xu, Y. Luo, C. Xu, Y. Wen, and D. Tao, "Cost-sensitive feature selection by optimizing F-measures," IEEE Trans. Image Process., vol. 27, pp. 1323-1335, 2018.
[4] H. Narasimhan, W. Pan, P. Kar, P. Protopapas, and H. G. Ramaswamy, "Optimizing the multiclass F-measure via biconcave programming," in Proc. IEEE Int. Conf. Data Min. (ICDM), Barcelona, Spain, 2016, pp. 1101-1106.
[5] Q. Jiang, O. Adigun, H. Narasimhan, M. M. Fard, and M. R. Gupta, "Optimizing black-box metrics with adaptive surrogates," in Proc. Int. Conf. Mach. Learn., 2020, pp. 4784-4793.
[6] C. Zhang, K. C. Tan, H. Li, and G. S. Hong, "A cost-sensitive deep belief network for imbalanced classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 1, pp. 109-122, Jan. 2019.
[7] D. Dheeru and E. K. Taniskidou. "UCI Machine Learning Repository." 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[8] T. Menzies, R. Krishna, and D. Pryor. "The Promise Repository of Empirical Software Engineering Data." 2015. [Online]. Available: http://openscience.us/repo

Compared Methods (multi-class datasets)
(1) Multiclass Optimal Margin Distribution Machine (mcODM) [1]
(2) Vertex-Weighted Hypergraph Learning (V-HL) [2]
(3) Cost-Sensitive Feature Selection (CSFS) [3]
(4) Biconcave Programming for Macro F-Measure Optimization (BEAM-F) [4]
(5) Adaptive-Surrogates [5]
(6) Evolutionary Cost-Sensitive Deep Belief Network (ECS-DBN) [6]

Evaluation
(1) Accuracy (2) AUC (3) G-mean (4) F1-measure

Testing Data
(1) UCI machine-learning repository [7]

[Figures: results on group one, group two, group three, and group four]

[1] T. Zhang and Z.-H. Zhou, "Multi-class optimal margin distribution machine," in Proc. ICML, vol. 70, 2017, pp. 4063-4071.
[2] L. Su, Y. Gao, X. Zhao, H. Wan, M. Gu, and J. Sun, "Vertex-weighted hypergraph learning for multi-view object classification," in Proc. IJCAI, 2017, pp. 2779-2785.
[3] Liu, C. Xu, Y. Luo, C. Xu, Y. Wen, and D. Tao, "Cost-sensitive feature selection by optimizing F-measures," IEEE Trans. Image Process., vol. 27, pp. 1323-1335, 2018.
[4] H. Narasimhan, W. Pan, P. Kar, P. Protopapas, and H. G. Ramaswamy, "Optimizing the multiclass F-measure via biconcave programming," in Proc. IEEE Int. Conf. Data Min. (ICDM), Barcelona, Spain, 2016, pp. 1101-1106.
[5] Q. Jiang, O. Adigun, H. Narasimhan, M. M. Fard, and M. R. Gupta, "Optimizing black-box metrics with adaptive surrogates," in Proc. Int. Conf. Mach. Learn., 2020, pp. 4784-4793.
[6] C. Zhang, K. C. Tan, H. Li, and G. S. Hong, "A cost-sensitive deep belief network for imbalanced classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 1, pp. 109-122, Jan. 2019.
[7] D. Dheeru and E. K. Taniskidou. "UCI Machine Learning Repository." 2017. [Online]. Available: http://archive.ics.uci.edu/ml

03 Applications

Application 1: attack and anomaly detection
Attack and anomaly detection; software defect prediction (Defect-Prone versus Defect-Free).

Application 2: disease diagnosis

Thank you very much for watching.
Wang Nan