DataFunSummit #2023
Graph Out-of-Distribution Generalization From a Causal Perspective
Speaker: Yongduo Sui, Ph.D. student at University of Science and Technology of China, intern at Ant Group

CONTENT
01 Background and Motivation
02 Related Studies
03 Causal Attention Learning
04 Adversarial Invariant Augmentation

01 Background and Motivation

1.1 Background
Graph data are everywhere: social networks, chemical molecules, biological proteins.
Graph learning tasks: node classification, link prediction/classification, graph classification.

1.2 Graph Out-of-Distribution Issue
The OOD issue in image classification: covariate shift vs. correlation shift, e.g., the classic cow-and-camel example. [1]
The same OOD issue arises in graph classification. [2][3]

[1] OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization, CVPR 2022
[2] Discovering Invariant Rationales for Graph Neural Networks, ICLR 2022
[3] OOD-GNN: Out-of-Distribution Generalized Graph Neural Network, TKDE 2022
1.3 Assumption of Graph Generation
Stable (causal) vs. environmental features, under the sufficiency and invariance assumption.
Stable feature: the functional group, e.g., -OH, -COOH.
Environmental feature: the scaffold, e.g., carbon ring, carbon chain.
Examples: cyclopropanol (scaffold: 3-carbon ring) vs. 1,4-cyclohexanediol (scaffold: 6-carbon ring); acetic acid (small scaffold) vs. citric acid (large scaffold).

[4] Learning Invariant Graph Representations for Out-of-Distribution Generalization, NeurIPS 2022
[5] Learning Substructure Invariance for Out-of-Distribution Molecular Representations, NeurIPS 2022
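One common way to formalize the sufficiency and invariance assumption (a sketch following the invariant graph learning literature, e.g., [4]; the symbols C for the stable feature, S for the environmental feature, and e for an environment are my notation, not verbatim from the slides):

\[
\text{Sufficiency:}\quad Y = f^{*}(C) + \epsilon,\ \ \epsilon \perp C;
\qquad
\text{Invariance:}\quad P^{e}(Y \mid C) = P^{e'}(Y \mid C)\ \ \forall\, e, e' \in \mathcal{E}.
\]

That is, the stable feature alone suffices to predict the label, and its relation to the label does not change across environments, while the environmental feature (the scaffold above) may vary arbitrarily.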
1.4 Our Motivations
Possible reasons for the poor performance of GNNs on OOD test data:
(1) Stable features are difficult to capture. Stable features are the key to improving OOD generalization, yet the spurious correlations in the data lead the model to learn shortcuts.
(2) Environmental features are not discrepant enough. The scarcity of training environments degrades the model's performance on test data from unknown environments.

02 Related Studies

2.1 Graph Classification Methods
Kernel-based methods (graph kernels & graph matching): compute the similarity between graphs by decomposing each graph into substructures and comparing those substructures. E.g., the Graphlet kernel (GK) and the Weisfeiler-Lehman (WL) kernel.
Graph neural networks (GNNs): based on the message-passing mechanism; graph feature extraction and classification are optimized end-to-end. Representative models include GCN and GIN (a minimal message-passing sketch follows below).
Pooling- or attention-based GNNs: to select discriminative subgraphs or features in graph data, pooling- and attention-based GNNs are emerging. Representative models include SAGPool, DiffPool, and ASAP.
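To make the message-passing mechanism concrete, here is a minimal, self-contained sketch of a GCN-style layer on a dense adjacency matrix (illustrative only; GCN/GIN as used in practice rely on sparse, more elaborate implementations):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Minimal GCN-style message passing: H' = act(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Add self-loops so each node keeps its own features.
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)
        # Symmetric degree normalization.
        deg_inv_sqrt = a_hat.sum(dim=1).clamp(min=1).pow(-0.5)
        a_norm = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]
        # Aggregate neighbor messages, then transform.
        return torch.relu(self.lin(a_norm @ x))

# Graph classification: mean-pool node embeddings into a graph embedding.
x = torch.randn(5, 8)                    # 5 nodes, 8 features
adj = (torch.rand(5, 5) > 0.5).float()   # toy adjacency matrix
adj = ((adj + adj.T) > 0).float()        # make it symmetric
h = GCNLayer(8, 16)(x, adj)
graph_emb = h.mean(dim=0)                # readout for graph-level tasks
```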
2.2 OOD Generalization
Invariant Learning: learns invariant features by minimizing the empirical risk across multiple environments; in practice this is cast as a regularized optimization problem. Representative algorithm: IRM (see the sketch after this list).
Stable Learning: removes irrelevant features and spurious correlations by re-weighting the input samples to simulate an unbiased data distribution. Representative algorithm: StableNet.
Distributionally Robust Optimization (DRO): optimizes the worst-case distribution within an uncertainty set, ensuring generalization of the model under the worst distribution. Representative algorithm: Group-DRO.
Data Augmentation: attributes poor generalization to the insufficient scale and diversity of the training data, and therefore enlarges both. Representative algorithm: Mixup.
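As a concrete illustration of invariant learning as a regularized objective, here is a minimal sketch of the IRMv1 penalty from Arjovsky et al. (the variable names and the toy binary-classification setup are mine):

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """IRMv1 penalty: squared gradient of the environment's risk w.r.t. a
    fixed dummy classifier scale w = 1.0; a small gradient means the shared
    classifier is (near-)optimal for this environment."""
    scale = torch.ones(1, device=logits.device, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

def irm_objective(model, envs, lam: float = 10.0) -> torch.Tensor:
    """Empirical risk plus the invariance penalty, summed over environments.
    `envs` is a list of (x, y) batches, one per training environment."""
    total = torch.zeros(())
    for x, y in envs:
        logits = model(x).squeeze(-1)
        erm = F.binary_cross_entropy_with_logits(logits, y)
        total = total + erm + lam * irm_penalty(logits, y)
    return total
```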
2.3 Graph OOD Generalization
Methods and their limitations:
- Graph data augmentation (Rong et al., ICLR 2020; Liu et al., KDD 2022; Han et al., ICML 2022): prone to destroying the stable features in the data, making the model insensitive to them.
- Graph invariant learning (Wu et al., ICLR 2022 (a); Wu et al., ICLR 2022 (b); Li et al., NeurIPS 2022): difficult to increase the environmental discrepancy.
- General generalization algorithms (Zhang et al., ICLR 2018; Sagawa et al., ICLR 2020; Arjovsky et al., arXiv 2019): due to the irregularity of graph data, they struggle to achieve significant performance improvements.

03 Causal Attention Learning

3.1 Causal Attention Learning (CAL)
Our idea: analyze GNN modeling from a causal view, using a structural causal model (SCM) for graph classification whose variables are the graph data G, the stable feature C, the environmental feature S, the representation R, and the prediction/label Y.
The SCM exposes a backdoor path between the environmental feature and the label. Potential problem: the model learns "shortcuts" along this path; these shortcuts may not exist outside the training distribution, resulting in poor OOD generalization. Existing issue: stable features are difficult to capture.

Theory and method design: causal interventions via backdoor adjustment.

Causal Attention for Interpretable and Generalizable Graph Classification, KDD 2022
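For reference, the adjustment this relies on is the standard backdoor adjustment, written here in the SCM's notation (a sketch; the paper's actual estimator stratifies over trivial features within a batch):

\[
P\big(Y \mid do(C)\big) \;=\; \sum_{s} P\big(Y \mid C,\, S = s\big)\, P(s),
\]

i.e., the effect of the stable feature C on Y is estimated by stratifying over the environmental feature S and averaging, which blocks the backdoor path through S.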
The CAL framework proceeds in three steps (sketched below):
(1) Estimating attention: attention scores identify which nodes and edges are causal vs. trivial.
(2) Disentanglement: the graph representation is split into a causal part and a trivial part.
(3) Causal intervention: trivial features are recombined with causal ones to implement the backdoor adjustment.

Causal Attention for Interpretable and Generalizable Graph Classification, KDD 2022
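A minimal sketch of the three steps (a soft attention split plus a batch-level intervention; the module names and the pairing-by-addition intervention are my simplification of the paper's design):

```python
import torch
import torch.nn as nn

class CALHead(nn.Module):
    """Split a graph's node embeddings into causal / trivial parts via attention."""
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.att = nn.Linear(dim, 2)          # per-node causal-vs-trivial scores
        self.causal_clf = nn.Linear(dim, num_classes)
        self.trivial_clf = nn.Linear(dim, num_classes)

    def forward(self, h: torch.Tensor):
        # (1) Estimating attention: soft masks over nodes.
        mask = torch.softmax(self.att(h), dim=-1)   # [n_nodes, 2]
        # (2) Disentanglement: attention-weighted readouts.
        h_causal = (mask[:, :1] * h).mean(dim=0)    # stable part
        h_trivial = (mask[:, 1:] * h).mean(dim=0)   # environmental part
        return h_causal, h_trivial

def intervened_logits(head: CALHead, h_causal, trivial_bank):
    """(3) Causal intervention: pair the causal part with trivial parts sampled
    from other graphs in the batch (a proxy for backdoor adjustment) and
    require the prediction to stay stable across pairings."""
    logits = []
    for h_trivial in trivial_bank:
        logits.append(head.causal_clf(h_causal + h_trivial.detach()))
    return torch.stack(logits)  # train these to agree with the true label
```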
Experiments on synthetic datasets, plus further analysis.

Causal Attention for Interpretable and Generalizable Graph Classification, KDD 2022
04 Adversarial Invariant Augmentation

4.1 Adversarial Invariant Augmentation
Correlation shift vs. covariate shift. Write the joint distributions of training and test data as \(P_{tr}(X, Y)\) and \(P_{te}(X, Y)\). Distribution shift means
\[
P_{tr}(X, Y) \neq P_{te}(X, Y), \qquad
P_{tr}(X, Y) = P_{tr}(Y \mid X)\, P_{tr}(X), \quad
P_{te}(X, Y) = P_{te}(Y \mid X)\, P_{te}(X).
\]
Correlation shift: \(P_{tr}(X) = P_{te}(X)\) but \(P_{tr}(Y \mid X) \neq P_{te}(Y \mid X)\).
Covariate shift: \(P_{tr}(X) \neq P_{te}(X)\) but \(P_{tr}(Y \mid X) = P_{te}(Y \mid X)\), e.g., domain generalization. [1]
(Illustration: the cow-and-camel example under covariate and correlation shift. [1])

Graph data generation: our scope is the distribution shifts mainly caused by the environmental features. [1][6]

[1] OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization, CVPR 2022
[6] GOOD: A Graph Out-of-Distribution Benchmark, NeurIPS 2022
Of the two, our focus is covariate shift on graphs: \(P_{tr}(X) \neq P_{te}(X)\) while \(P_{tr}(Y \mid X) = P_{te}(Y \mid X)\), i.e., graph covariate shift.

Existing issue: insufficient discrepancy of environmental features. Our idea: use data augmentation to increase the environmental discrepancy.

Two principles for graph augmentation:
Principle 1 (Environmental Feature Discrepancy): environmental features should remain discrepant during augmentation.
Principle 2 (Stable Feature Consistency): stable features should remain consistent during augmentation.

Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift, NeurIPS 2023
Distributionally Robust Optimization: Wasserstein distance, transportation cost, and Lagrangian relaxation yield a robust surrogate loss for adversarial augmentation.

Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift, NeurIPS 2023
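These ingredients combine in the standard Wasserstein-DRO way (a sketch following Sinha et al., Certifying Some Distributional Robustness with Principled Adversarial Training, ICLR 2018, which this line of work builds on; \(\rho\), \(\lambda\), and the cost \(c\) are as in that formulation):

\[
\min_{\theta}\ \sup_{Q:\ W_{c}(Q,\, P_{tr}) \le \rho}\ \mathbb{E}_{Q}\big[\ell(\theta; G)\big]
\;\;\leadsto\;\;
\min_{\theta}\ \mathbb{E}_{G \sim P_{tr}}\Big[\sup_{G'}\big\{\ell(\theta; G') - \lambda\, c(G', G)\big\}\Big],
\]

where \(W_c\) is the Wasserstein distance under transportation cost \(c\); the right-hand side is the Lagrangian relaxation, its inner supremum is the robust surrogate loss, and that supremum is what the adversarial augmenter approximates by generating augmented graphs \(G'\).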
Adversarial augmenter & stable feature generator: training alternates between a maximization step and a minimization step, with mask combination tying the two together (see the sketch below).

Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift, NeurIPS 2023
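A minimal training-loop sketch of the min-max scheme with mask combination (the module names, the proxy cost, and `graph.apply_mask` are illustrative placeholders, not the paper's exact implementation):

```python
import torch
import torch.nn.functional as F

def aia_step(gnn, augmenter, generator, graph, y, opt_min, opt_max, lam=1.0):
    """One alternating min-max step of adversarial invariant augmentation.

    augmenter: proposes adversarial soft masks (perturbs environmental features).
    generator: proposes stable-feature masks (regions that must stay intact).
    graph.apply_mask: hypothetical op that reweights edges/nodes by a soft mask.
    """
    # --- Maximization: make the augmented graph hard for the GNN (large loss)
    # minus a transportation-cost penalty, as in the robust surrogate loss. ---
    stable = generator(graph).detach()           # freeze generator in this phase
    adv = augmenter(graph)                       # soft masks in [0, 1]
    combined = stable + (1.0 - stable) * adv     # mask combination: perturb only non-stable parts
    loss_aug = F.cross_entropy(gnn(graph.apply_mask(combined)), y)
    cost = ((1.0 - adv) ** 2).mean()             # proxy for the transportation cost c
    opt_max.zero_grad()
    (lam * cost - loss_aug).backward()           # gradient ascent on (loss - lam * cost)
    opt_max.step()

    # --- Minimization: train the GNN and generator on original + augmented
    # graphs so that predictions rely on the preserved stable features. ---
    stable = generator(graph)
    combined = stable + (1.0 - stable) * augmenter(graph).detach()
    loss = (F.cross_entropy(gnn(graph), y)
            + F.cross_entropy(gnn(graph.apply_mask(combined)), y))
    opt_min.zero_grad()
    loss.backward()
    opt_min.step()
```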
Theoretical discussions. Conclusion: our objective optimization can effectively amplify the covariate shift between the training and the generated distributions.

RQ1: Compared to existing efforts, how does AIA perform under covariate shift?
RQ2: Can the proposed AIA achieve the principles of environmental feature discrepancy and stable feature consistency, thereby effectively alleviating the covariate shift?

Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift, NeurIPS 2023

Thanks for watching.