Building Very Deep Graph Neural Networks for Representation Learning on Graphs
Guohao Li, CS PhD Student, KAUST
guohao.li@kaust.edu.sa

Outline
1. Skip Connections and Dilated Convolutions on Graphs (Making GCNs Go as Deep as CNNs)
2. Making GCNs Go as Deep as CNNs: Message Aggregation Functions; Memory Efficiency
3. Designing GCNs Automatically: Sequential Greedy Architecture Search; Latency Constrained NAS
4. Discussion: To deep or not to deep
Graph Data
Lots of real-world applications need to deal with non-grid data.
General graphs: social networks, citation networks, molecules, point clouds, 3D meshes.
DeepGCNs.org

Most state-of-the-art GNNs are no deeper than 3 or 4 layers:
Kipf, T.N. and Welling, M., 2016. Semi-Supervised Classification with Graph Convolutional Networks.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P. and Bengio, Y., 2018. Graph Attention Networks.
Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M. and Solomon, J.M., 2018. Dynamic Graph CNN for Learning on Point Clouds.
Hamilton, W.L., Ying, R. and Leskovec, J., 2017. Inductive Representation Learning on Large Graphs.

Why are GNNs limited to shallow structures?
Overfitting, oversmoothing and vanishing gradients.
Figures from https:/, https://graphics.stanford.edu/courses/cs468-12-spring/LectureSlides/06_smoothing.pdf and https://en.wikipedia.org/wiki/Overfitting.
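To make the oversmoothing item concrete, here is a toy NumPy illustration (not code from the talk). It repeatedly applies a weight-free, activation-free mean aggregation X ← D⁻¹(A + I)X on a small connected graph, and the node features collapse toward one shared vector as depth grows. The graph, the feature size and the `spread` measure are hypothetical choices made for the example.

```python
import numpy as np

def mean_aggregate(adj: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One weight-free, activation-free step: average each node with its neighbors."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)      # degree of every node (incl. self-loop)
    return (a_hat / deg) @ x                    # D^-1 (A + I) X

def spread(x: np.ndarray) -> float:
    """Mean per-feature standard deviation across nodes; 0 means all nodes look the same."""
    return float(x.std(axis=0).mean())

# Hypothetical toy graph: a 6-node path 0-1-2-3-4-5 plus a chord 1-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                    # random initial node features
spreads = []
for depth in range(65):
    if depth in (0, 2, 8, 32, 64):
        spreads.append((depth, spread(x)))
        print(f"depth={depth:2d}  spread={spread(x):.2e}")
    x = mean_aggregate(adj, x)
```

Because this operator ignores weights and nonlinearities, it embodies exactly the simplification that the Discussion at the end of the talk calls too strong.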
1. Skip Connections and Dilated Convolutions on Graphs (Making GCNs Go as Deep as CNNs)

Training Loss of GCNs with Varying Depth
PlainGCNs: deeper GCNs don't converge well.
ResGCNs: even a 112-layer deep GCN converges well!

Residual Graph Connections
Each layer aggregates neighbor features, updates the node features, and adds a skip connection from the layer input to its output, i.e. G_{l+1} = F(G_l, W_l) + G_l.
He, Kaiming, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Dense Graph Connections
Each layer receives the concatenated features of all preceding layers.
Huang, Gao, et al. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

Dilated Graph Convolutions
Dilated convolution on a regular graph, e.g. a 2D image; dilated graph convolution on an irregular graph, e.g. a 3D point cloud.
Yu, Fisher, and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. International Conference on Learning Representations. 2016.
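On a point cloud, a dilated graph convolution can be obtained by changing how neighbors are picked rather than changing the convolution itself. The NumPy sketch below assumes brute-force pairwise distances, excludes each point from its own neighbor list, and implements a deterministic dilated k-NN: take the k·d nearest neighbors and keep every d-th one, where d is the dilation rate (d = 1 gives ordinary k-NN). All function names are hypothetical; this is not the DeepGCNs implementation.

```python
import numpy as np

def knn_indices(points: np.ndarray, num: int) -> np.ndarray:
    """Indices of the `num` nearest neighbors of every point, closest first."""
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(axis=-1)              # pairwise squared Euclidean distances
    np.fill_diagonal(dist, np.inf)               # assumption: a point is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :num]

def dilated_knn(points: np.ndarray, k: int, d: int) -> np.ndarray:
    """Dilated k-NN: take the k*d nearest neighbors and keep every d-th one."""
    candidates = knn_indices(points, k * d)      # (N, k*d), requires k*d <= N - 1
    return candidates[:, ::d]                    # (N, k): u_1, u_(1+d), ..., u_(1+(k-1)d)

def edge_index(neighbors: np.ndarray) -> np.ndarray:
    """Turn an (N, k) neighbor table into a 2 x (N*k) array of (source, target) edges."""
    n, k = neighbors.shape
    targets = np.repeat(np.arange(n), k)         # every center node receives k messages
    sources = neighbors.reshape(-1)
    return np.stack([sources, targets])

pts = np.array([[0.0], [1.0], [3.0], [7.0], [15.0], [31.0]])  # toy 1-D "point cloud"
print(dilated_knn(pts, k=2, d=1)[0])   # plain k-NN of point 0
print(dilated_knn(pts, k=2, d=2)[0])   # dilated: skips every other neighbor
```

Keeping the neighbor count k fixed while growing d enlarges the receptive field without adding edges, which is the same trade-off dilated convolutions make on images.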
Dilated Graph Convolutions
d = dilation rate

Deep Graph Convolutional Networks (DeepGCNs)

Stanford 3D Large-Scale Indoor Spaces Dataset (http://buildingparser.stanford.edu/dataset.html)
700 million points. Node features: coordinates and colors. Node classification with 13 classes. Edges are constructed by k-NN.

Graph Learning on 3D Point Clouds
Table 1. Comparison of ResGCN-28 with the state of the art. We outperform other SOTA methods in 9 out of 13 classes.
Table 2. Comparison of ResGCN-28 with DGCNN* (our shallow baseline model): consistent improvements across all the classes and a 4% boost in mIoU.
*We reproduced the results of DGCNN on all classes, since per-class results were not provided in the DGCNN paper.
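The 4% gain is reported in mIoU, the mean over classes of the per-class intersection over union, IoU_c = TP_c / (TP_c + FP_c + FN_c). Below is a minimal NumPy sketch computed from flat per-point labels; skipping classes that occur in neither the labels nor the predictions is an assumption, and benchmark evaluation scripts may handle them differently.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_class_iou(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp            # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp            # belong to the class, but missed
    denom = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)   # NaN: class absent everywhere

def mean_iou(y_true, y_pred, num_classes):
    return float(np.nanmean(per_class_iou(confusion_matrix(y_true, y_pred, num_classes))))

y_true = np.array([0, 0, 1, 1, 2, 2])   # toy per-point labels
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(per_class_iou(confusion_matrix(y_true, y_pred, 3)))  # IoU of classes 0, 1, 2
print(mean_iou(y_true, y_pred, 3))
```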
PlainGCN vs. ResGCN

Oversmoothing Analysis
Table 3. Analysis of over-smoothing using the Group Distance Ratio (inter-group distance / intra-group distance) and the Instance Information Gain (mutual information between the input and the final output).
Zhou, K., Huang, X., Li, Y., Zha, D., Chen, R. and Hu, X. Towards deeper graph neural networks with differentiable group normalization. NeurIPS 2020.

Application in Biology
Table 4. Node classification of biological networks.
Wider / Deeper. By John Morris.

2. Making GCNs Go as Deep as CNNs: Message Aggregation Functions; Memory Efficiency
Datasets: Open Graph Benchmark (OGB)
OGB datasets: Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M. and Leskovec, J. Open graph benchmark: Datasets for machine learning on graphs. NeurIPS 2020.
DeeperGCN: Residual Connections
Training losses of ResGCN+, ResGCN and PlainGCN on ogbn-proteins: pre-activated residual connections work better.
Post-activated block: G-Conv → Norm → ReLU. Pre-activated block: Norm → ReLU → G-Conv.
With pre-activation, the output range of the residual function is (-∞, +∞), whereas a post-activated block ending in ReLU can only output values in [0, +∞).

Aggregation Functions
Aggregation functions perform very differently on different datasets.
The same aggregation function
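The two block orderings on the residual-connection slide can be contrasted with a small NumPy sketch. `g_conv` is a hypothetical mean-aggregation graph convolution and `norm` a simplified batch normalization without learned parameters; neither is the DeeperGCN code. The sketch shows the point made on the slide: with post-activation the residual branch ends in ReLU and can only add non-negative values, while with pre-activation it can take any sign.

```python
import numpy as np

def g_conv(adj, x, w):
    """Hypothetical G-Conv: average over neighbors and self, then a linear map W."""
    a_hat = adj + np.eye(adj.shape[0])
    return (a_hat / a_hat.sum(axis=1, keepdims=True)) @ x @ w

def norm(x, eps=1e-5):
    """Feature-wise standardization over nodes (BatchNorm without learned scale/shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

def post_activated_block(adj, x, w):
    """Post-activation ordering: x + ReLU(Norm(G-Conv(x))); the residual branch is >= 0."""
    return x + relu(norm(g_conv(adj, x, w)))

def pre_activated_block(adj, x, w):
    """Pre-activation ordering (ResGCN+): x + G-Conv(ReLU(Norm(x))); any sign is possible."""
    return x + g_conv(adj, relu(norm(x)), w)

rng = np.random.default_rng(0)
adj = np.triu((rng.random((8, 8)) < 0.3).astype(float), 1)
adj = adj + adj.T                                  # undirected graph, no self-loops
x = rng.normal(size=(8, 4))
w = rng.normal(size=(4, 4))
print("post-activated residual min:", (post_activated_block(adj, x, w) - x).min())
print("pre-activated residual min: ", (pre_activated_block(adj, x, w) - x).min())
```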
Message Passing
A message-passing layer updates each node from its own features, its neighbors' features and the edge features:
$$\mathbf{x}_i' = \gamma\Big(\mathbf{x}_i,\ \square_{j \in \mathcal{N}(i)} \phi\big(\mathbf{x}_i, \mathbf{x}_j, \mathbf{e}_{j,i}\big)\Big)$$
where □ is a permutation-invariant function (e.g. sum, mean or max) and γ and φ are differentiable (learnable) functions (e.g. MLPs).
By https://pytorch-geometric.readthedocs.io
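The message-passing equation can be sketched directly in NumPy. Here φ and γ are hypothetical one-layer networks with random weights, edges are (j, i) pairs for messages j → i, and `msg_dim` is an extra parameter so that nodes without incoming edges get a zero aggregate. PyG exposes this pattern through its `MessagePassing` base class; the code below only mirrors the equation and does not use PyG.

```python
import numpy as np

REDUCERS = {"sum": np.sum, "mean": np.mean, "max": np.max}

def message_passing(x, edges, edge_attr, phi, gamma, msg_dim, aggr="sum"):
    """x: (N, F) node features; edges: (j, i) pairs for messages j -> i;
    edge_attr: (E, F_e) edge features aligned with `edges`."""
    inbox = [[] for _ in range(x.shape[0])]
    for (j, i), e_ji in zip(edges, edge_attr):
        inbox[i].append(phi(x[i], x[j], e_ji))              # message phi(x_i, x_j, e_ji)
    out = []
    for i, msgs in enumerate(inbox):
        if msgs:
            agg = REDUCERS[aggr](np.stack(msgs), axis=0)    # permutation-invariant reduction
        else:
            agg = np.zeros(msg_dim)                         # isolated node: empty aggregate
        out.append(gamma(x[i], agg))                        # update gamma(x_i, aggregate)
    return np.stack(out)

# Hypothetical one-layer "MLPs" for phi and gamma.
rng = np.random.default_rng(0)
F, F_E, H = 3, 2, 4
W_PHI = rng.normal(size=(2 * F + F_E, H))
W_GAMMA = rng.normal(size=(F + H, F))

def phi(x_i, x_j, e_ji):
    return np.tanh(np.concatenate([x_i, x_j, e_ji]) @ W_PHI)

def gamma(x_i, agg):
    return np.tanh(np.concatenate([x_i, agg]) @ W_GAMMA)

x = rng.normal(size=(3, F))
edges = [(1, 0), (2, 0), (0, 1), (2, 1), (0, 2), (1, 2)]   # a triangle, both directions
edge_attr = rng.normal(size=(len(edges), F_E))
out = message_passing(x, edges, edge_attr, phi, gamma, msg_dim=H, aggr="max")
```

Because the reduction runs over an unordered set of messages, shuffling the edge list leaves the output unchanged (up to floating-point rounding for sum and mean).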
Aggregation Functions
Illustration of generalized message aggregation functions:
Generalized mean-max aggregation function.
Generalized mean-max-sum aggregation function.
Both are differentiable aggregation functions.
Xu, K., Hu, W., Leskovec, J. and Jegelka, S. How powerful are graph neural networks? ICLR 2019.
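As a sketch of what "generalized" means here, assume two DeeperGCN-style aggregators: SoftMax_Agg_β, a feature-wise softmax-weighted sum of neighbor messages that approaches the mean as β → 0 and the max as β → ∞, and PowerMean_Agg_p = (mean of m^p)^(1/p) for positive messages, which is the mean at p = 1 and approaches the max as p → ∞. For the mean-max-sum variant we assume an extra factor |N(v)|^y with y in [0, 1], so y = 1 turns a mean into a sum. These formulas are our reading of the DeeperGCN work rather than content recovered from the slides; in a model β, p and y would be learnable, while here they are plain arguments.

```python
import numpy as np

def softmax_agg(messages, beta):
    """Assumed SoftMax_Agg_beta over neighbor messages (K, F): softmax-weighted sum per feature."""
    logits = beta * messages
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * messages).sum(axis=0)

def power_mean_agg(messages, p):
    """Assumed PowerMean_Agg_p for strictly positive messages: (mean of m^p)^(1/p)."""
    return np.mean(messages ** p, axis=0) ** (1.0 / p)

def with_sum_factor(agg_value, num_neighbors, y):
    """Assumed mean-max-sum extension: scale by |N(v)|^y with y in [0, 1]."""
    return (num_neighbors ** y) * agg_value

m = np.array([[1.0, 2.0], [3.0, 0.5], [2.0, 1.0]])   # 3 neighbor messages, 2 features
print(softmax_agg(m, beta=0.0))                       # close to the mean
print(softmax_agg(m, beta=100.0))                     # close to the max
print(power_mean_agg(m, p=1.0))                       # the mean
print(with_sum_factor(softmax_agg(m, 0.0), len(m), y=1.0))  # close to the sum
```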
Results
Table 4. DeeperGCN achieves SOTA results on 6 OGB datasets.

Results
Learning curves of the 7-layer DyResGEN with SoftMax_Agg_β(·).
Results
DeeperGCN ranked top 1 on several datasets at the time of submission.

Memory Complexity of Training GNNs
Full batch: O(LND), where L is the number of layers, N the number of nodes and D the number of features (assuming D is the same for all layers).
How can we reduce memory complexity? GraphSAGE, Cluster-GCN and so on.
Cluster-GCN: O(LND) → O(LBD), where B is the number of nodes in the subgraphs, B ≪ N.
Can we reduce the memory complexity in the L dimension, i.e. O(LND) → O(ND)?
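Plugging numbers into these complexities shows why depth is the bottleneck. The sketch counts only one float32 node-feature tensor per layer (it ignores weights, gradients, edge messages and optimizer state), and all sizes are hypothetical.

```python
def activation_bytes(num_layers, num_nodes, num_features, bytes_per_value=4):
    """Bytes to keep one (num_nodes x num_features) float32 tensor per layer for backprop."""
    return num_layers * num_nodes * num_features * bytes_per_value

# Hypothetical sizes: L layers, N nodes, D features, B nodes per Cluster-GCN subgraph.
L, N, D, B = 112, 50_000, 64, 5_000

full_batch = activation_bytes(L, N, D)    # O(LND)
cluster_gcn = activation_bytes(L, B, D)   # O(LBD), B << N
one_layer = activation_bytes(1, N, D)     # O(ND): keep a single layer's features

for name, size in [("full batch", full_batch), ("Cluster-GCN", cluster_gcn), ("O(ND)", one_layer)]:
    print(f"{name:<12} {size / 1e9:.3f} GB")
```

Sampling shrinks the N factor but leaves the factor L = 112 untouched, which is the factor the memory-efficient GNNs aim to remove.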
Memory Efficient GNNs
Do not store the intermediate node features: O(LND) → O(ND).
Variants: Reversible GNN, DEQ-GNN and Weight-tied Reversible GNN.

Memory Efficient GNNs
Reversible GNN: forward and inverse computations, in general and when #group = 2.

Memory Efficient GNNs
DEQ-GNN: forward pass by fixed-point iterations, backward pass by implicit differentiation.

Results: Summary
Fig. Performance vs. GPU memory consumption on the ogbn-proteins dataset for 112-layer deep networks.
1. Regular GNNs quickly run out of memory.
2. We can train huge overparameterized RevGNNs on a single GPU and achieve the best performance.
3. We can train smaller GNNs with weight-tying or DEQ and still reach promising results.

Results: Complexity Analysis

Results: Constant Memory with RevGNN
Train a 1001-layer GNN with only 2.86 GB of peak GPU memory! The deepest GNN by one order of magnitude.

Results: SOTA with RevGNN (ogbn-proteins)
68M parameters (about half the size of GPT).

Results: SOTA with RevGNN (ogbn-arxiv)
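The reversible idea can be checked numerically with a small NumPy sketch of a grouped reversible residual, written from our understanding of the RevGNN work rather than from the slides: split the channels into C groups, set X'_0 = X_2 + ... + X_C, compute X'_i = f_i(X'_(i-1)) + X_i, and invert with X_i = X'_i - f_i(X'_(i-1)). With C = 2 this reduces to the RevNet-style coupling X'_1 = X_1 + f_1(X_2), X'_2 = X_2 + f_2(X'_1). The group function (tanh of a propagated linear map), the graph and all sizes are hypothetical, and a real implementation would recompute activations inside an autograd framework; this sketch only verifies that inputs can be reconstructed. Weight tying is shown by reusing the same functions in every layer, and `deq_forward` shows only the fixed-point forward pass, not the implicit-differentiation backward pass.

```python
import numpy as np

def make_group_fn(prop, weight):
    """Hypothetical group function f(H) = tanh(P H W), with P a fixed propagation matrix."""
    def f(h):
        return np.tanh(prop @ h @ weight)
    return f

def rev_forward(x, fs):
    """Grouped reversible block; C = len(fs) >= 2 channel groups."""
    groups = np.split(x, len(fs), axis=1)
    prev = sum(groups[1:])                        # X'_0 = X_2 + ... + X_C
    outs = []
    for f, x_i in zip(fs, groups):
        prev = f(prev) + x_i                      # X'_i = f_i(X'_(i-1)) + X_i
        outs.append(prev)
    return np.concatenate(outs, axis=1)

def rev_inverse(y, fs):
    """Reconstruct the block input from its output instead of storing it."""
    outs = np.split(y, len(fs), axis=1)
    groups = [None] * len(fs)
    for i in range(len(fs) - 1, 0, -1):
        groups[i] = outs[i] - fs[i](outs[i - 1])  # X_i = X'_i - f_i(X'_(i-1)) for i >= 2
    prev0 = sum(groups[1:])                       # recompute X'_0
    groups[0] = outs[0] - fs[0](prev0)            # X_1 = X'_1 - f_1(X'_0)
    return np.concatenate(groups, axis=1)

def deq_forward(x, prop, w, u, tol=1e-10, max_iter=1000):
    """DEQ-style layer: iterate Z <- tanh(P Z W + X U) until a fixed point is reached."""
    z = np.zeros((x.shape[0], w.shape[1]))
    for _ in range(max_iter):
        z_next = np.tanh(prop @ z @ w + x @ u)
        if np.abs(z_next - z).max() < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(0)
n, d, c, depth = 6, 8, 2, 4
adj = np.triu((rng.random((n, n)) < 0.4).astype(float), 1)
adj = adj + adj.T + np.eye(n)
prop = adj / adj.sum(axis=1, keepdims=True)       # row-normalized, with self-loops

# Weight-tied: every layer reuses the same group functions `fs`.
fs = [make_group_fn(prop, 0.2 * rng.normal(size=(d // c, d // c))) for _ in range(c)]
x = rng.normal(size=(n, d))
h = x
for _ in range(depth):
    h = rev_forward(h, fs)
x_rec = h
for _ in range(depth):
    x_rec = rev_inverse(x_rec, fs)                # only the final h had to be kept
print("max reconstruction error:", np.abs(x_rec - x).max())

# DEQ: scale W so no column's absolute sum exceeds 0.5, which makes the map a contraction.
w = rng.normal(size=(d, d))
w /= 2 * np.abs(w).sum(axis=0).max()
u = rng.normal(size=(d, d))
z_star = deq_forward(x, prop, w, u)
```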
Ablation: Different GNN Operators (ogbn-arxiv)
RevGNNs are generic and can be applied to different operators.

Ablation: Mini-batch Training (ogbn-products)
Mini-batch training further reduces the memory consumption of RevGNN and improves its accuracy.

Open Source
Available on PyG and DGL. 1400 stars (PyTorch + TensorFlow), 800 citations.
3. Designing GCNs Automatically: Sequential Greedy Architecture Search; Latency Constrained NAS

SGAS: Sequential Greedy Architecture Search
Figure 1. Comparison of search-evaluation Kendall coefficients.
Architectures with a higher validation accuracy during the search phase may perform worse in the evaluation (see Figure 1).
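The search-evaluation correlation in Figure 1 is measured with Kendall's τ, which compares how two lists rank the same architectures. This sketch implements the simple τ-a statistic, (concordant pairs - discordant pairs) / total pairs, and assumes there are no ties; library implementations such as SciPy's `kendalltau` default to the tie-corrected τ-b. The accuracies in the usage lines are made up.

```python
from itertools import combinations

def kendall_tau(search_scores, eval_scores):
    """Kendall tau-a (assumes no ties): +1 = same ranking, -1 = reversed ranking."""
    if len(search_scores) != len(eval_scores) or len(search_scores) < 2:
        raise ValueError("need two equally long lists with at least two entries")
    concordant = discordant = 0
    for i, j in combinations(range(len(search_scores)), 2):
        sign = (search_scores[i] - search_scores[j]) * (eval_scores[i] - eval_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(search_scores) * (len(search_scores) - 1) // 2
    return (concordant - discordant) / n_pairs

search_acc = [0.91, 0.89, 0.93, 0.90]  # hypothetical validation accuracies during search
eval_acc = [0.95, 0.96, 0.94, 0.97]    # hypothetical accuracies after full training
print(kendall_tau(search_acc, eval_acc))
```

A negative value, as in this made-up example, is the failure mode the slide describes: architectures that look best during search end up ranked low after evaluation.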
SGAS: Sequential Greedy Architecture Search
Figure 2. Illustration of Sequential Greedy Architecture Search (steps 1, 2 and 3, then repeat).
Aiming to alleviate this common issue, we introduce sequential greedy architecture search (SGAS), an efficient method for neural architecture search. By dividing the search procedure into sub-problems, SGAS chooses and prunes candidate operations in a greedy fashion.

SGAS: Selection Criteria
To maintain optimality, the design of the selection criterion is crucial. The criteria are built from three quantities: edge importance, selection certainty and selection stability.
Criterion 1 and Criterion 2 combine these quantities, each normalized with a standard min-max scaling.
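One greedy SGAS decision can be sketched in NumPy under assumptions taken from our reading of the SGAS paper, not from these slides: edge importance is the total softmax weight of the non-zero operations on an edge, selection certainty is one minus the normalized entropy of the distribution over the non-zero operations, and Criterion 1 is the product of their min-max-normalized values. Criterion 2 additionally multiplies a selection-stability term computed from the history of decisions, which is omitted here. The function names, the placement of the "zero" operation at index 0 and the toy architecture weights are hypothetical.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def min_max(v):
    """Standard min-max scaling to [0, 1] (all zeros if every value is equal)."""
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

def criterion_1(alphas, zero_index=0):
    """alphas: (E, O) architecture weights of E undecided edges over O candidate operations."""
    probs = softmax(alphas)
    nonzero = np.delete(probs, zero_index, axis=1)       # drop the 'zero' (no connection) op
    edge_importance = nonzero.sum(axis=1)
    p = nonzero / nonzero.sum(axis=1, keepdims=True)     # distribution over non-zero ops
    entropy = -(p * np.log(p)).sum(axis=1)
    certainty = 1.0 - entropy / np.log(p.shape[1])       # 1 = one op clearly dominates
    return min_max(edge_importance) * min_max(certainty)

def greedy_step(alphas, undecided, zero_index=0):
    """Choose the undecided edge with the highest score and its best non-zero operation."""
    scores = criterion_1(alphas[undecided], zero_index)
    edge = int(undecided[int(np.argmax(scores))])
    op_probs = softmax(alphas[edge])
    op_probs[zero_index] = -np.inf                       # never pick the 'zero' op
    return edge, int(np.argmax(op_probs))

alphas = np.array([
    [0.0, 0.0, 0.0, 0.0],   # edge 0: undecided between ops  -> low certainty
    [0.0, 3.0, 0.0, 0.0],   # edge 1: op 1 clearly preferred -> high score
    [3.0, 0.0, 0.0, 0.0],   # edge 2: 'zero' op dominates    -> low importance
])
print(greedy_step(alphas, np.array([0, 1, 2])))
```

After each decision, the chosen edge's operation would be fixed, the other candidates on that edge pruned, and the search continued on the remaining edges, matching the "Repeat" step in Figure 2.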
Results: SGAS for CNN on CIFAR-10
Table 3. Performance comparison with state-of-the-art image classifiers on CIFAR-10.
(a) Normal cell of the best model with SGAS (Cri. 1) on CIFAR-10.
(b) Reduction cell of the best model with SGAS (Cri. 1) on CIFAR-10.
(c) Normal cell of the best model with SGAS (Cri. 2) on CIFAR-10.
(d) Reduction cell of the best model with SGAS (Cri. 2) on CIFAR-10.

Results: SGAS for CNN on ImageNet
Table 4. Performance comparison with state-of-the-art image classifiers on ImageNet.
(a) Normal cell of the best model with SGAS (Cri. 1) on ImageNet.
(b) Reduction cell of the best model with SGAS (Cri. 1) on ImageNet.
(c) Normal cell of the best model with SGAS (Cri. 2) on ImageNet.
(d) Reduction cell of the best model with SGAS (Cri. 2) on ImageNet.

Results: SGAS for GCN on ModelNet
(a) Normal cell of the best model with SGAS (Cri. 1) on ModelNet.
(b) Normal cell of the best model with SGAS (Cri. 2) on ModelNet.
Table 1. Comparison with state-of-the-art architectures for 3D object classification on ModelNet40.

Results: SGAS for GCN on PPI
(a) Normal cell of the best model with SGAS (Cri. 1) on PPI.
(b) Normal cell of the best model with SGAS (Cri. 2) on PPI.
Table 2. Comparison with state-of-the-art architectures for node classification on PPI.

LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks (arXiv 2020, Guohao Li et al.)

Latency Constrained NAS
Table 1. Evaluation results on ModelNet40.
Table 2. Comparison to state-of-the-art methods.
4. Discussion: To deep or not to deep

Discussions
The over-smoothing assumption is too strong (e.g. it ignores weights and activations).
Depth and diameter (a 1001-layer GNN on ogbn-proteins, whose graph diameter is 9).
Depth and width (compound scaling rule).
Depth and datasets (deeper models benefit more on geometric graphs such as 3D data, proteins and molecules, but less on citation networks).
The OOD splits on OGB are challenging and need other techniques to help (transfer learning, zero-shot learning).
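The depth-versus-diameter point can be checked with breadth-first search. Assuming each layer aggregates from one-hop neighbors plus the node itself, the receptive field of an L-layer GNN is the set of nodes within L hops, so once L reaches the graph diameter every node already sees its whole connected component; with a diameter of 9, layers beyond the ninth cannot enlarge the receptive field of a 1001-layer model, so any gains from extra depth must come from elsewhere. The 10-node path graph below is a hypothetical example that happens to have diameter 9.

```python
from collections import deque

def bfs_depths(adj_list, source):
    """Hop distance from `source` to every reachable node."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj_list[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth

def diameter(adj_list):
    """Longest shortest-path distance, assuming the graph is connected."""
    return max(max(bfs_depths(adj_list, s).values()) for s in adj_list)

def receptive_field(adj_list, node, num_layers):
    """Nodes whose input features can influence `node` after `num_layers` layers."""
    return {v for v, d in bfs_depths(adj_list, node).items() if d <= num_layers}

path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}  # 10-node path
print(diameter(path))                          # 9
print(len(receptive_field(path, 0, 4)))        # 5
print(len(receptive_field(path, 0, 9)))        # 10
```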
Acknowledgement
Matthias Müller, Ali Thabet, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Bernard Ghanem, Chenxin Xiong, Vladlen Koltun, Silvio Giancola, Neil Smith, Jesus Zarzar, Mengmeng Xu, Kezhi Kong

Thanks