1-5 Some New Advances in Out-of-Distribution-Robust Graph Learning


SOME ADVANCES IN OUT-OF-DISTRIBUTION GRAPH LEARNING
Yatao Bian (https:/ ), Tencent AI Lab

CONTENTS
01 DrugOOD: a testbed for graph OOD learning
02 Subgraph-based invariant graph learning

DrugOOD: Background

Drug discovery is a long and expensive process
- It takes more than 10 years and $1B to develop a new drug.
- Gaudelet, T., Day, B., Jamasb, A. R., Soman, J., Regep, C., Liu, G., ... & Taylor-King, J. P. (2021). Utilizing graph machine learning within drug discovery and development. Briefings in Bioinformatics, 22(6), bbab159. (Figure from Gaudelet et al.)

Big opportunity for artificial intelligence
- A massive amount of data has been generated in the biomedical domain, and much of it is graph-structured, e.g. the ChEMBL dataset (figure from ChEMBL's homepage) and molecules and proteins together with their graph representations (figure from Gaudelet et al.).
- A lot of AI techniques have been adopted in drug discovery ("Drug AI"); see the applications of machine learning to drug discovery and development surveyed at https://zitniklab.hms.harvard.edu/drugml/ (figure from that page).

Evaluating Drug AI algorithms
- Several benchmarks have been proposed to bridge the gap between the ML community and real-world drug discovery.
- TDC (Therapeutics Data Commons): datasets and tasks for therapeutics algorithm development, https://tdcommons.ai/ (illustration of the TDC datasets from the site).
- FS-Mol: few-shot learning for molecules. Stanley, Megan, et al. "FS-Mol: a few-shot learning dataset of molecules." Thirty-fifth Conference on Neural Information Processing Systems, Datasets and Benchmarks Track (Round 2), 2021 (statistics of FS-Mol from the paper).

Evaluating Drug AI algorithms: issues
- Fixed datasets cannot keep up to date with the depository websites.
- They overlook the real-world presence of distribution shift: conventional splits are unrealistic, and performance under them is over-optimistic. When the training distribution differs from the test distribution, serious performance degradation usually follows.
- They overlook annotations of the noise present in real-world data, such as the measurement type and the confidence level (e.g. two molecules both labeled "active" but with confidence scores 0.79 and 0.41). The data contains non-negligible noise.

DrugOOD Dataset Curator and Benchmark
- A systematic OOD dataset curator and benchmark for AI-aided drug discovery, shipped with an open-source Python package that fully automates the data curation and OOD benchmarking processes.
- In contrast to only providing fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations, and rigorous benchmarking of SOTA OOD algorithms.

DrugOOD: Details

DrugOOD: Overview of the Dataset Curator
- An automated OOD dataset curator with real-world domain and noise annotations.
- Five domain definitions (scaffold, assay, molecule size, protein, protein family) reflect realistic distribution-shift scenarios.
- Three noise levels (core, refined, general) anchor different amounts of label noise.

DrugOOD: Noise
- Noisy annotations at three levels (core, refined, general), specified by the filter configurations for each level and summarized by statistics of the resulting datasets.
- The filters act on the confidence score, the value relation (exact values vs. "cut-off" measurements), etc.

DrugOOD: Domains
- Domain definition and split by assay, scaffold, molecule size, protein, or protein family; a scaffold-grouping sketch follows below.
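To make the scaffold domain concrete, here is a minimal sketch of grouping molecules into Bemis-Murcko scaffold domains with RDKit. It is an illustrative sketch, not the DrugOOD package's own curation code, and the SMILES strings are arbitrary examples.

```python
# Illustrative sketch: group molecules into scaffold domains with RDKit.
# Not the DrugOOD package's implementation; the SMILES below are arbitrary examples.
from collections import defaultdict
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_domains(smiles_list):
    """Map each Bemis-Murcko scaffold to the molecules that share it."""
    domains = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:                        # skip unparsable SMILES
            continue
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=mol)
        domains[scaffold].append(smi)
    return domains

if __name__ == "__main__":
    smiles = ["c1ccccc1CCN", "c1ccccc1CCO", "C1CCNCC1C(=O)O"]
    for scaffold, members in scaffold_domains(smiles).items():
        print(scaffold, members)               # molecules sharing a scaffold form one domain
```

Splitting train/validation/test by scaffold group, rather than by random molecule, is what induces the covariate shift that the benchmark measures.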

DrugOOD: Customization
- An automated OOD dataset curator, fully customizable by users; 96 realized datasets are provided.
- Dataset curation is driven by a curation configuration (a hypothetical example is sketched below).
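The slide shows a curation configuration example. The released DrugOOD package has its own config schema, so the field names below are made up purely to illustrate the kinds of knobs (data source, measurement type, noise filters, domain definition and split) that such a configuration controls.

```python
# Hypothetical curation configuration. The keys are illustrative assumptions,
# not the actual schema of the DrugOOD package; consult the released code for that.
curation_config = {
    "source": "chembl",                   # bioassay depository to curate from
    "task": "lbap",                       # ligand-based affinity prediction
    "measurement_type": "IC50",           # which assay measurement to keep
    "noise_level": "core",                # core / refined / general preset
    "noise_filter": {
        "min_confidence_score": 0.7,      # drop low-confidence measurements
        "value_relation": ["="],          # keep exact values, drop ">"/"<" cut-offs
    },
    "domain": {
        "name": "scaffold",               # scaffold / assay / size / protein / protein_family
        "split": {"train": 0.6, "ood_val": 0.2, "ood_test": 0.2},
    },
}
```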

DrugOOD: Benchmarking
- Rigorous OOD benchmarking: six SOTA OOD algorithms with various backbones.

DrugOOD: Benchmarks
- The benchmark tests show that the classification performance (AUC score) on DrugOOD datasets drops by more than 20% from the in-distribution (ID) to the out-of-distribution (OOD) setting, verifying the authenticity and difficulty of the domain definitions and noise calibration methods in this dataset.
- Table 6: ID vs. OOD performance on datasets with measurement type IC50, trained with ERM. We adopt AUROC to estimate model performance; higher is better. All datasets show performance drops due to distribution shift, with substantially better ID performance than OOD performance. (A minimal sketch of this ID-vs-OOD comparison follows below.)
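The ID-vs-OOD gap reported in Table 6 amounts to scoring one ERM-trained model's predictions on two test splits and comparing AUROC. A minimal sketch with scikit-learn and placeholder predictions, not the DrugOOD benchmarking code:

```python
# Minimal sketch of the ID-vs-OOD comparison behind Table 6: score one model's
# predictions on an in-distribution and an out-of-distribution test split and
# report the AUROC gap. The arrays are random placeholders for real predictions.
import numpy as np
from sklearn.metrics import roc_auc_score

def id_ood_gap(y_id, p_id, y_ood, p_ood):
    """Return (ID AUROC, OOD AUROC, absolute drop)."""
    auc_id = roc_auc_score(y_id, p_id)
    auc_ood = roc_auc_score(y_ood, p_ood)
    return auc_id, auc_ood, auc_id - auc_ood

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_id, y_ood = rng.integers(0, 2, 500), rng.integers(0, 2, 500)
    # OOD predictions are noisier than ID ones, mimicking a distribution-shift drop.
    p_id = 0.7 * y_id + 0.3 * rng.random(500)
    p_ood = 0.3 * y_ood + 0.7 * rng.random(500)
    print(id_ood_gap(y_id, p_id, y_ood, p_ood))
```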

DrugOOD: Algorithm Configuration
- Rigorous OOD benchmarking: six SOTA OOD algorithms with various backbones.
- Reported: the out-of-distribution (OOD) performance of baseline models trained with different OOD algorithms on the DrugOOD-lbap-ic50 dataset, together with an algorithm configuration example.

DrugOOD Dataset and Benchmark: Summary
- Automated dataset curator: a fully customizable pipeline for curating OOD datasets for AI-aided drug discovery from the large-scale bioassay deposition website ChEMBL.
- Rich domain annotations: various approaches to generate specific domains that are aligned with the domain knowledge of biochemistry.
- Realistic noise annotations: real-world noise is annotated according to the measurement confidence score, "cut-off" noise, etc., offering a valuable testbed for learning under real-world noise.
- Rigorous OOD benchmarking: six SOTA OOD algorithms with various backbones are benchmarked on the 96 realized dataset instances, giving insight into OOD learning under noise for AI-aided drug discovery.

DrugOOD Dataset and Benchmark
- Paper: https://arxiv.org/pdf/2201.09637.pdf
- Code: https:/
- Project: https://drugood.github.io/

iDrug: an AI-driven drug discovery platform
- https:/

Subgraph-based Invariant Graph Learning

Predictive subgraphs are important for understanding graph OOD learning
- Example: a molecule that is JNK3 & GSK3 active can be viewed as the composition of a JNK3-active part and a GSK3-active part (Jin et al., 2020; Duvenaud et al., 2015).

Recognizing Predictive Substructures with the Subgraph Information Bottleneck
- We leverage the idea of the Information Bottleneck and optimize

  $\max_{G_{\mathrm{sub}}} \; I(Y; G_{\mathrm{sub}}) - \beta \, I(G; G_{\mathrm{sub}})$

- Maximize the mutual information between the label and the subgraph, $I(Y; G_{\mathrm{sub}})$.
- Minimize the mutual information between the graph and the subgraph, $I(G; G_{\mathrm{sub}})$.
- With Junchi Yu et al., "Graph information bottleneck for subgraph recognition," ICLR 2021.

Mutual information between label and subgraph
- Maximizing $I(Y; G_{\mathrm{sub}})$ is handled through a variational lower bound, $I(Y; G_{\mathrm{sub}}) \ge \mathbb{E}_{p(Y, G_{\mathrm{sub}})}[\log q_\phi(Y \mid G_{\mathrm{sub}})] + H(Y)$, so it reduces to minimizing the cross-entropy of a classifier that sees only the subgraph.
- Architecture: a GCN produces node embeddings, an MLP "bottleneck" selects the subgraph nodes, and the selected embeddings are aggregated for prediction (a sketch follows below).
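A minimal PyTorch sketch of that label-side term, assuming node embeddings coming from any GNN encoder. The module structure and names are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of the label-side GIB term: select a soft subgraph with a
# node-wise "bottleneck" MLP, read it out, and train a classifier on it. Training
# the cross-entropy maximizes the variational lower bound on I(Y; G_sub).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubgraphBottleneck(nn.Module):
    def __init__(self, hidden_dim, num_classes):
        super().__init__()
        self.selector = nn.Sequential(           # node-wise selection probabilities
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, node_emb):
        # node_emb: [num_nodes, hidden_dim], embeddings from a GNN encoder (e.g. a GCN)
        p_select = torch.sigmoid(self.selector(node_emb))      # [num_nodes, 1]
        subgraph_emb = (p_select * node_emb).sum(dim=0)        # weighted subgraph readout
        return self.classifier(subgraph_emb), p_select

if __name__ == "__main__":
    model = SubgraphBottleneck(hidden_dim=16, num_classes=2)
    node_emb = torch.randn(10, 16)               # stand-in for GCN node embeddings
    logits, p_select = model(node_emb)
    loss_cls = F.cross_entropy(logits.unsqueeze(0), torch.tensor([1]))
    print(loss_cls.item(), p_select.squeeze(-1))
```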

Mutual information between graph and subgraph
- Minimize the mutual information between the graph and the subgraph, $I(G; G_{\mathrm{sub}})$.
- Use the Donsker-Varadhan (Donsker & Varadhan, 1983) representation of the KL divergence, estimated by sampling:

  $I(G; G_{\mathrm{sub}}) = \sup_{f} \; \mathbb{E}_{p(G, G_{\mathrm{sub}})}[f(G, G_{\mathrm{sub}})] - \log \mathbb{E}_{p(G)\,p(G_{\mathrm{sub}})}\!\left[e^{f(G, G_{\mathrm{sub}})}\right]$

- $f(\cdot, \cdot)$ is a statistics network on top of the GCN / "bottleneck" / aggregation pipeline, trained with a T-step inner optimization; positive pairs come from the same graph in the batch, negative pairs from different graphs in the batch (a sketch follows below).
- With Junchi Yu et al., "Graph information bottleneck for subgraph recognition," ICLR 2021.
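A small sketch of a Donsker-Varadhan (MINE-style) estimator of $I(G; G_{\mathrm{sub}})$ for batched graph-level embeddings: the statistics network scores joint pairs against pairs shuffled within the batch. Names and dimensions are illustrative, not the paper's released code.

```python
# Donsker-Varadhan (MINE-style) estimate of I(G; G_sub) for a batch of graph
# embeddings. Positive pairs are (graph, its own subgraph); negatives pair each
# graph with a subgraph from another graph in the batch (via a roll).
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, g_emb, sub_emb):
        return self.net(torch.cat([g_emb, sub_emb], dim=-1))       # [batch, 1]

def dv_mi_estimate(stats_net, g_emb, sub_emb):
    """Donsker-Varadhan lower bound on I(G; G_sub) for one batch."""
    joint = stats_net(g_emb, sub_emb).mean()                        # E_p(G,G_sub)[f]
    shuffled = stats_net(g_emb, sub_emb.roll(shifts=1, dims=0))     # ~ p(G) p(G_sub)
    marginal = torch.logsumexp(shuffled, dim=0).squeeze() - math.log(g_emb.size(0))
    return joint - marginal                                         # E[f] - log E[e^f]
```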

Bi-level optimization scheme
- Outer loop: minimize the GIB objective over the subgraph generator, $\min_{G_{\mathrm{sub}}} \; -I(Y; G_{\mathrm{sub}}) + \beta \, \hat{I}_{f^{*}}(G; G_{\mathrm{sub}})$.
- Inner loop: $f^{*} = \arg\max_{f} \hat{I}_{f}(G; G_{\mathrm{sub}})$, a T-step inner optimization of the statistics network that keeps the Donsker-Varadhan estimate tight (a sketch follows below).

Graph Information Bottleneck: overall framework
- A GCN encodes the graph, an MLP "bottleneck" selects the subgraph, the selected embeddings are aggregated for prediction, and the statistics network is refreshed by the T-step inner optimization at every outer step.
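A compressed sketch of that bi-level loop. `encode` stands in for the GCN plus bottleneck pipeline and `dv_mi_estimate` for the estimator sketched above; all names are illustrative assumptions, not the released code.

```python
# Sketch of the bi-level optimization: the inner loop fits the statistics network
# (tightening the Donsker-Varadhan bound), the outer step updates the subgraph
# generator and classifier on the GIB objective. `encode(batch)` is assumed to
# return (graph embeddings, subgraph embeddings, subgraph-based logits);
# `dv_mi_estimate` is the estimator sketched above.
import torch.nn.functional as F

def train_step(encode, stats_net, opt_model, opt_stats, batch, labels,
               beta=0.1, T=5):
    # Inner loop: T steps maximizing the DV estimate of I(G; G_sub).
    for _ in range(T):
        g_emb, sub_emb, _ = encode(batch)
        mi = dv_mi_estimate(stats_net, g_emb.detach(), sub_emb.detach())
        opt_stats.zero_grad()
        (-mi).backward()
        opt_stats.step()

    # Outer step: minimize -I(Y; G_sub) + beta * I_hat(G; G_sub).
    g_emb, sub_emb, logits = encode(batch)
    loss = F.cross_entropy(logits, labels) + beta * dv_mi_estimate(stats_net, g_emb, sub_emb)
    opt_model.zero_grad()
    loss.backward()
    opt_model.step()
    return loss.item()
```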

Experiments
- i) Improvement of graph classification; ii) graph interpretation; iii) graph denoising.

GSAT: Interpretable and Generalizable Graph Learning via Stochastic Attention
- GIB needs to inject sparsity or connectivity constraints, which may impose bias. Miao et al., 2022 propose GSAT, which instead injects stochasticity when handling $I(G; G_{\mathrm{sub}})$.
- Stochastic attention and the resulting GIB objective: per-edge attention is sampled stochastically, and the information-control term becomes a regularizer that pulls the attention distribution toward a fixed Bernoulli prior (a sketch follows below).
- Miao, Siqi, Miaoyuan Liu, and Pan Li. "Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism." ICML 2022.
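A rough sketch of the stochastic-attention idea under those assumptions (an edge scorer, a concrete relaxation for sampling, and a KL term toward a Bernoulli(r) prior); it is not the official GSAT implementation.

```python
# Rough sketch of GSAT-style stochastic attention: an edge scorer yields per-edge
# keep probabilities, edges are sampled with a concrete (relaxed Bernoulli)
# distribution during training, and a KL term toward a fixed Bernoulli(r) prior
# limits how much graph information the attention can carry. Illustrative only.
import math
import torch
import torch.nn as nn

class StochasticEdgeAttention(nn.Module):
    def __init__(self, edge_dim, prior_r=0.7, temperature=1.0):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(edge_dim, edge_dim), nn.ReLU(),
                                    nn.Linear(edge_dim, 1))
        self.prior_r = prior_r
        self.temperature = temperature

    def forward(self, edge_feat):
        # Per-edge keep probability.
        p = torch.sigmoid(self.scorer(edge_feat)).clamp(1e-6, 1 - 1e-6)
        if self.training:
            # Gumbel-sigmoid / concrete relaxation of Bernoulli(p) sampling.
            u = torch.rand_like(p).clamp(1e-6, 1 - 1e-6)
            logits = p.log() - (1 - p).log() + u.log() - (1 - u).log()
            att = torch.sigmoid(logits / self.temperature)
        else:
            att = p
        # KL( Bern(p) || Bern(r) ): the information-control regularizer.
        r = self.prior_r
        kl = (p * (p.log() - math.log(r))
              + (1 - p) * ((1 - p).log() - math.log(1 - r))).mean()
        return att, kl
```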

Interpretable and Generalizable Graph Learning via Stochastic Attention
- GIB is able to handle certain distribution shifts (causal illustration of the OOD generalization ability of GSAT).
- Qualitative results: finding key subgraphs in Spurious-Motif, and finding key subgraphs in Mutag via GSAT.

DIR: Discovering Invariant Rationales for Graph Neural Networks
- Wu et al., 2022 generalize the idea of Invariant Rationalization (Chang et al., 2020) to discover an invariant subgraph for OOD generalization (causal illustration of discovering invariant rationales).
- Wu, Ying-Xin, Xiang Wang, An Zhang, Xiangnan He, and Tat-Seng Chua. "Discovering invariant rationales for graph neural networks." ICLR 2022.

DIR Framework
- DIR principle: the rationale subgraph should support a predictor whose risk is low and stable across the interventional distributions created by swapping in different non-causal complements; roughly, minimize the mean plus the variance of the per-environment risks (a sketch follows below).
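A minimal sketch of that mean-plus-variance risk penalty as I read the DIR principle; the rationale generator and the intervention machinery that creates the environments are abstracted away.

```python
# Minimal sketch of a DIR-style objective: given the classifier's risk on the
# rationale subgraph under several interventional "environments" (different
# non-causal complements swapped in), penalize both the mean and the variance
# of those risks. The environment construction itself is omitted; illustrative only.
import torch

def dir_objective(per_env_losses, lam=1.0):
    """per_env_losses: 1-D tensor with one risk value per environment."""
    mean_risk = per_env_losses.mean()
    var_risk = per_env_losses.var(unbiased=False)
    return mean_risk + lam * var_risk

if __name__ == "__main__":
    losses = torch.tensor([0.42, 0.47, 0.95])   # a predictor unstable across environments
    print(dir_objective(losses).item())          # the variance term flags the instability
```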

GREA: Graph Rationalization with Environment-based Augmentations
- Liu et al., 2022 propose a new data augmentation strategy (GREA) to improve the rationalization process.
- Liu, Gang, Tong Zhao, Jiaxin Xu, Tengfei Luo, and Meng Jiang. "Graph Rationalization with Environment-based Augmentations." KDD 2022.

Tencent Trustworthy AI Team
- Research roadmap and future plans of the Trustworthy AI team.

Summary
01 DrugOOD: a testbed for graph OOD learning
02 Subgraph-based invariant graph learning

Thank you very much for watching.
