Challenges and Methods in Globalizing the Bing Search QnA System
Ming Gong, PhD, R&D Director, Microsoft Software Technology Center Asia (STCA)

Agenda
- Application scenarios and system overview of Bing search QnA
- Core challenges and solutions in globalizing the search QnA system
- Language scaling of the core DL models
- Outlook for search QnA systems

What is Bing QnA?
- Explicit question intent: What is the classification of jazz music; Which type of drugs treat toothache; Why do dogs howl during day time; How to lose weight without exercise; Can pregnant women eat honey; Is it common for cats to sneeze.
- Implicit question intent: price of Toyota Camry; gallbladder symptoms in men; best sleep positions; fastest runners in history; add calendar to computer screen; my indoor cat keeps sneezing.
- No question intent (QnA is not triggered): ESPN; Gmail; Yahoo; buy women shoes; amazon promo code 20 off; burberry laptop bag images; BMI calculator; reference; Tom Cruise.
The goal is to save users the time of digging through results.

Bing Question-and-Answering feature
- Provides a short, direct, and precise answer when users search for information on Bing.
- One of the top-impact features for bringing an intelligent search experience to Bing.
- Placed at the top position of the Bing search result page.
Answer types: generic passage (68%), list (20%), table (2%). Approach mix: passage QnA (90%), knowledge-base QnA (10%).

Bing QnA approaches overview
- Approach 1: Knowledge-base QnA: query understanding → structured query (e.g., SPARQL) → graph search → fact. Limitation: hard dependencies on query-understanding (QU) capability and KB coverage.
- Approach 2: Passage QnA: web search for relevant documents → machine reading comprehension (MRC) for relevant-passage selection and answer span extraction. Advantages: high coverage, and applicable to various question types.
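The two approaches can be sketched side by side. The SPARQL string, the DBpedia-style predicates, and the Hugging Face question-answering pipeline below are illustrative assumptions, not Bing internals; the sketch only shows the shape of each pipeline.

```python
# Illustrative sketch only: Bing's internal systems are not public.
# Approach 1 (KB QnA): question -> structured query -> graph search -> fact.
# A hand-written SPARQL query stands in for the query-understanding step.
KB_QUERY = """
SELECT ?height WHERE {
  dbr:Mount_Everest dbo:elevation ?height .
}
"""  # would be sent to a knowledge-graph endpoint such as DBpedia

# Approach 2 (Passage QnA): web search -> MRC span extraction.
from transformers import pipeline  # assumes `transformers` is installed

reader = pipeline("question-answering")  # any extractive QA checkpoint

def passage_qna(question: str, retrieved_passages: list) -> dict:
    """Run MRC over retrieved passages and keep the best-scoring span."""
    candidates = [reader(question=question, context=p) for p in retrieved_passages]
    return max(candidates, key=lambda a: a["score"])

passages = ["Mount Everest is Earth's highest mountain, at 8,849 metres."]
print(passage_qna("How tall is Mount Everest?", passages))
# -> {'score': ..., 'start': ..., 'end': ..., 'answer': '8,849 metres'}
```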
Bing passage QnA ranking systems: an online version (real-time) and an offline version (batch mode).

Core challenges and solutions in globalizing the search QnA system

Scale Bing QnA to 100+ languages
- Attract global users and thus grow Bing revenue.
- Provide the QnA experience to all Bing users.
Rollout timeline: 2015 en-US → 2017 en-* → 2019 FR/DE → 2020 universal.
QnA plays a vital role in competing with other search engines; there are both opportunities and challenges.

Challenges for universal QnA
- Top challenge 1: huge maintenance and refresh cost for a large number of models (m markets × n tasks).
- Top challenge 2: lack of language-specific training data.
- Other challenge 3: efficient online model serving.

Solutions
- A universal pipeline and universal models for all languages/markets: a cross-lingual pretrained model plus zero-shot training (sketched below).
- Collect more training data for few-shot training (augmented data plus real labeled data).
- Algorithm and platform inference optimization.
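A minimal sketch of the zero-shot recipe, assuming Hugging Face `transformers` and hypothetical `en_train`/`fr_eval` datasets: fine-tune a cross-lingual encoder (XLM-R here) on English query-passage relevance labels only, then evaluate it unchanged on another language.

```python
# Zero-shot cross-lingual transfer, sketched with Hugging Face APIs.
# Train on English (query, passage, label) pairs; evaluate directly on
# e.g. French pairs with no target-language labels.
from transformers import (AutoTokenizer,
                          AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # relevant / not relevant

def encode(batch):
    # Query and passage are paired into one cross-encoder input.
    return tok(batch["query"], batch["passage"],
               truncation=True, padding="max_length", max_length=256)

# `en_train` / `fr_eval` are hypothetical datasets with
# "query", "passage", "label" columns.
# trainer = Trainer(model=model,
#                   args=TrainingArguments("out", num_train_epochs=2),
#                   train_dataset=en_train.map(encode, batched=True),
#                   eval_dataset=fr_eval.map(encode, batched=True))
# trainer.train()     # English-only supervision
# trainer.evaluate()  # zero-shot French evaluation
```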
Universal QnA model building overview
Cross-lingual pretraining maps languages (e.g., EN, AR, ES) into the same semantic space using parallel and monolingual corpora. Fine-tuning then proceeds zero-shot or few-shot on task-specific training data, with multi-task, semi-supervised, and adversarial training, yielding the universal QnA models.

SOTA cross-lingual pretrained LMs
- mBERT: MMLM (multilingual masked language modeling)
- XLM: MMLM + TLM (translation language modeling)
- XLM-RoBERTa: MMLM
- Unicoder: MMLM + TLM + cross-lingual word recovery + cross-lingual paraphrase classification + cross-lingual masked language modeling
- InfoXLM: MMLM + TLM + XLCO (cross-lingual contrastive learning)
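The difference between MMLM and TLM is mostly in data preparation, sketched below with a simplified masking helper (uniform 15% masking, no 80/10/10 split, special tokens not excluded).

```python
import random
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")

def mask_tokens(ids, mask_id, prob=0.15):
    """Simplified MLM masking: replace ~15% of tokens with <mask>."""
    return [mask_id if random.random() < prob else t for t in ids]

# MMLM: monolingual masked language modeling, one language at a time.
mono = tok("Where was the first emperor of China born?")["input_ids"]
mmlm_input = mask_tokens(mono, tok.mask_token_id)

# TLM: concatenate a translation pair, then mask on both sides, so the
# model can use the English context to recover masked French tokens
# (and vice versa), aligning the two languages in one semantic space.
pair = tok("Where was the first emperor of China born?",
           "Où est né le premier empereur de Chine ?")["input_ids"]
tlm_input = mask_tokens(pair, tok.mask_token_id)
```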
Advanced universal QnA model building framework

Domain/task adaptation for query-passage relevance
Suchin Gururangan, Ana Marasovic, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks. ACL 2020.
Baseline: zero-shot fine-tuning (AUC 76.2). Treatment: + continual training (AUC 79.3, +3.1).
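A sketch of what the continual-training treatment amounts to, in the spirit of Gururangan et al.: keep running the masked-LM objective on unlabeled in-domain text (e.g., query-passage pairs from search traffic) before fine-tuning on the relevance task. The corpus file and hyperparameters are placeholders.

```python
# Continual (domain-adaptive) pretraining: MLM on in-domain text,
# then fine-tune the resulting checkpoint on relevance labels.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

# "qp_corpus.txt" is a placeholder for unlabeled in-domain text.
corpus = load_dataset("text", data_files="qp_corpus.txt")["train"]
corpus = corpus.map(lambda b: tok(b["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("dapt-out", num_train_epochs=1),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()  # afterwards, fine-tune this checkpoint on the task
```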
Domain/task adaptation: semi-structured QA
Semi-structured data on the Web (such as lists and tables) provides a rich information source for question answering (List QnA, Table QnA).
Challenges compared with plain text: semi-structured data contains rich structural relations, while pretrained LMs mainly target plain-text modeling (such as the LM task).
Our approach: Graph-based Semi-Structured LM (GraSSLM), with a unified graph-based representation for both lists and tables, and a novel Neighbor Prediction Objective (NPO) pretraining task.
Xingyao Zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen, and Daxin Jiang. A Graph Representation of Semi-structured Data for Web Question Answering. COLING 2020.
New SOTA: beats TAPAS, Google's SOTA table-pretrained model (ACL 2020), by a large margin.
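The paper's exact graph construction is not reproduced here; the sketch below, under assumed adjacency rules, only illustrates the unified idea: cells of a table (or items of a list) become graph nodes, structural neighbors are linked, and a neighbor-prediction objective can be posed over node pairs.

```python
# Hypothetical illustration of a unified graph view of a table.
# Nodes are (row, col, text) cells; edges link same-row / same-column
# neighbors. An NPO-style task can then ask: "are these two nodes
# adjacent in the structure?"
from itertools import combinations

table = [
    ["Country", "Capital"],
    ["France",  "Paris"],
    ["Poland",  "Warsaw"],
]

nodes = [(r, c, cell) for r, row in enumerate(table)
         for c, cell in enumerate(row)]

def is_neighbor(a, b):
    """Structural adjacency: neighboring cells in a row or column."""
    (r1, c1, _), (r2, c2, _) = a, b
    return (r1 == r2 and abs(c1 - c2) == 1) or (c1 == c2 and abs(r1 - r2) == 1)

# Positive/negative pairs for a neighbor-prediction pretraining task.
pairs = [((a[2], b[2]), is_neighbor(a, b)) for a, b in combinations(nodes, 2)]
print(pairs[:3])
```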
Challenges in building a universal MRC model
- Direct answers need precise boundary detection (harder than classification tasks).
- Many direct answers and their highlighting are language- or market-specific entities or phrases, which are not easy to transfer among different languages.

Knowledge infusion for multilingual MRC
"Knowledge infusion" injects entities, relations, n-grams, and constraints through a new Phrase-Level Masking pretraining task (sketched below). SOTA results.
F. Yuan, Linjun Shou, X. Bai, M. Gong, Y. Liang, N. Duan, Y. Fu, and Daxin Jiang. Enhancing Answer Boundary Detection for Multilingual MRC. ACL 2020.
Junhao Liu, Linjun Shou, Jian Pei, Ming Gong, Min Yang, and Daxin Jiang. Cross-lingual MRC with Language Branch Knowledge Distillation. COLING 2020.
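A simplified illustration of phrase-level masking: whole knowledge phrases (entities, n-grams) are masked instead of random subwords, so the model must reconstruct complete answer-like spans, which directly exercises boundary detection. The phrase list and one-mask-per-word scheme are assumptions, not the papers' exact recipe.

```python
import re

def phrase_level_mask(text: str, phrases: list, mask: str = "[MASK]"):
    """Mask whole phrases (e.g. entities, n-grams) rather than
    random subword tokens, so the model must recover full spans."""
    for p in phrases:
        # one mask token per word keeps the span length observable
        replacement = " ".join([mask] * len(p.split()))
        text = re.sub(re.escape(p), replacement, text)
    return text

sent = "The first emperor of China was born in Handan."
print(phrase_level_mask(sent, ["first emperor of China", "Handan"]))
# -> "The [MASK] [MASK] [MASK] [MASK] was born in [MASK]."
```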
Data augmentation approaches overview
Auxiliary data (user feedback, entity data, unlabeled target data) and the source training data and source model are turned into target training data for the target model through four routes: machine translation, semi-supervised learning, weakly-supervised learning, and a data generator.

Data augmentation: machine translation
F. Yuan, Linjun Shou, X. Bai, M. Gong, Y. Liang, N. Duan, Y. Fu, and Daxin Jiang. Enhancing Answer Boundary Detection for Multilingual MRC. ACL 2020.
Qin Libo, Ni Minheng, Zhang Yue, and Che Wanxiang. CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP. IJCAI 2020.
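Code-switching augmentation in the spirit of CoSDA-ML can be sketched as dictionary-based word substitution; the toy lexicon below is an assumption (real systems draw on bilingual dictionaries such as MUSE).

```python
import random

# Toy bilingual dictionaries; coverage and word senses are ignored here.
LEXICON = {
    "weight": {"fr": "poids", "de": "Gewicht"},
    "exercise": {"fr": "exercice", "de": "Übung"},
}

def code_switch(sentence: str, ratio: float = 0.5) -> str:
    """Randomly replace known words with a translation (CoSDA-ML style),
    so a multilingual encoder sees mixed-language contexts."""
    out = []
    for w in sentence.split():
        trans = LEXICON.get(w.lower())
        if trans and random.random() < ratio:
            out.append(random.choice(list(trans.values())))
        else:
            out.append(w)
    return " ".join(out)

print(code_switch("how to lose weight without exercise"))
# e.g. -> "how to lose poids without Übung"
```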
Data augmentation: unlabeled target data
Motivation: large-scale unlabeled data in target languages is relatively easy to obtain, and can be turned into labeled data (e.g., unlabeled Russian data into labeled Russian data).
Examples of unlabeled data in target languages: for query classification and slot filling, user queries searched on Bing; for QnA relevance, query-passage pairs extracted from Bing; for document classification, documents from the Web, etc.
Our approach: Reinforced Iterative Knowledge Distillation (RL-selective KD), which selects high-quality data from noisy soft labels. SOTA results.
S. Liang, M. Gong, J. Pei, L. Shou, W. Zuo, X. Zuo, and D. Jiang. Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition. KDD 2021.
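A sketch of the iterative distillation loop, with the reinforcement-learned data selector replaced by a plain confidence threshold for illustration: the teacher soft-labels unlabeled target-language batches, only confident soft labels are kept, and the student distills from them.

```python
# Simplified iterative knowledge distillation over unlabeled target data.
# The paper learns the data selector with RL; a fixed confidence
# threshold stands in for it here.
import torch

def distill_round(teacher, student, unlabeled_loader, optimizer,
                  threshold: float = 0.9):
    teacher.eval()
    student.train()
    for batch in unlabeled_loader:            # target-language text only
        with torch.no_grad():
            soft = torch.softmax(teacher(**batch).logits, dim=-1)
        conf, _ = soft.max(dim=-1)
        keep = conf > threshold               # select high-quality soft labels
        if not keep.any():
            continue
        logits = student(**batch).logits
        loss = torch.nn.functional.kl_div(
            torch.log_softmax(logits[keep], dim=-1), soft[keep],
            reduction="batchmean")
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Repeating rounds, with the student becoming the next round's teacher,
# gives the "iterative" part of the method.
```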
Data augmentation: user feedback data
Motivation: leverage user feedback (the user as expert) to generate training data automatically and at low cost.
Challenge: user clicks may not indicate user satisfaction.
Approach: model users' implicit feedback for automatic data labeling, then train the QA model with a weakly supervised approach.

Method | Feedback data | Metrics (AUC/ACC)
Baseline | 0 M | 71.43 / 66.77
FBQA | 1.0 M | 75.77 / 68.77 (+4.34 / +2.00)
FBQA | 4.0 M | 77.93 / 71.27 (+6.50 / +4.50)

Linjun Shou, Shining Bo, Feixiang Cheng, Ming Gong, and Daxin Jiang. Mining Implicit Relevance Feedback from User Behavior for Web Question Answering. KDD 2020 long paper.
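Because clicks alone are unreliable, implicit-feedback labeling aggregates several behavior signals per answer impression. The signals and weights below are hypothetical stand-ins for the learned feedback model in the KDD 2020 paper; they only show the shape of the auto-labeling step.

```python
# Hypothetical weak-labeling rule over aggregated user behavior.
from typing import Optional

def weak_label(impression: dict) -> Optional[int]:
    """Return 1 (relevant), 0 (irrelevant), or None (discard)."""
    score = 0.0
    if impression["answer_expanded"]:        # user opened the full answer
        score += 0.5
    if impression["dwell_seconds"] > 30:     # lingered on the answer
        score += 0.3
    if impression["reformulated_query"]:     # asked again -> unsatisfied
        score -= 0.6
    if score >= 0.5:
        return 1
    if score <= -0.3:
        return 0
    return None  # ambiguous impressions are dropped, not labeled

print(weak_label({"answer_expanded": True, "dwell_seconds": 45,
                  "reformulated_query": False}))  # -> 1
```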
Data augmentation: generated data
- Use a generation model to synthesize training data.
- Use a denoising training approach to overcome the noise in the synthetic data.
Yingmei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, and Daxin Jiang. Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding. EMNLP 2021.
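One simple form of denoising is to downweight examples from noisier augmented sources in the loss; the fixed per-source weights below are an assumption standing in for the learned weighting of the EMNLP 2021 paper.

```python
import torch
import torch.nn.functional as F

# Fixed per-source reliability weights, as a stand-in for learned
# denoising over multiple augmented data sets.
SOURCE_WEIGHT = {"human": 1.0, "mt": 0.7, "generated": 0.4}

def weighted_loss(logits, labels, sources):
    """Cross-entropy where noisier augmented sources count less."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    weights = torch.tensor([SOURCE_WEIGHT[s] for s in sources],
                           device=logits.device)
    return (weights * per_example).mean()

logits = torch.randn(3, 2)
labels = torch.tensor([0, 1, 1])
print(weighted_loss(logits, labels, ["human", "mt", "generated"]))
```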
Advanced knowledge distillation for online serving
Teacher → student training, evolved over three versions:
- V1: vanilla KD.
- V2: multi-teacher KD*, with distillation from multiple teacher models, unbiased knowledge distillation, and distillation pretraining.
- V3: reinforced teacher selection for KD+.

Methods | Test 1 | Test 2 | Test 3
Baseline (V2) | 87.6 | 82.1 | 84.6
RL-KD (V3) | 89.2 (+1.6) | 84.5 (+2.4) | 85.5 (+0.9)

* Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, and Daxin Jiang. Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. WSDM 2020.
+ Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, and Daxin Jiang. Reinforced Multi-Teacher Selection for Knowledge Distillation. EMNLP 2020.
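The V1 objective that the later variants build on can be written in a few lines; the temperature and mixing weight below are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels,
            T: float = 2.0, alpha: float = 0.5):
    """Vanilla knowledge distillation: soft-target KL + hard-label CE.
    Multi-teacher KD (V2) averages several teachers' soft targets;
    RL-KD (V3) learns which teacher to trust per example."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(4, 2), torch.randn(4, 2)
print(kd_loss(s, t, torch.tensor([0, 1, 1, 0])))
```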
How do we serve models on various platforms for different scenarios? Agile model online serving on both GPU and CPU clusters.

QnA shipped to all Bing markets (100+ languages, 230+ regions). Example queries (English translations given; the Urdu, Greek, and Russian originals were not recoverable):
- Urdu: what are the benefits of eating apricots?
- Greek: why the color of the sky is blue
- Russian: apricot benefits
- Romanian: primul împărat din China (the first emperor of China)
- Turkish: Beyoğlu gezilecek yerler (places to visit in Beyoğlu)
- Polish: 10 najlepszych programów antywirusowych (top 10 antivirus programs)

Language scaling of the core DL models; outlook for search QnA systems
Future work for language scaling
- Model: more linguistic features for the cross-lingual model, e.g., language family and syntax information; language-agnostic plus language-specific representations, e.g., language-specific knowledge infusion; efficient cross-lingual model training; mitigating "catastrophic forgetting".
- Data: expand from 100+ languages to 7,000+ languages; leverage language families to smartly select data for labeling, using larger languages to boost smaller ones.

Our tutorials for language scaling
- Language Scaling: Applications, Challenges and Approaches. Link: https://languagescalingkdd.github.io/
- Scaling out NLP Applications to 100+ Languages. Link: https://languagescaling.github.io/

Welcome to more discussions! WeChat: yiming1013