2020 Year-End Conference, Natural Language Processing: 6-2.pdf (34 pages)
Frontier Exploration and Technical Practice of Alibaba's Multilingual Translation Models
Machine Translation for 45,582 Language Pairs
Zhang Zhirui (张志锐), Alibaba DAMO Academy

Outline
MNMT: Multilingual Neural Machine Translation
 - Motivation, definition and challenges
Advanced Research
 - Interlanguage-based new architecture
 - Iterative-repairing-based data augmentation
 - Integration of pre-training models
 - Speedup strategies
MNMT Applications at Alibaba

Motivation
Mission: To break the language barrier for worldwide communication!
Some facts about world languages (from Wikipedia):
 - There are over 6,000 languages spoken in the world today.
 - A few hundred languages are recognized as being in use on the Internet.
 - 200 languages cover over 80% of the world's population in their native tongue.
 - 200 languages are enough to communicate with 99.5% of the population in the US.
Translating between 214 languages: if N = 214, then N(N-1) = 45,582 language pairs.
Preparing data for, training, and maintaining 45,582 separate MT models is too expensive!
From http:/

Multilingual NMT
Proposal: Multilingual NMT
One model translates all language pairs (figure labels: EN, FR, ES, DE).
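The pair count on the Motivation slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `directed_pairs` is hypothetical and does not come from the slides, and the count assumes each translation direction (for example EN to FR and FR to EN) is a separate language pair, which is what N(N-1) implies.

```python
from itertools import permutations


def directed_pairs(n: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n * (n - 1)


# Cross-check the closed form by enumeration on the four languages
# named on the proposal slide.
langs = ["EN", "FR", "ES", "DE"]
pairs = list(permutations(langs, 2))  # ordered pairs, no self-translation
assert len(pairs) == directed_pairs(len(langs))

# The figure quoted in the talk: 214 languages.
print(directed_pairs(214))  # 45582
```

With one dedicated model per direction, the number of models grows quadratically with the number of languages, which is the cost the single multilingual model is proposed to avoid.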