DOI: 10.1145/3447548.3470791

Language Scaling: Applications, Challenges and Approaches

Published: 14 August 2021

ABSTRACT

Language scaling aims to deploy Natural Language Processing (NLP) applications economically across many countries and regions with different languages. Industry has invested heavily in language scaling because many parties want to deploy their applications and services to global markets. At the same time, scaling out NLP applications to various languages, essentially a data science problem, remains a grand challenge due to the huge differences in morphology, syntax, and pragmatics among languages. We present a comprehensive survey and tutorial on language scaling. We start with a clear problem description for language scaling and an intuitive discussion of the overall challenges. Then, we outline two major categories of approaches to language scaling, namely model transfer and data transfer, and present a taxonomy that summarizes the methods in the literature. A large part of the tutorial is organized around the various types of NLP applications. Finally, we discuss several important challenges in this area and future directions.

    • Published in

      KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
      August 2021
      4259 pages
      ISBN: 9781450383325
      DOI: 10.1145/3447548

      Copyright © 2021 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • abstract

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
