Incident-Driven Machine Translation and Name Tagging for Low-resource Languages

Hermjakob, Ulf; Li, Qiang; Marcu, Daniel; May, Jonathan; Mielke, Sebastian J.; Pourdamghani, Nima; Pust, Michael; Shi, Xing; Knight, Kevin; Levinboim, Tomer; Murray, Kenton; Chiang, David; Zhang, Boliang; Pan, Xiaoman; Lu, Di; Lin, Ying; Ji, Heng

doi:10.1007/s10590-017-9207-1

Published: 23 October 2017

Volume 32, pages 59–89 (2018)
Machine Translation

Abstract

We describe novel approaches to tackling the problem of natural language processing for low-resource languages. The approaches are embodied in systems for name tagging and machine translation (MT) that we constructed to participate in the NIST LoReHLT evaluation in 2016. Our methods include universal tools, rapid resource and knowledge acquisition, rapid language projection, and joint methods for MT and name tagging.

Acknowledgements

We would like to thank other ELISA team members who contributed to resource construction and system preparation before the evaluation: Chris Callison-Burch (UPenn), Aliya Deri (USC) and Ashish Vaswani (Google). We thank Billy Wagner from Next Century for running the LTDE to produce name tagging runs. This work was supported by the U.S. Defense Advanced Research Projects Agency (DARPA) LORELEI Program No. HR0011-15-C-0115 and ARL/ARO MURI W911NF-10-1-0533. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Author information

Authors and Affiliations

Information Sciences Institute, University of Southern California, Marina del Rey, CA, 90292, USA
Ulf Hermjakob, Qiang Li, Daniel Marcu, Jonathan May, Sebastian J. Mielke, Nima Pourdamghani, Michael Pust, Xing Shi & Kevin Knight
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
Tomer Levinboim, Kenton Murray & David Chiang
Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin & Heng Ji

Corresponding author

Correspondence to Kevin Knight.

About this article

Cite this article

Hermjakob, U., Li, Q., Marcu, D. et al. Incident-Driven Machine Translation and Name Tagging for Low-resource Languages. Machine Translation 32, 59–89 (2018). https://doi.org/10.1007/s10590-017-9207-1

Received: 02 June 2017
Accepted: 06 October 2017
Published: 23 October 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10590-017-9207-1
