The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach

Littell, Patrick; Tian, Tian; Xu, Ruochen; Sheikh, Zaid; Mortensen, David; Levin, Lori; Tyers, Francis; Hayashi, Hiroaki; Horwood, Graham; Sloto, Steve; Tagtow, Emily; Black, Alan; Yang, Yiming; Mitamura, Teruko; Hovy, Eduard

doi:10.1007/s10590-017-9205-3

Published: 27 October 2017

Machine Translation, volume 32, pages 105–126 (2018)

Patrick Littell¹ (ORCID: orcid.org/0000-0002-7173-0225),
Tian Tian¹,
Ruochen Xu¹,
Zaid Sheikh¹,
David Mortensen¹,
Lori Levin¹,
Francis Tyers²,
Hiroaki Hayashi¹,
Graham Horwood³,
Steve Sloto¹,
Emily Tagtow¹,
Alan Black¹,
Yiming Yang¹,
Teruko Mitamura¹ &
Eduard Hovy¹

Abstract

The LoReHLT16 evaluation challenged participants to extract Situation Frames (SFs)—structured descriptions of humanitarian need situations—from monolingual Uyghur text. The ARIEL-CMU SF detector combines two classification paradigms, a manually curated keyword-spotting system and a machine learning classifier. These were applied by translating the models on a per-feature basis, rather than translating the input text. The resulting combined model provides the accuracy of human insight with the generality of machine learning, and is relatively tractable to human analysis and error correction. Other factors contributing to success were automatic dictionary creation, the use of phonetic transcription, detailed, hand-written morphological analysis, and naturalistic glossing for error analysis by humans. The ARIEL-CMU SF pipeline produced the top-scoring LoReHLT16 situation frame detection systems for the metrics SFType, SFType+Place+Need, SFType+Place+Relief, and SFType+Place+Urgency, at each of the three checkpoints.
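
The per-feature model translation idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the linear bag-of-words scorer, the even split of a weight across multiple translations, the "keyword hit OR score above threshold" combination rule, the threshold value, and the placeholder target-language tokens (`tgt_water` and so on, which are not real Uyghur words).

```python
from collections import defaultdict


def translate_weights(src_weights, dictionary):
    """Carry each source-language feature weight over to its translations.

    Features with no dictionary entry are dropped. Splitting a weight
    evenly across several translations is an illustrative assumption.
    """
    tgt_weights = defaultdict(float)
    for src_word, weight in src_weights.items():
        translations = dictionary.get(src_word, [])
        for tgt_word in translations:
            tgt_weights[tgt_word] += weight / len(translations)
    return dict(tgt_weights)


def translate_keywords(src_keywords, dictionary):
    """Translate a hand-curated keyword list through the same dictionary."""
    return {t for k in src_keywords for t in dictionary.get(k, [])}


def detect(tokens, tgt_weights, tgt_keywords, threshold=1.0):
    """Flag a document if either translated model fires (hypothetical rule)."""
    score = sum(tgt_weights.get(tok, 0.0) for tok in tokens)
    keyword_hit = any(tok in tgt_keywords for tok in tokens)
    return keyword_hit or score >= threshold


# Toy bilingual dictionary with placeholder target-language tokens.
dictionary = {
    "water": ["tgt_water"],
    "thirsty": ["tgt_thirsty_a", "tgt_thirsty_b"],
    "flood": ["tgt_flood"],
}
src_weights = {"water": 0.8, "thirsty": 0.6, "bottle": 0.3}  # made-up weights
src_keywords = {"flood"}  # made-up keyword list

tgt_weights = translate_weights(src_weights, dictionary)
tgt_keywords = translate_keywords(src_keywords, dictionary)
```

In this sketch the input text is never translated: only the model's features are mapped into the target language, and untranslated target-language tokens are then scored directly against the translated features.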

Notes

www.nist.gov/multimodal-information-group/lorehlt-2016-evaluations.
It should be noted, however, that it does not vary monotonically with precision (or recall or F1), as the SFE and precision values in Tables 5 and 6 will show.
http://cc-cedict.org.
http://reliefweb.int.
http://crisis.net.
www.ushahidi.com.
www.opensource.gov.
www.crowdflower.com.
http://github.com/dmort27/epitran.
http://svn.code.sf.net/p/apertium/svn/incubator/apertium-uig.
This is thus not lemmatization per se—the lemma of all of these is qatar, with -liq being a suffix—but rather an attempt to find the most appropriate corresponding word in the lexicons, whether it is a lemma or not.
http://cldr.unicode.org.
http://code.google.com/archive/p/word2vec/.
Compared to our SFType detection systems, the features in our English Status-detection decision trees focused comparatively more on functional words (e.g., words more often indicative of tense, aspect, or modality) than content words. We did not believe these words would translate well using a lexical feature-translation approach, so we did not submit any of these results as part of a primary submission.
The error correction was performed on both models, but in the keyword model it was more straightforward to fix (i.e., by simply removing the keyword) and to know that the fix had worked.

Acknowledgements

This project was sponsored by the Defense Advanced Research Projects Agency (DARPA) Information Innovation Office (I2O), program: Low Resource Languages for Emergent Incidents (LORELEI), issued by DARPA/I2O under Contract No. HR0011-15-C-0114.

Author information

Authors and Affiliations

Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Patrick Littell, Tian Tian, Ruochen Xu, Zaid Sheikh, David Mortensen, Lori Levin, Hiroaki Hayashi, Steve Sloto, Emily Tagtow, Alan Black, Yiming Yang, Teruko Mitamura & Eduard Hovy
School of Linguistics, National Research University "Higher School of Economics", Moscow, Russia
Francis Tyers
Leidos, Inc., Reston, VA, USA
Graham Horwood

Corresponding author

Correspondence to Patrick Littell.

About this article

Cite this article

Littell, P., Tian, T., Xu, R. et al. The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach. Machine Translation 32, 105–126 (2018). https://doi.org/10.1007/s10590-017-9205-3

Received: 31 May 2017
Accepted: 03 October 2017
Published: 27 October 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10590-017-9205-3
