The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach

Machine Translation

Abstract

The LoReHLT16 evaluation challenged participants to extract Situation Frames (SFs), structured descriptions of humanitarian need situations, from monolingual Uyghur text. The ARIEL-CMU SF detector combines two classification paradigms: a manually curated keyword-spotting system and a machine learning classifier. Both were applied by translating the models on a per-feature basis rather than translating the input text. The resulting combined model pairs the accuracy of human insight with the generality of machine learning, and is relatively tractable to human analysis and error correction. Other factors contributing to success were automatic dictionary creation, the use of phonetic transcription, detailed hand-written morphological analysis, and naturalistic glossing for error analysis by humans. The ARIEL-CMU SF pipeline produced the top-scoring LoReHLT16 situation frame detection systems for the metrics SFType, SFType+Place+Need, SFType+Place+Relief, and SFType+Place+Urgency at each of the three checkpoints.
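The idea of translating a model feature by feature, instead of translating the input text, can be illustrated with a minimal sketch. It is not the ARIEL-CMU implementation: the function names, the feature weights, the placeholder target-language tokens (`tgt_...`), and the choice to split a weight evenly among a word's translations are all assumptions made for illustration.

```python
from collections import defaultdict


def translate_model(weights, lexicon):
    """Project source-language feature weights onto target-language words.

    weights: dict mapping a source-language word to its model weight.
    lexicon: dict mapping a source-language word to a list of translations.
    Assumption: a weight is split evenly among the word's translations;
    features with no dictionary entry are dropped.
    """
    translated = defaultdict(float)
    for feature, weight in weights.items():
        targets = lexicon.get(feature, [])
        for target in targets:
            translated[target] += weight / len(targets)
    return dict(translated)


def score(tokens, weights):
    """Score a tokenized target-language document with the translated model."""
    return sum(weights.get(token, 0.0) for token in tokens)


# Hypothetical English keyword weights for a single situation type.
english_weights = {"water": 2.0, "flood": 1.5, "shelter": 1.0}

# Hypothetical bilingual lexicon; the target tokens are placeholders,
# not real Uyghur words. "shelter" has no entry and is dropped.
lexicon = {
    "water": ["tgt_water_1", "tgt_water_2"],
    "flood": ["tgt_flood"],
}

translated = translate_model(english_weights, lexicon)

# The input document stays in the target language; only the model moved.
document = ["tgt_flood", "tgt_water_1", "tgt_other"]
print(score(document, translated))
```

Because the translated model is still a table of words and weights, a human analyst can inspect it and remove or adjust an individual entry, which is in keeping with the tractability the abstract describes.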

Notes

  1. www.nist.gov/multimodal-information-group/lorehlt-2016-evaluations.

  2. It should be noted, however, that it does not vary monotonically with precision (or recall or F1), as the SFE and precision values in Tables 5 and 6 will show.

  3. http://cc-cedict.org.

  4. http://reliefweb.int.

  5. http://crisis.net.

  6. www.ushahidi.com.

  7. www.opensource.gov.

  8. www.crowdflower.com.

  9. http://github.com/dmort27/epitran.

  10. http://svn.code.sf.net/p/apertium/svn/incubator/apertium-uig.

  11. This is thus not lemmatization per se—the lemma of all of these is qatar, with -liq being a suffix—but rather an attempt to find the most appropriate corresponding word in the lexicons, whether it is a lemma or not.

  12. http://cldr.unicode.org.

  13. http://code.google.com/archive/p/word2vec/.

  14. Compared to our SFType detection systems, the features in our English Status-detection decision trees focused comparatively more on functional words (e.g., words more often indicative of tense, aspect, or modality) than content words. We did not believe these words would translate well using a lexical feature-translation approach, so we did not submit any of these results as part of a primary submission.

  15. The error correction was performed on both models, but in the keyword model it was more straightforward to fix (i.e., by simply removing the keyword) and to know that the fix had worked.

References

  • Baker M (1985) The mirror principle and morphosyntactic explanation. Linguistic Inquiry 16:373–415

  • Beesley KR, Karttunen L (2003) Finite state morphology. CSLI Publications, Stanford

  • Bharadwaj A, Mortensen D, Dyer C, Carbonell J (2016) Phonologically aware neural model for named entity recognition in low resource transfer settings. In: Proceedings of the 2016 conference on empirical methods in natural language processing. Association for Computational Linguistics, Austin, Texas, pp 1462–1472

  • Brown PF, Pietra VJD, Pietra SAD, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(1):263–312

  • Dyer C, Chahuneau V, Smith NA (2013) A simple, fast, and effective reparameterization of IBM Model 2. In: Proceedings of the 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Atlanta, Georgia, pp 644–648

  • Forcada ML, Ginestí-Rosell M, Nordfalk J, O’Regan J, Ortiz-Rojas S, Pérez-Ortiz JA, Sánchez-Martínez F, Ramírez-Sánchez G, Tyers FM (2011) Apertium: a free/open-source platform for rule-based machine translation. Mach Transl 25(2):127–144

  • Frost R, Launchbury J (1989) Constructing natural language interpreters in a lazy functional language. Comput J 32:108–121

  • Hutton G (1992) Higher-order functions for parsing. J Funct Progr 2:323–343

  • Hutton G, Meijer E (1998) Monadic parser combinators. J Funct Progr 8:437–444

  • Lewis MP, Simons GF, Fennig CD (2015) Ethnologue: languages of the world, 18th edn. SIL International, Dallas, Texas

  • Linden K, Silfverberg M, Axelson E, Hardwick S, Pirinen T (2011) HFST-framework for compiling and applying morphologies. Commun Comput Inf Sci 100:67–85

  • Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. CoRR abs/1301.3781

  • Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26. Curran Associates, Inc., pp 3111–3119

  • Olteanu A, Castillo C, Diaz F, Vieweg S (2014) Crisislex: a lexicon for collecting and filtering microblogged communications in crises. In: Proceedings of the AAAI conference on weblogs and social media (ICWSM’14), Ann Arbor, MI, USA

  • Renduchintala A, Knowles R, Koehn P, Eisner J (2016) Creating interactive macaronic interfaces for language learning. In: Proceedings of ACL-2016 System Demonstrations, Association for Computational Linguistics, Berlin, Germany, pp 133–138

  • Stenetorp P, Pyysalo S, Topić G, Ohta T, Ananiadou S, Tsujii J (2012) brat: a web-based tool for NLP-assisted text annotation. In: Proceedings of the demonstrations at the 13th conference of the European chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Avignon, France, pp 102–107

  • Strassel S, Tracey J (2016) LORELEI language packs: data, tools, and resources for technology development in low resource languages. In: LREC 2016: 10th edition of the language resources and evaluation conference, Portoroz, pp 3273–3280

  • Strassel S, Bies A, Tracey J (2017) Situational awareness for low resource languages: the LORELEI situation frame annotation task. In: SMERP2017: first international workshop on exploitation of social media for emergency relief and preparedness, Aberdeen

  • Temnikova I, Castillo C, Vieweg S (2015) Emterms 1.0: a terminological resource for crisis tweets. In: Proceedings of the international conference on information systems for crisis response and management (ISCRAM’15), Kristiansand, Norway

  • Washington JN, Ipasov IS, Tyers FM (2014) Finite-state morphological transducers for three Kypchak languages. In: Proceedings of the 9th conference on language resources and evaluation, LREC2014

  • Xu R, Yang Y, Liu H, Hsi A (2016) Cross-lingual text classification via model translation with limited dictionaries. In: Proceedings of the 25th ACM international on conference on information and knowledge management, ACM, pp 95–104

Acknowledgements

This project was sponsored by the Defense Advanced Research Projects Agency (DARPA) Information Innovation Office (I2O), program: Low Resource Languages for Emergent Incidents (LORELEI), issued by DARPA/I2O under Contract No. HR0011-15-C-0114.

Author information

Corresponding author

Correspondence to Patrick Littell.

About this article

Cite this article

Littell, P., Tian, T., Xu, R. et al. The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach. Machine Translation 32, 105–126 (2018). https://doi.org/10.1007/s10590-017-9205-3

  • DOI: https://doi.org/10.1007/s10590-017-9205-3
