Combining rule-based and statistical mechanisms for low-resource named entity recognition

Gabbard, Ryan; DeYoung, Jay; Lignos, Constantine; Freedman, Marjorie; Weischedel, Ralph

doi:10.1007/s10590-017-9208-0

Published: 20 December 2017

Volume 32, pages 31–43 (2018)
Machine Translation

Abstract

We describe a multifaceted approach to named entity recognition that can be deployed with minimal data resources and a few hours of non-expert annotation. We show how this approach was applied in the 2016 LoReHLT evaluation and demonstrate that both its statistical and its rule-based components contribute to our performance. Across many languages, we also demonstrate the value of selecting which sentences to annotate when training on small amounts of data.
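The abstract does not say how rule-based and statistical outputs are merged, so the following is only a minimal Python sketch under stated assumptions: a hypothetical gazetteer (name-list) tagger stands in for the rule-based side, the statistical tagger's spans are passed in as input, and statistical spans take precedence wherever the two overlap. The names `gazetteer_tag` and `combine` are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple

# A span is (start_token, end_token_exclusive, entity_type), e.g. (2, 3, "GPE").
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def gazetteer_tag(tokens: List[str], gazetteer: Dict[str, str], max_len: int = 4) -> List[Span]:
    """Hypothetical rule-based tagger: greedy longest match of token n-grams against a name list."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "New York City" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = gazetteer.get(" ".join(tokens[i:i + n]))
            if entity_type is not None:
                match = (i, i + n, entity_type)
                break
        if match is not None:
            spans.append(match)
            i = match[1]  # skip past the matched name
        else:
            i += 1
    return spans


def combine(statistical: List[Span], rule_based: List[Span]) -> List[Span]:
    """Keep every statistical span; add rule-based spans only where they overlap none of them."""
    merged = list(statistical)
    for span in rule_based:
        if not any(overlaps(span, kept) for kept in merged):
            merged.append(span)
    return sorted(merged)


tokens = ["Protests", "in", "Urumqi", "and", "Kashgar", "continued"]
gazetteer = {"Urumqi": "GPE", "Kashgar": "GPE"}
model_spans = [(2, 3, "GPE")]  # stand-in for a statistical tagger's output
print(combine(model_spans, gazetteer_tag(tokens, gazetteer)))
# [(2, 3, 'GPE'), (4, 5, 'GPE')]
```

Here the gazetteer recovers "Kashgar", which the (pretend) model missed. Letting statistical spans win conflicts is only one design choice; a system could instead trust high-precision rules over the model, or feed rule matches to the model as features.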

Notes

IL3_dictionary.xml; LDC-provided.
xinjiang_places.pdf with link to Wikipedia; LDC-provided.
link in CategoryII_list.pdf; LDC-provided.
parallel_grammar.pdf; LDC-provided.
This is the numerical stability parameter typically used in AdaGrad implementations.
The word shape feature collapsed each run of consecutive letters in a name to a single letter, in an attempt to capture punctuation patterns. For example, the name Bob would have the shape a, while @Bob would have the shape @a.
Arabic and Mandarin were also provided, but we exclude them from our experiments here because of data processing issues. Yoruba is excluded because it had too little data for meaningful experiments. Hausa was excluded because its data did not annotate the GPE (geopolitical entity) type. The LDC catalog numbers were LDC2014E115, LDC2015E70, LDC2016E29, LDC2016E87, LDC2016E91, LDC2016E93, LDC2016E95, LDC2016E97, LDC2016E99, and LDC2016E103.
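To show where the stability parameter mentioned in the AdaGrad note enters the computation, here is a minimal per-coordinate AdaGrad update in Python, following Duchi et al. (2011). The learning rate `lr=0.1` and `eps=1e-8` are placeholder assumptions, not the settings used in the paper, and `adagrad_step` is a hypothetical helper.

```python
import math
from typing import List


def adagrad_step(weights: List[float], grads: List[float], sq_sum: List[float],
                 lr: float = 0.1, eps: float = 1e-8) -> None:
    """Apply one AdaGrad update in place.

    sq_sum[i] accumulates the squared gradients seen so far for coordinate i.
    eps is the numerical stability term: it keeps the denominator nonzero for
    coordinates whose accumulated squared gradient is still 0.
    """
    for i, g in enumerate(grads):
        sq_sum[i] += g * g
        weights[i] -= lr * g / (math.sqrt(sq_sum[i]) + eps)


w = [0.0, 0.0]
G = [0.0, 0.0]
adagrad_step(w, [1.0, 0.0], G)
print(w)  # second coordinate stays 0.0
```

Without `eps`, the second coordinate would evaluate `0.0 / 0.0`, which raises `ZeroDivisionError` in Python. Some implementations place the constant inside the square root instead, as `sqrt(sq_sum + eps)`; both variants serve the same purpose.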
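The word shape note can be illustrated with a short Python sketch. Beyond what the note states, it assumes that every Unicode letter (upper- or lowercase, any script) maps to `a` and that digits and punctuation are copied through unchanged; `word_shape` is a hypothetical name.

```python
import re

# [^\W\d_] matches any Unicode letter: a word character that is neither a digit nor "_".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def word_shape(name: str) -> str:
    """Collapse each maximal run of letters to a single 'a'; copy other characters unchanged."""
    return LETTER_RUN.sub("a", name)


for name in ["Bob", "@Bob", "New York", "O'Brien"]:
    print(name, "->", word_shape(name))
# Bob -> a
# @Bob -> @a
# New York -> a a
# O'Brien -> a'a
```

Because Bob and Alice both receive the shape a, the feature distinguishes names only by their non-letter characters, which is consistent with the note's stated aim of capturing punctuation patterns.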

References

Bonadiman D, Severyn A, Moschitti A (2015) Deep neural networks for named entity recognition in Italian. In: Proceedings of CLiC-it 2015, pp 51–55
Collins M, Singer Y (1999) Unsupervised models for named entity classification. In: Proceedings of the joint SIGDAT conference on empirical methods in natural language processing and very large corpora, pp 100–110
Duchi JC, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159
Lafferty JD, McCallum A, Pereira FCN (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, ICML ’01, pp 282–289
Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C (2016) Neural architectures for named entity recognition. CoRR abs/1603.01360, http://arxiv.org/abs/1603.01360
Li W, McCallum A (2003) Rapid development of Hindi named entity recognition using conditional random fields and feature induction. In: ACM Transactions on Asian Language Information Processing, pp 290–294
Linguistic Data Consortium (2016) LORELEI IL3 incident language pack for year 1 Eval. LDC2016E57
Nadeau D, Turney PD, Matwin S (2006) Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity. In: Proceedings of the 19th international conference on advances in artificial intelligence: Canadian Society for Computational Studies of Intelligence, Springer, Berlin, Heidelberg, AI’06, pp 266–277
Ramshaw LA, Marcus MP (1999) Text chunking using transformation-based learning. In: Armstrong S, Church K, Isabelle P, Manzi S, Tzoukermann E, Yarowsky D (eds) Natural language processing using very large corpora. Springer Netherlands, Dordrecht, pp 157–176
Riaz K (2010) Rule-based named entity recognition in Urdu. In: Proceedings of the 2010 named entities workshop, Association for Computational Linguistics, Stroudsburg, PA, NEWS ’10, pp 126–135
Settles B (2010) Active learning literature survey. In: Computer sciences technical report, University of Wisconsin-Madison
Sun H, Grishman R, Wang Y (2016) Domain adaptation with active learning for named entity recognition. In: Sun X, Liu A, Chao HC, Bertino E (eds) Cloud computing and security: second international conference. Revised Selected Papers, Part II, Springer International Publishing, Cham, ICCCS 2016, Nanjing, China, 29–31 July 2016, pp 611–622
Sundheim BM (1995) Overview of results of the MUC-6 evaluation. In: Proceedings of the 6th conference on message understanding, Association for Computational Linguistics, Stroudsburg, PA, MUC-6 ’95, pp 13–31
Tjong Kim Sang EF, De Meulder F (2003) Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003, Vol 4, Association for Computational Linguistics, Stroudsburg, PA, CoNLL ’03, pp 142–147
Wick M (2016) GeoNames ontology. http://www.geonames.org/about.html
Xu H, Marcus M, Ungar L, Yang C (2017) Unsupervised morphology learning with statistical paradigms, unpublished manuscript
Zhang B, Pan X, Wang T, Vaswani A, Ji H, Knight K, Marcu D (2016) Name tagging for low-resource incident languages based on expectation-driven learning. In: Proceedings of ACL 2016

Acknowledgements

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0113. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. (Approved for Public Release by DARPA on Aug 29, 2017 (DISTAR Approval #28392), Distribution Unlimited.)

Author information

Authors and Affiliations

Raytheon BBN Technologies, 10 Moulton St., Cambridge, MA, 02138, USA
Ryan Gabbard, Jay DeYoung, Constantine Lignos, Marjorie Freedman & Ralph Weischedel

Corresponding author

Correspondence to Ryan Gabbard.

Additional information

All work described in this article was performed at Raytheon BBN Technologies. Authors Freedman, Gabbard, Lignos, and Weischedel are currently affiliated with the University of Southern California Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, USA.

About this article

Cite this article

Gabbard, R., DeYoung, J., Lignos, C. et al. Combining rule-based and statistical mechanisms for low-resource named entity recognition. Machine Translation 32, 31–43 (2018). https://doi.org/10.1007/s10590-017-9208-0

Received: 31 May 2017
Accepted: 07 October 2017
Published: 20 December 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10590-017-9208-0
