High-Precision Person Name Extraction from Turkish Texts Using Wikipedia

Küçük, Dilek; Küçük, Doğan

doi:10.1007/978-3-319-19581-0_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9103)

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

Abstract

In this paper, we focus on person name extraction from diverse text types in Turkish. We have compiled a large set of person names from Turkish Wikipedia and applied automated post-processing to clean and extend it. Extraction experiments using this resource on data sets of considerable size have achieved high precision rates. Next, we have shown that using non-local dependencies together with this Wikipedia resource considerably improves recall, and hence F-measure. Finally, we have tested the contribution of the resource and of the scheme based on non-local dependencies to the person name extraction performance of a full-fledged named entity recognizer.
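The two ideas in the abstract, exact lookup against a name list and reuse of evidence found elsewhere in the same document, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `GAZETTEER`, the helper `find_spans`, the function `extract_person_names`, and the rule that any part of a matched full name is accepted elsewhere in the document are all assumptions made for illustration.

```python
import re

# Hypothetical toy gazetteer of full person names (illustrative entries only;
# the resource described in the abstract is compiled from Turkish Wikipedia).
GAZETTEER = {"Orhan Pamuk", "Sabiha Gökçen"}


def find_spans(text, phrases):
    """Return sorted (start, end) spans where any phrase occurs as whole words."""
    spans = set()
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        spans.update(m.span() for m in re.finditer(pattern, text))
    return sorted(spans)


def extract_person_names(text, gazetteer, use_nonlocal=True):
    """Stage 1: gazetteer lookup. Stage 2 (optional): non-local propagation."""
    # Stage 1: exact matches of full names, which favors precision.
    spans = find_spans(text, gazetteer)
    if use_nonlocal:
        # Stage 2 (assumed rule): a name part seen inside a stage-1 match is
        # also accepted where it appears alone in the same document.
        parts = {tok for s, e in spans for tok in text[s:e].split()}

        def covered(pos):
            return any(s <= pos < e for s, e in spans)

        extra = [sp for sp in find_spans(text, parts) if not covered(sp[0])]
        spans = sorted(set(spans) | set(extra))
    return [text[s:e] for s, e in spans]


text = "Orhan Pamuk yeni romanını tanıttı. Pamuk'un kitabı çok satıyor."
print(extract_person_names(text, GAZETTEER, use_nonlocal=False))
print(extract_person_names(text, GAZETTEER))
```

With lookup alone, only the full name in the first sentence is found. With propagation enabled, the bare surname in the second sentence is recovered as well, which is the kind of recall gain the abstract attributes to non-local dependencies. The word-boundary match also tolerates the apostrophe that separates a Turkish proper name from its case suffix.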

Author information

Authors and Affiliations

TÜBİTAK Energy Institute, Ankara, Turkey
Dilek Küçük
Gazi University, Ankara, Turkey
Doğan Küçük

Corresponding author

Correspondence to Dilek Küçük.

Editor information

Editors and Affiliations

Technische Universität Darmstadt, Darmstadt, Germany
Chris Biemann
Universität Passau, Passau, Germany
Siegfried Handschuh
Universität Passau, Passau, Germany
André Freitas
University of Salford, Salford, United Kingdom
Farid Meziane
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Küçük, D., Küçük, D. (2015). High-Precision Person Name Extraction from Turkish Texts Using Wikipedia. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_31

DOI: https://doi.org/10.1007/978-3-319-19581-0_31
Published: 04 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19580-3
Online ISBN: 978-3-319-19581-0
eBook Packages: Computer Science; Computer Science (R0)
