Named Entity Recognition Experiments on Turkish Texts

Küçük, Dilek; Yazıcı, Adnan

doi:10.1007/978-3-642-04957-6_45

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5822)

Abstract

Named entity recognition (NER) is one of the main information extraction tasks, yet research on NER for Turkish texts is rare. In this study, we present a rule-based NER system for Turkish that employs a set of lexical resources and pattern bases to extract named entities: the names of people, locations, and organizations, together with time/date and money/percentage expressions. The target domain of the system is news texts. It does not rely on capitalization or punctuation clues, since these may be missing in texts obtained from the Web or in the output of automatic speech recognition tools. The system is evaluated on news texts as well as on other genres, namely child stories and historical texts. As expected of manually engineered rule-based systems, its performance degrades on these latter genres because they differ from the target domain. The system is also evaluated on transcriptions of news videos, with satisfactory results, which is an important step towards employing NER in the automatic semantic annotation of videos in Turkish. The study is significant as the first rule-based approach to the NER task on Turkish texts, evaluated on diverse text types.
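
The general idea of combining lexical resources with pattern bases, without relying on capitalization, might be sketched as follows. This is a minimal illustration and not the authors' system: the lexicon entries, the two regular expressions, and the names LEXICON, PATTERNS, turkish_lower and tag_entities are all hypothetical, chosen only to show how a case-insensitive lookup and a pattern match could work together.

```python
import re

# Hypothetical toy lexicon, invented for illustration only;
# the paper's actual lexical resources are not reproduced here.
LEXICON = {
    "ahmet": "PERSON",
    "ayşe": "PERSON",
    "ankara": "LOCATION",
    "istanbul": "LOCATION",
}

# Hypothetical patterns for money and percentage expressions.
PATTERNS = [
    ("MONEY", re.compile(r"\d+(?:[.,]\d+)?\s+(?:lira|dolar|euro)\b")),
    ("PERCENT", re.compile(r"(?:%\s?|yüzde\s+)\d+(?:[.,]\d+)?")),
]


def turkish_lower(text):
    """Lowercase with Turkish casing: dotted İ -> i, dotless I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def tag_entities(text):
    """Return (span, label) pairs in order of appearance, ignoring case."""
    lowered = turkish_lower(text)
    found = []
    # Lexicon lookup on each word token.
    for match in re.finditer(r"\w+", lowered):
        label = LEXICON.get(match.group())
        if label:
            found.append((match.start(), match.group(), label))
    # Pattern-based extraction for numeric expressions.
    for label, pattern in PATTERNS:
        for match in pattern.finditer(lowered):
            found.append((match.start(), match.group(), label))
    found.sort()
    return [(span, label) for _, span, label in found]


print(tag_entities("ahmet dün ankara'da 500 lira harcadı"))
```

Because the input is lowercased before matching, the sketch behaves the same on all-lowercase or all-uppercase input, which mirrors the motivation given above for Web text and speech recognition output. A real system would also need to handle Turkish suffixes attached without an apostrophe, which this sketch does not attempt.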

Author information

Authors and Affiliations

Power Electronics Group, TÜBİTAK - Uzay Institute, 06531, Ankara, Turkey
Dilek Küçük
Department of Computer Engineering, Middle East Technical University, 06531, Ankara, Turkey
Adnan Yazıcı

Editor information

Editors and Affiliations

Department of Computer Science, Roskilde University, Universitetsvej 1, 4000, Roskilde, Denmark
Troels Andreasen & Henrik Bulskov
Iona College, Machine Intelligence Institute, 10801, New Rochelle, NY, USA
Ronald R. Yager
Computer Science Dept., Research group PLIS: Programming, Roskilde University, Universitetsvej 1, 4000, Roskilde, Denmark
Henning Christiansen
Department of Computer Science and Engineering, Aalborg University Esbjerg, Niels Bohrs Vej 8, 6700, Esbjerg, Denmark
Henrik Legind Larsen

About this paper

Cite this paper

Küçük, D., Yazıcı, A. (2009). Named Entity Recognition Experiments on Turkish Texts. In: Andreasen, T., Yager, R.R., Bulskov, H., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2009. Lecture Notes in Computer Science, vol 5822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04957-6_45

DOI: https://doi.org/10.1007/978-3-642-04957-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04956-9
Online ISBN: 978-3-642-04957-6
eBook Packages: Computer Science, Computer Science (R0)
