Abstract
Named entity recognition (NER) is one of the main information extraction tasks, yet research on NER for Turkish texts is scarce. In this study, we present a rule-based NER system for Turkish that employs a set of lexical resources and pattern bases to extract named entities, including the names of people, locations, and organizations, together with time/date and money/percentage expressions. The target domain of the system is news texts. The system does not rely on capitalization and punctuation, although these are important clues, since they may be missing in texts obtained from the Web or in the output of automatic speech recognition tools. The system is evaluated on news texts and on other genres, including child stories and historical texts. As expected of manually engineered rule-based systems, its performance degrades on these latter genres, since they differ from the target domain of news texts. The system is also evaluated on transcriptions of news videos, with satisfactory results, which is an important step towards employing NER in the automatic semantic annotation of videos in Turkish. The study is significant as the first rule-based approach to the NER task on Turkish texts, evaluated on diverse text types.
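The general idea of combining lexical resources with pattern bases, without relying on capitalization, can be illustrated with a minimal Python sketch. It is not the system described in the paper: the word lists, the regular expressions, and the names `tr_lower` and `extract_entities` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexical resources (assumed for illustration only).
PERSON_NAMES = {"ahmet", "ayşe"}
LOCATIONS = {"ankara", "izmir"}
ORG_SUFFIXES = {"üniversitesi", "bakanlığı"}  # toy pattern: <word> <suffix>

# Hypothetical toy pattern base for date, percentage, and money expressions.
MONTHS = "ocak|şubat|mart|nisan|mayıs|haziran|temmuz|ağustos|eylül|ekim|kasım|aralık"
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")
PERCENT_RE = re.compile(r"\byüzde \d+(?:,\d+)?")
MONEY_RE = re.compile(r"\b\d+(?:\.\d{3})*(?:,\d+)? (?:lira|dolar|euro)\b")


def tr_lower(text):
    """Lowercase with Turkish dotted/dotless i handling, so case carries no signal."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def extract_entities(text):
    """Return (label, surface form) pairs in order of appearance."""
    text = tr_lower(text)
    entities = []

    # Pattern base: numeric and temporal expressions.
    for label, pattern in (("DATE", DATE_RE), ("PERCENT", PERCENT_RE), ("MONEY", MONEY_RE)):
        for match in pattern.finditer(text):
            entities.append((match.start(), label, match.group()))

    # Lexical resources: token lookup, with a two-token organization pattern
    # taking priority over single-token person and location matches.
    tokens = list(re.finditer(r"\w+", text))
    i = 0
    while i < len(tokens):
        word = tokens[i].group()
        nxt = tokens[i + 1].group() if i + 1 < len(tokens) else None
        if nxt in ORG_SUFFIXES:
            span = text[tokens[i].start():tokens[i + 1].end()]
            entities.append((tokens[i].start(), "ORGANIZATION", span))
            i += 2
            continue
        if word in PERSON_NAMES:
            entities.append((tokens[i].start(), "PERSON", word))
        elif word in LOCATIONS:
            entities.append((tokens[i].start(), "LOCATION", word))
        i += 1

    return [(label, surface) for _, label, surface in sorted(entities)]


if __name__ == "__main__":
    sample = "ahmet 12 mart 2009 tarihinde ankara üniversitesi için yüzde 15 indirimle 250 lira ödedi"
    for label, surface in extract_entities(sample):
        print(label, surface)
```

Because the input is lowercased before matching, the sketch behaves the same on properly cased text and on fully lowercased text, such as speech recognition output. A hand-built list of this kind is also tied to the vocabulary it was written for, which is consistent with the degradation on other genres reported above.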
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Küçük, D., Yazıcı, A. (2009). Named Entity Recognition Experiments on Turkish Texts. In: Andreasen, T., Yager, R.R., Bulskov, H., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2009. Lecture Notes in Computer Science, vol 5822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04957-6_45
DOI: https://doi.org/10.1007/978-3-642-04957-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04956-9
Online ISBN: 978-3-642-04957-6
eBook Packages: Computer Science (R0)