Named Entity Recognition Experiments on Turkish Texts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5822)

Abstract

Named entity recognition (NER) is one of the main information extraction tasks, yet research on NER for Turkish texts is rare. In this study, we present a rule-based NER system for Turkish that employs a set of lexical resources and pattern bases to extract named entities, including the names of people, locations, and organizations, together with time/date and money/percentage expressions. The system targets news texts and does not rely on capitalization and punctuation clues, since these may be missing in texts obtained from the Web or in the output of automatic speech recognition tools. The system is evaluated on news texts as well as on other genres, including child stories and historical texts. As expected for manually engineered rule-based systems, its performance degrades on these latter genres, since they differ from the target domain of news texts. Furthermore, the system is evaluated on transcriptions of news videos with satisfactory results, which is an important step towards employing NER in the automatic semantic annotation of videos in Turkish. The current study is significant as the first rule-based approach to the NER task on Turkish texts and for its evaluation on diverse text types.
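To make the general idea concrete, the following is a minimal sketch of a lexicon-and-pattern extractor that ignores capitalization. It is not the system described in the paper: the lexicon entries, the regular expressions, and the names `LEXICON`, `PATTERNS`, `turkish_lower`, and `extract_entities` are all hypothetical and chosen only for illustration. It also matches whole words only, so it does not handle Turkish suffixes attached to names.

```python
import re

# Hypothetical, tiny lexical resources (illustration only).
LEXICON = {
    "PERSON": ["ahmet yılmaz"],
    "LOCATION": ["ankara", "istanbul"],
    "ORGANIZATION": ["türkiye büyük millet meclisi"],
}

# Hypothetical pattern base for numeric and temporal expressions.
MONTHS = ("ocak|şubat|mart|nisan|mayıs|haziran|temmuz|"
          "ağustos|eylül|ekim|kasım|aralık")
PATTERNS = {
    "DATE": re.compile(r"\d{1,2}\s+(?:" + MONTHS + r")(?:\s+\d{4})?"),
    "PERCENT": re.compile(r"yüzde\s+\d+(?:[.,]\d+)?"),
    "MONEY": re.compile(r"\d+(?:[.,]\d+)?\s+(?:lira|dolar|euro)"),
}


def turkish_lower(text):
    """Lowercase with Turkish dotted/dotless I handling.

    Both replacements are one character to one character, so offsets
    in the result line up with offsets in the original text.
    """
    return text.replace("İ", "i").replace("I", "ı").lower()


def extract_entities(text):
    """Return sorted (start, end, label, surface) tuples found in text."""
    normalized = turkish_lower(text)  # capitalization is deliberately ignored
    found = []
    # Lexicon lookup: whole-word match of each known name.
    for label, entries in LEXICON.items():
        for entry in entries:
            pattern = r"(?<!\w)" + re.escape(entry) + r"(?!\w)"
            for m in re.finditer(pattern, normalized):
                found.append((m.start(), m.end(), label, text[m.start():m.end()]))
    # Pattern base: regular expressions for dates, percentages, money.
    for label, regex in PATTERNS.items():
        for m in regex.finditer(normalized):
            found.append((m.start(), m.end(), label, text[m.start():m.end()]))
    return sorted(found)
```

Calling `extract_entities` on an uncased, unpunctuated string such as `"ahmet yılmaz 5 mart 2009 tarihinde ankara da yüzde 12 artış ve 300 lira açıkladı"` yields one tuple per recognized entity, each carrying its character offsets, its type label, and the surface text from the original input.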

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Küçük, D., Yazıcı, A. (2009). Named Entity Recognition Experiments on Turkish Texts. In: Andreasen, T., Yager, R.R., Bulskov, H., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2009. Lecture Notes in Computer Science, vol 5822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04957-6_45

  • DOI: https://doi.org/10.1007/978-3-642-04957-6_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04956-9

  • Online ISBN: 978-3-642-04957-6

  • eBook Packages: Computer Science, Computer Science (R0)
