Sentence Boundary Detection in Turkish

  • Conference paper
Advances in Information Systems (ADVIS 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3261)

Abstract

In this paper, we describe a method for sentence boundary detection in Turkish. The method exploits simple heuristic knowledge of Turkish syllabication and phonetic rules to disambiguate dots. The measured test accuracy of the algorithm is 96.02%. The main contribution of this study is a new lexicon-free method for distinguishing EOS (end-of-sentence) dots from dots used for other purposes.
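
To make the idea of lexicon-free dot disambiguation concrete, the following is a minimal sketch in Python. It is not the algorithm from the paper, whose rules are not given in the abstract. It assumes only one well-known property of Turkish syllabication, that every syllable contains exactly one vowel, so a purely alphabetic token with no vowel (such as "Dr" or "vb") cannot be syllabified and is treated as an abbreviation. The function names `has_vowel`, `is_eos_dot`, and `split_sentences`, and the capitalization and single-letter rules, are illustrative assumptions.

```python
# Illustrative sketch only: these rules are assumptions, not the paper's method.
VOWELS = set("aeıioöuüAEIİOÖUÜ")  # the eight Turkish vowels, both cases


def has_vowel(token: str) -> bool:
    """Return True if the token contains at least one Turkish vowel."""
    return any(ch in VOWELS for ch in token)


def is_eos_dot(prev_token: str, next_token: str) -> bool:
    """Guess whether a dot after prev_token and before next_token ends a sentence."""
    if not next_token:
        return True  # a dot at the very end of the text closes a sentence
    if prev_token.isalpha() and not has_vowel(prev_token):
        return False  # no vowel means no syllable: likely an abbreviation
    if len(prev_token) == 1 and prev_token.isalpha():
        return False  # a single letter is likely an initial
    return next_token[0].isupper()  # assumed rule: sentences start uppercase


def split_sentences(text: str) -> list[str]:
    """Split whitespace-tokenized text at dots judged to be EOS dots."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok.endswith("."):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if is_eos_dot(tok[:-1], nxt):
                sentences.append(" ".join(current))
                current = []
    if current:  # keep trailing text that has no closing dot
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Ahmet Bey geldi. Toplantı başladı."))
```

Under these assumptions the dot after "Dr" is not treated as a boundary, while the dot after "geldi" is. The sketch misclassifies abbreviations that contain a vowel, which is one reason a real system needs richer syllabication and phonetic rules.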

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dinçer, B.T., Karaoğlan, B. (2004). Sentence Boundary Detection in Turkish. In: Yakhno, T. (ed.) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_26

  • DOI: https://doi.org/10.1007/978-3-540-30198-1_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23478-4

  • Online ISBN: 978-3-540-30198-1

  • eBook Packages: Computer Science, Computer Science (R0)