Abstract
In this paper, we describe a method for sentence boundary detection in Turkish. The method exploits simple heuristic knowledge of Turkish syllabication and phonetic rules to disambiguate dots. The algorithm achieves a test accuracy of 96.02%. The main contribution of this study is a new lexicon-free method for differentiating end-of-sentence (EOS) dots from dots used for other purposes.
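The abstract does not spell out the actual rules, so the following is only a rough sketch of what a lexicon-free, syllable-based dot check might look like, not the authors' algorithm. It assumes one simple fact about Turkish syllabication (every syllable contains a vowel, so a token with no vowel cannot be syllabified and is probably an abbreviation or a number) plus a capitalization check on the next token. The function names `looks_like_turkish_word`, `is_eos_dot`, and `split_sentences` are hypothetical.

```python
# Illustrative sketch only: NOT the algorithm from the paper.
# Assumption: a token that contains no Turkish vowel cannot be split into
# syllables, so a dot after it is treated as a non-EOS dot (e.g. "Dr.", "vb.").

TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")


def looks_like_turkish_word(token: str) -> bool:
    """Return True if the token has letters and at least one of them is a vowel."""
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        return False  # digits or punctuation only, e.g. "3"
    return any(ch in TURKISH_VOWELS for ch in letters)


def is_eos_dot(prev_token: str, next_token):
    """Guess whether the dot that follows prev_token ends a sentence.

    next_token is None when the dot is the last thing in the text.
    """
    if next_token is None:
        return True
    if not looks_like_turkish_word(prev_token):
        return False  # unsyllabifiable token: likely an abbreviation or number
    # Assumed extra cue: a new sentence starts with an uppercase letter.
    return next_token[:1].isupper()


def split_sentences(text: str):
    """Split whitespace-tokenized text at dots judged to be EOS dots."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if token.endswith("."):
            next_token = tokens[i + 1] if i + 1 < len(tokens) else None
            if is_eos_dot(token[:-1], next_token):
                sentences.append(" ".join(current))
                current = []
    if current:
        sentences.append(" ".join(current))
    return sentences
```

For example, `split_sentences("Dr. Ahmet geldi. Sonra gitti.")` keeps the dot after "Dr" inside the first sentence because "Dr" has no vowel. A heuristic this crude will misclassify abbreviations that do contain vowels, which is the kind of case the phonetic rules mentioned in the abstract would presumably have to address.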
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dinçer, B.T., Karaoğlan, B. (2004). Sentence Boundary Detection in Turkish. In: Yakhno, T. (ed.) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_26
DOI: https://doi.org/10.1007/978-3-540-30198-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23478-4
Online ISBN: 978-3-540-30198-1
eBook Packages: Computer Science, Computer Science (R0)