A Comparative Evaluation of a New Unsupervised Sentence Boundary Detection Approach on Documents in English and Portuguese

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Abstract

In this paper, we describe a new unsupervised sentence boundary detection system and present a comparative study that evaluates its performance against systems from the literature that have been used to segment English and Portuguese documents into sentences. The results achieved by the new approach were as good as those of the previous systems, which is notable given that the method does not require any additional training resources.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Strunk, J., Silla, C.N., Kaestner, C.A.A. (2006). A Comparative Evaluation of a New Unsupervised Sentence Boundary Detection Approach on Documents in English and Portuguese. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_16

  • DOI: https://doi.org/10.1007/11671299_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32205-4

  • Online ISBN: 978-3-540-32206-1

  • eBook Packages: Computer Science, Computer Science (R0)
