Discourse Segmentation for Spanish Based on Shallow Parsing

  • Conference paper
Advances in Artificial Intelligence (MICAI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6437)

Abstract

Discourse parsing is currently a prominent research topic, yet no discourse parser exists for Spanish texts. The first stage in developing such a tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold standard corpus, obtaining promising results.
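
To make the idea of segmentation by lexical and syntactic rules concrete, the following is a minimal sketch, not DiSeg's actual rule set. It assumes the input has already been tagged by a shallow parser (the tags below are assigned by hand), that a boundary is opened only at a discourse marker, and that both resulting spans must contain a verb. The marker list, the tag names, and the functions `has_verb` and `segment` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, coarse POS tag); tags here are hand-assigned

# Hypothetical marker list for illustration; not DiSeg's actual lexicon.
MARKERS = {"porque", "aunque", "pero", "cuando"}


def has_verb(tokens: List[Token]) -> bool:
    """Syntactic check: does the span contain at least one token tagged 'V'?"""
    return any(tag == "V" for _, tag in tokens)


def segment(tokens: List[Token]) -> List[List[str]]:
    """Split a tagged sentence into candidate discourse segments.

    Lexical rule: a boundary may open only before a discourse marker.
    Syntactic rule (an assumption of this sketch): the span collected so far
    and the rest of the sentence must each contain a verb.
    """
    segments: List[List[str]] = []
    current: List[Token] = []
    for i, (word, tag) in enumerate(tokens):
        is_marker = word.lower() in MARKERS
        if is_marker and has_verb(current) and has_verb(tokens[i:]):
            segments.append([w for w, _ in current])
            current = []
        current.append((word, tag))
    if current:
        segments.append([w for w, _ in current])
    return segments


sentence = [
    ("El", "D"), ("paciente", "N"), ("mejoró", "V"),
    ("porque", "C"), ("recibió", "V"), ("el", "D"), ("tratamiento", "N"),
]
no_split = [("Llegó", "V"), ("tarde", "R"), ("pero", "C"), ("contento", "A")]

print(segment(sentence))
print(segment(no_split))
```

In this sketch the first sentence is split before "porque" because both sides contain a verb, while the second stays whole because nothing after "pero" is tagged as a verb. A real system would take its tags and chunk boundaries from a parser rather than from hand annotation.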

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

da Cunha, I., SanJuan, E., Torres-Moreno, J.-M., Lloberes, M., Castellón, I. (2010). Discourse Segmentation for Spanish Based on Shallow Parsing. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Artificial Intelligence. MICAI 2010. Lecture Notes in Computer Science, vol 6437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16761-4_2

  • DOI: https://doi.org/10.1007/978-3-642-16761-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16760-7

  • Online ISBN: 978-3-642-16761-4

  • eBook Packages: Computer Science, Computer Science (R0)