Probabilistic Approaches for Modeling Text Structure and Their Application to Text-to-Text Generation

Chapter in: Empirical Methods in Natural Language Generation (EACL 2009, ENLG 2009)

Abstract

Since the early days of generation research, it has been acknowledged that modeling the global structure of a document is crucial for producing coherent, readable output. However, traditional knowledge-intensive approaches have been of limited use for this problem because they do not scale to domain-independent, large-scale applications. As a result, existing text-to-text generation systems rarely use such structural information when producing output, and the texts they generate fall short of human-written ones: they are often fraught with severe coherence violations and disfluencies.

In this chapter, I present probabilistic models of document structure that can be learned effectively from raw document collections. This property distinguishes them from the traditional knowledge-intensive approaches used in symbolic concept-to-text generation. Our results demonstrate that these probabilistic models can be applied directly to content organization, and suggest that they may prove useful in an even broader range of text-to-text applications than those considered here.
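The abstract does not spell out the models themselves. As a rough illustration of what "learning document structure from raw collections and applying it to content organization" can mean, the sketch below trains a simple word-pair transition model on already-ordered documents and uses it to pick the most probable order for a shuffled set of sentences. It is a swapped-in technique, closer to earlier sentence-ordering work such as Lapata (2003) than to the models presented in the chapter. The class name, the `<start>` marker, add-k smoothing, length-averaged transition scores, and the toy training data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import permutations

START = "<start>"  # hypothetical pseudo-sentence marking the beginning of a document


def tokenize(sentence):
    """Lowercased word types of a sentence, with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return {w for w in words if w}


class WordPairOrderingModel:
    """Hypothetical sketch: scores P(next sentence | previous sentence) from
    cross-sentence word pairs. Trained only on raw, already-ordered documents;
    not the chapter's own models."""

    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing
        self.pair_counts = Counter()  # (word in S_i, word in S_i+1) -> count
        self.prev_counts = Counter()  # word in S_i -> number of pairs it starts
        self.vocab = {START}

    def fit(self, documents):
        for doc in documents:
            sentences = [START] + list(doc)
            for prev, nxt in zip(sentences, sentences[1:]):
                a, b = tokenize(prev), tokenize(nxt)
                self.vocab |= a | b
                for wa in a:
                    for wb in b:
                        self.pair_counts[(wa, wb)] += 1
                        self.prev_counts[wa] += 1
        return self

    def transition_score(self, prev, nxt):
        """Average add-k smoothed log P(w_next | w_prev) over all word pairs."""
        a, b = tokenize(prev), tokenize(nxt)
        if not a or not b:
            return 0.0
        v = len(self.vocab)
        total = 0.0
        for wa in a:
            for wb in b:
                num = self.pair_counts[(wa, wb)] + self.smoothing
                den = self.prev_counts[wa] + self.smoothing * v
                total += math.log(num / den)
        return total / (len(a) * len(b))

    def order_score(self, sentences):
        """Sum of transition scores, starting from the document-start marker."""
        seq = [START] + list(sentences)
        return sum(self.transition_score(p, n) for p, n in zip(seq, seq[1:]))

    def best_order(self, sentences):
        """Exhaustive search over orderings; feasible only for a few sentences."""
        return list(max(permutations(sentences), key=self.order_score))


if __name__ == "__main__":
    # Toy, invented training documents (each is a list of sentences in order).
    train = [
        ["An earthquake struck the coast.", "Officials reported damage.", "Rescue teams arrived."],
        ["A strong earthquake hit the city.", "Officials reported injuries.", "Rescue workers searched buildings."],
    ]
    model = WordPairOrderingModel().fit(train)
    shuffled = ["Rescue teams searched homes.", "An earthquake hit the island.", "Officials reported casualties."]
    print(model.best_order(shuffled))
```

Exhaustive search keeps the example short and exact; for realistic document lengths the number of orderings grows factorially, so a greedy or beam search over transitions would be needed instead.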

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Barzilay, R. (2010). Probabilistic Approaches for Modeling Text Structure and Their Application to Text-to-Text Generation. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL ENLG 2009. Lecture Notes in Computer Science, vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_1

  • DOI: https://doi.org/10.1007/978-3-642-15573-4_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15572-7

  • Online ISBN: 978-3-642-15573-4

  • eBook Packages: Computer Science, Computer Science (R0)
