Probabilistic Approaches for Modeling Text Structure and Their Application to Text-to-Text Generation

Barzilay, Regina

doi:10.1007/978-3-642-15573-4_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5790)

Abstract

Since the early days of generation research, it has been acknowledged that modeling the global structure of a document is crucial for producing coherent, readable output. However, traditional knowledge-intensive approaches have been of limited use in addressing this problem, because they cannot be scaled effectively to domain-independent, large-scale applications. As a result, existing text-to-text generation systems rarely rely on such structural information when producing an output text. Consequently, texts generated by these methods do not match the quality of human-written texts: they are often fraught with severe coherence violations and disfluencies.

In this chapter, I will present probabilistic models of document structure that can be effectively learned from raw document collections. This property distinguishes them from the traditional knowledge-intensive approaches used in symbolic concept-to-text generation. Our results demonstrate that these probabilistic models can be directly applied to content organization, and suggest that they may prove useful in an even broader range of text-to-text applications than those considered here.
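The abstract does not spell out the models themselves. As a rough illustration of what "learned from raw document collections" and "applied to content organization" can mean in practice, the sketch below estimates word-pair transition probabilities between adjacent sentences from unannotated but ordered documents. It then selects the most probable order for a new set of sentences. This is a deliberately simplified word-pair ordering model chosen for illustration, not the content models presented in the chapter. The function names `train`, `transition_logprob`, and `best_order`, the whitespace tokenizer, and the add-alpha smoothing are all assumptions.

```python
import math
from collections import Counter
from itertools import permutations


def tokenize(sentence):
    # Hypothetical helper: lowercase whitespace tokenization into a word set.
    return set(sentence.lower().split())


def train(documents):
    """Count word pairs across adjacent sentences in raw, ordered documents.

    Each document is a list of sentences in their original order. For every
    adjacent pair we count (word in earlier sentence, word in later sentence).
    No annotation is needed: the original order is the only supervision.
    """
    pair_counts = Counter()
    source_totals = Counter()
    vocab = set()
    for doc in documents:
        for sent in doc:
            vocab |= tokenize(sent)
        for prev, nxt in zip(doc, doc[1:]):
            for a in tokenize(prev):
                for b in tokenize(nxt):
                    pair_counts[(a, b)] += 1
                    source_totals[a] += 1
    return pair_counts, source_totals, len(vocab)


def transition_logprob(prev, nxt, model, alpha=1.0):
    """Approximate log P(nxt | prev) as a sum of add-alpha smoothed
    word-pair log-probabilities (an assumed, simplified scoring scheme)."""
    pair_counts, source_totals, vocab_size = model
    total = 0.0
    for a in tokenize(prev):
        for b in tokenize(nxt):
            num = pair_counts[(a, b)] + alpha
            den = source_totals[a] + alpha * vocab_size
            total += math.log(num / den)
    return total


def best_order(sentences, model):
    """Exhaustively score every ordering (only feasible for a handful of
    sentences) and return the one with the highest total log-probability."""
    def score(order):
        return sum(transition_logprob(p, n, model) for p, n in zip(order, order[1:]))
    return list(max(permutations(sentences), key=score))


if __name__ == "__main__":
    model = train([["a b", "c d", "e f"]] * 3)
    print(best_order(["e f", "a b", "c d"], model))  # ['a b', 'c d', 'e f']
```

Exhaustive search is used only to keep the sketch short. Because the number of orderings grows factorially, a realistic system would need approximate search or a model with more structure.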

Author information

Authors and Affiliations

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139
Regina Barzilay

Editor information

Editors and Affiliations

Faculty of Humanities, Department of Communication and Information Sciences (DCI), Tilburg University, P.O. Box 90153, 5000 LE, Tilburg, The Netherlands
Emiel Krahmer
Human Media Interaction (HMI), Department of Electrical Engineering, Mathematics and Computer Science (EEMCS), University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Mariët Theune

About this chapter

Cite this chapter

Barzilay, R. (2010). Probabilistic Approaches for Modeling Text Structure and Their Application to Text-to-Text Generation. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL ENLG 2009. Lecture Notes in Computer Science, vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_1

DOI: https://doi.org/10.1007/978-3-642-15573-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15572-7
Online ISBN: 978-3-642-15573-4
eBook Packages: Computer Science, Computer Science (R0)