A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization

Bollegala, Danushka; Okazaki, Naoaki; Ishizuka, Mitsuru

doi:10.1007/978-3-642-28569-1_12

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

In Chap. 1, multi-document summarization is introduced as a potential solution to the information explosion problem. A major challenge in creating a summary from information extracted from multiple sources is deciding the order in which that information should be presented. An incorrect ordering of information selected from multiple sources can lead to misunderstandings. In this chapter, we discuss the challenges involved in ordering information selected from multiple sources and present several approaches to overcoming them. We also introduce several semi-automatic evaluation measures for empirically evaluating a sentence ordering produced by an algorithm.

Notes

1. Using word frequencies instead of binary (0, 1) values as vector elements did not have a positive impact in our experiments. We think this is because, compared to a document, a sentence typically has fewer words, and a word rarely appears more than once in a single sentence.
2. http://www.pascal-network.org
3. www.mturk.com
4. http://lr-www.pi.titech.ac.jp/tsc/tsc3-en.html
5. http://research.nii.ac.jp/ntcir/index-en.html
6. http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Danushka Bollegala
Department of System Information Sciences, Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramakiaza-Aoba, Aoba-ku, Sendai, 980-8579, Japan
Naoaki Okazaki
Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Mitsuru Ishizuka

Corresponding author

Correspondence to Danushka Bollegala.

Editor information

Editors and Affiliations

Universite Sorbonne Nouvelle, LATTICE-CNRS, Ecole Normale Superieure, rue d'Ulm 45, Paris, 75005, France
Thierry Poibeau
Information & Communication Technologies, Universitat Pompeu Fabra, C/ Tanger 122-140, Barcelona, 08018, Spain
Horacio Saggion
Institute for Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw, 01-248, Poland
Jakub Piskorski
Department of Computer Science, University of Helsinki, Gustaf Hällströmin katu 2, Helsinki, 00014, Finland
Roman Yangarber

About this chapter

Cite this chapter

Bollegala, D., Okazaki, N., Ishizuka, M. (2013). A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_12

DOI: https://doi.org/10.1007/978-3-642-28569-1_12
Published: 12 July 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28568-4
Online ISBN: 978-3-642-28569-1
eBook Packages: Computer Science, Computer Science (R0)