Abstract
In this work we investigate the use of graphs for multi-document summarization. We adapt the traditional Relationship Map approach to the multi-document scenario and, in a hybrid approach, consider adding CST (Cross-document Structure Theory) relations to this adapted model. We also investigate measures derived from graphs and complex networks for sentence selection. We show that the superficial graph-based methods are promising for the task. More importantly, some of them perform almost as well as a deep approach.
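To make the idea of graph-based sentence selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes that sentences from all source documents are nodes, that an edge links two sentences whose bag-of-words cosine similarity reaches a threshold, and that the summary is built from the highest-degree nodes (in the spirit of a "bushy path" over a relationship map). The function names, the whitespace tokenizer, the threshold of 0.3, and the summary size `k` are all illustrative assumptions; CST relations and complex-network measures are not modeled here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_by_degree(sentences, threshold=0.3, k=2):
    """Pick the k sentences with the most similarity links (hypothetical helper).

    `sentences` is a flat list of sentences pooled from all documents.
    `threshold` and `k` are assumed values chosen only for illustration.
    """
    # Naive tokenization: lowercase and split on whitespace.
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    degree = [0] * n
    # Link every pair of sentences that is similar enough.
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(bags[i], bags[j]) >= threshold:
                degree[i] += 1
                degree[j] += 1
    # Rank by degree (ties broken by original position), keep the top k,
    # then restore the original order for readability.
    ranked = sorted(range(n), key=lambda i: (-degree[i], i))
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    pooled = [
        "the flood hit the city on monday",
        "the flood hit the city and closed roads",
        "officials said the flood closed roads in the city",
        "a football match was postponed",
    ]
    for sentence in select_by_degree(pooled, threshold=0.3, k=2):
        print(sentence)
```

Degree is only one possible selection measure. Under the same assumptions, other graph or complex-network measures could be swapped in by replacing the `degree` list with a different per-node score.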
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ribaldo, R., Akabane, A.T., Rino, L.H.M., Pardo, T.A.S. (2012). Graph-Based Methods for Multi-document Summarization: Exploring Relationship Maps, Complex Networks and Discourse Information. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_30
DOI: https://doi.org/10.1007/978-3-642-28885-2_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science (R0)