Intra-document and Inter-document Redundancy in Multi-document Summarization

Carrillo-Mendoza, Pabel; Calvo, Hiram; Gelbukh, Alexander

doi:10.1007/978-3-319-62434-1_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10061)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Multi-document summarization differs from single-document summarization in that some events or ideas are mentioned with excessive redundancy across the collection. We show how the amount of redundancy in a document collection can be used to assign importance to sentences in multi-document extractive summarization. For instance, an idea may be important because it is redundant across documents, which signals its popularity; conversely, an idea may be important because it is not redundant across documents, which signals its novelty. We propose an unsupervised graph-based technique that, given suitable similarity measures, allows us to experiment with intra-document and inter-document redundancy. Our experiments on DUC corpora show promising results.
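
The abstract does not specify the graph construction or the similarity measures, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes Jaccard word overlap as the similarity measure and a weighted PageRank-style iteration for scoring; the function names and the parameters `intra_weight` and `inter_weight` are hypothetical, introduced here to show how edges within a document and edges across documents could be scaled separately.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity in [0, 1] (an assumed stand-in measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_sentences(docs, intra_weight=1.0, inter_weight=1.0,
                   damping=0.85, iterations=50):
    """Score sentences on a similarity graph.

    docs is a list of documents, each a list of sentence strings.
    Edges between sentences of the same document are scaled by
    intra_weight, edges between different documents by inter_weight
    (both hypothetical knobs, assumed non-negative).
    Returns (score, doc_index, sentence) tuples, highest score first.
    """
    nodes = [(d, s) for d, doc in enumerate(docs) for s in doc]
    n = len(nodes)
    if n == 0:
        return []
    tokens = [tokenize(s) for _, s in nodes]

    # Symmetric weighted adjacency matrix over all sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        same_doc = nodes[i][0] == nodes[j][0]
        scale = intra_weight if same_doc else inter_weight
        weights[i][j] = weights[j][i] = scale * jaccard(tokens[i], tokens[j])

    # Total edge weight per node, used to normalize outgoing score.
    strength = [sum(row) for row in weights]

    # Power iteration; isolated sentences keep only the teleport term,
    # so scores are relative and need not sum to one.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / strength[j] * scores[j]
                for j in range(n) if strength[j] > 0
            )
            for i in range(n)
        ]

    ranked = [(scores[i], nodes[i][0], nodes[i][1]) for i in range(n)]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


if __name__ == "__main__":
    docs = [
        ["the storm flooded the city", "officials praised volunteers"],
        ["the storm flooded the coastal city", "schools closed on monday"],
        ["heavy storm flooded the city center"],
    ]
    # Count only inter-document redundancy (popularity across documents).
    for score, doc_id, sentence in rank_sentences(docs, 0.0, 1.0):
        print(f"{score:.3f}  doc {doc_id}: {sentence}")
```

Under these assumptions, setting `intra_weight=0.0` rewards only ideas repeated across documents, while reading the ranking from the bottom up would instead favor sentences with little redundancy, which loosely corresponds to the novelty view described in the abstract.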

Notes

1. http://www.statmt.org/lm-benchmark/

Author information

Authors and Affiliations

CIC, Instituto Politécnico Nacional, Avenue Juan de Dios Bátiz, 07738, Mexico City, Mexico
Pabel Carrillo-Mendoza, Hiram Calvo & Alexander Gelbukh

Corresponding authors

Correspondence to Pabel Carrillo-Mendoza, Hiram Calvo or Alexander Gelbukh.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Grigori Sidorov
Universidad Autónoma Metropolitana, Mexico City, Mexico
Oscar Herrera-Alcántara

About this paper

Cite this paper

Carrillo-Mendoza, P., Calvo, H., Gelbukh, A. (2017). Intra-document and Inter-document Redundancy in Multi-document Summarization. In: Sidorov, G., Herrera-Alcántara, O. (eds) Advances in Computational Intelligence. MICAI 2016. Lecture Notes in Computer Science, vol 10061. Springer, Cham. https://doi.org/10.1007/978-3-319-62434-1_9

DOI: https://doi.org/10.1007/978-3-319-62434-1_9
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62433-4
Online ISBN: 978-3-319-62434-1
eBook Packages: Computer Science, Computer Science (R0)
