Intra-document and Inter-document Redundancy in Multi-document Summarization

  • Conference paper
Advances in Computational Intelligence (MICAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10061)

Abstract

Multi-document summarization differs from single-document summarization in the excessive redundancy of its input: the same events or ideas are mentioned many times across the collection. We show how the amount of redundancy in a document collection can be used to assign importance to sentences in multi-document extractive summarization. For instance, an idea could be important if it is redundant across documents, because of its popularity; on the other hand, an idea could be important if it is not redundant across documents, because of its novelty. We propose an unsupervised graph-based technique that, given suitable similarity measures, allows us to experiment with intra-document and inter-document redundancy. Our experiments on DUC corpora show promising results.
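As a rough illustration of the general idea of graph-based sentence ranking with separate treatment of intra-document and inter-document similarity, a minimal sketch follows. It is not the method of the paper, whose details are not given in the abstract. Everything here is an assumption made for illustration: the bag-of-words cosine similarity, the PageRank-style iteration, and the hypothetical names `rank_sentences`, `w_intra` and `w_inter`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(docs, w_intra=1.0, w_inter=1.0, damping=0.85, iters=50):
    """Rank sentences with a PageRank-style walk over a similarity graph.

    docs: list of documents, each a list of sentence strings.
    w_intra / w_inter: hypothetical scaling factors (assumptions) applied to
    edges between sentences of the same document / of different documents.
    Returns (score, doc_index, sentence) tuples, highest score first.
    """
    nodes = [(d, s) for d, doc in enumerate(docs) for s in doc]
    n = len(nodes)
    if n == 0:
        return []
    bags = [Counter(s.lower().split()) for _, s in nodes]

    # Weighted adjacency matrix: similarity scaled by edge type.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                same_doc = nodes[i][0] == nodes[j][0]
                scale = w_intra if same_doc else w_inter
                weights[i][j] = scale * cosine(bags[i], bags[j])

    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for j in range(n):
            # Sentences without outgoing edges pass nothing on, so an
            # isolated sentence keeps only the teleport share.
            incoming = sum(
                scores[i] * weights[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated

    return sorted(
        ((scores[i], nodes[i][0], nodes[i][1]) for i in range(n)),
        reverse=True,
    )


if __name__ == "__main__":
    collection = [
        ["the storm hit the coast", "officials closed the schools"],
        ["the storm hit the coast hard", "a cat slept"],
    ]
    # Keep only cross-document edges to favor ideas repeated across documents.
    for score, doc_id, sentence in rank_sentences(collection, w_intra=0.0):
        print(f"{score:.3f}  doc {doc_id}  {sentence}")
```

In this sketch, setting `w_intra=0.0` keeps only edges between sentences of different documents, so sentences whose content recurs across the collection are ranked higher. Setting `w_inter=0.0` instead restricts the graph to edges within each document. A novelty-oriented variant would need a different scoring rule, which is not attempted here.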

Notes

  1. http://www.statmt.org/lm-benchmark/

Author information

Correspondence to Pabel Carrillo-Mendoza, Hiram Calvo or Alexander Gelbukh.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Carrillo-Mendoza, P., Calvo, H., Gelbukh, A. (2017). Intra-document and Inter-document Redundancy in Multi-document Summarization. In: Sidorov, G., Herrera-Alcántara, O. (eds) Advances in Computational Intelligence. MICAI 2016. Lecture Notes in Computer Science, vol 10061. Springer, Cham. https://doi.org/10.1007/978-3-319-62434-1_9

  • DOI: https://doi.org/10.1007/978-3-319-62434-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62433-4

  • Online ISBN: 978-3-319-62434-1

  • eBook Packages: Computer Science, Computer Science (R0)
