Extractive multi-document summarization based on textual entailment and sentence compression via knapsack problem

ALI NASERASADI; HAMID KHOSRAVI; FARAMARZ SADEGHI

doi:10.1017/S1351324918000414


Published online by Cambridge University Press: 31 October 2018


ALI NASERASADI: Department of Applied Mathematics, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran; e-mail: naserasadi@uk.ac.ir
HAMID KHOSRAVI: Department of Computer Science, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran; e-mail: hkhosravi@uk.ac.ir
FARAMARZ SADEGHI: Department of Computer Science, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran; e-mail: farsad@uk.ac.ir


Abstract

As the amount of data on computer networks grows, searching for and finding suitable information becomes harder for users. Textual documents are one of the most widespread forms of information on such networks, so exploring them to learn about their content is difficult and sometimes impossible. Multi-document summarization systems address this by producing a summary of fixed, predefined length that covers as much of the content of the input documents as possible. This paper presents a novel method for extractive multi-document summarization based on textual entailment relations and sentence compression, formulating the problem as a knapsack problem. In this approach, the sentences of the documents are first ranked with an extended Tf-Idf method, and entailment scores are then computed for the selected sentences. These scores are used to calculate the final score of each sentence. Finally, after sentence compression has shortened the sentences, the problem is solved with greedy and dynamic programming approaches to the knapsack problem. Experiments on standard summarization datasets, with results evaluated by the ROUGE system, show that, to the best of our knowledge, the proposed method increases the F-measure of query-based summarization systems by two per cent and the F-measure of general summarization systems by five per cent.
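The final step of the method treats summary construction as a 0/1 knapsack problem: each (compressed) sentence has a score, which plays the role of its value, and a length, which plays the role of its weight, while the summary length limit is the capacity. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: sentence scores are taken as already computed by the Tf-Idf and entailment steps, lengths are positive integer word counts, and the greedy variant ranks sentences by score per word, a common knapsack heuristic that the abstract does not specify. The function names `knapsack_dp` and `knapsack_greedy` are hypothetical.

```python
from typing import List


def knapsack_dp(scores: List[float], lengths: List[int], budget: int) -> List[int]:
    """Exact 0/1 knapsack by dynamic programming.

    Returns the indices of the sentences that maximize the total score
    while keeping the total length within `budget` (assumed word counts).
    """
    n = len(scores)
    # best[i][w] = best total score using only the first i sentences within length w
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, l = scores[i - 1], lengths[i - 1]
        for w in range(budget + 1):
            best[i][w] = best[i - 1][w]  # option 1: skip sentence i-1
            if l <= w and best[i - 1][w - l] + s > best[i][w]:
                best[i][w] = best[i - 1][w - l] + s  # option 2: take sentence i-1
    # Trace back: a cell that differs from the row above means the sentence was taken
    chosen, w = [], budget
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= lengths[i - 1]
    return sorted(chosen)


def knapsack_greedy(scores: List[float], lengths: List[int], budget: int) -> List[int]:
    """Greedy heuristic (assumed ratio rule): add sentences by descending score per word."""
    order = sorted(range(len(scores)), key=lambda i: scores[i] / lengths[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + lengths[i] <= budget:  # keep the sentence only if it still fits
            chosen.append(i)
            used += lengths[i]
    return sorted(chosen)


if __name__ == "__main__":
    # Toy data (hypothetical): final sentence scores and compressed lengths in words
    scores = [0.6, 1.0, 1.2]
    lengths = [10, 20, 30]
    print(knapsack_dp(scores, lengths, 50))      # [1, 2]
    print(knapsack_greedy(scores, lengths, 50))  # [0, 1]
```

On this toy input the greedy heuristic fills the budget with the two sentences that have the highest score per word (total score 1.6), whereas dynamic programming finds the better pair (total score 2.2). The exact solution costs O(n·B) time and memory for n sentences and a budget of B words, which is why a greedy variant remains a useful, faster alternative.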

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 1, January 2019, pp. 121–146

DOI: https://doi.org/10.1017/S1351324918000414
Copyright © Cambridge University Press 2018


