Abstract
Because the volume of scientific publications is growing rapidly, summarizing scientific documents has become necessary to help researchers keep track of recent developments. In this paper, we formulate scientific document summarization as a multi-view clustering (MVC) problem. Two views of a scientific document, semantic and syntactic, are considered jointly in the MVC framework. To obtain an improved partitioning across the views, a differential evolution algorithm is used as the underlying optimization strategy. After the final partitioning is obtained, sentence-level features such as sentence length and position, among others, are used to extract high-scoring sentences, which then form the summary. We also investigate (a) the use of two embedding spaces to represent the sentences of a document in semantic space and (b) the use of citation contexts to incorporate their importance when summarizing a given scientific document. The proposed multi-view clustering approach is purely unsupervised and does not use any labelled data to generate the summary. To show the potential of multi-view clustering, a single-view clustering framework is also developed for comparison. The multi-view system is first compared with the single-view system on randomly selected articles from the recently released SciSummNet 2019 dataset using the ROUGE measure. Two more datasets, CL-SciSumm 2016 and CL-SciSumm 2017, are then considered for evaluation. Our experiments show that the proposed approach yields better results than existing supervised and unsupervised methods, including a transformer-based deep learning model, with statistically significant improvements.
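
The extraction step described in the abstract (scoring sentences within each cluster by features such as length and position, then collecting the top-scoring ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sentence_score` and `summarize`, the equal 0.5/0.5 feature weights, and the rule of picking exactly one sentence per cluster are all assumptions made for illustration, and the clusters are taken as given rather than produced by the differential evolution search.

```python
from typing import List, Sequence


def sentence_score(index: int, sentence: str, n_sentences: int, max_len: int) -> float:
    """Toy score (assumed, not from the paper): mean of a length and a position feature."""
    # Length feature in (0, 1]: longer sentences score higher.
    length_feature = len(sentence.split()) / max_len
    # Position feature in (0, 1]: earlier sentences score higher.
    position_feature = 1.0 - index / n_sentences
    # Equal weights are an arbitrary illustrative choice.
    return 0.5 * length_feature + 0.5 * position_feature


def summarize(sentences: Sequence[str], clusters: Sequence[Sequence[int]]) -> List[str]:
    """Pick the highest-scoring sentence from each cluster, returned in document order."""
    n_sentences = len(sentences)
    max_len = max(len(s.split()) for s in sentences)
    chosen = []
    for cluster in clusters:
        best = max(
            cluster,
            key=lambda i: sentence_score(i, sentences[i], n_sentences, max_len),
        )
        chosen.append(best)
    # Sorting the selected indices restores the original sentence order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "We propose a multi-view clustering approach for summarization.",
        "It is unsupervised.",
        "Results improve on three datasets using ROUGE.",
        "Thanks.",
    ]
    # Hypothetical partitioning: sentence indices grouped into two clusters.
    for line in summarize(doc, [[0, 1], [2, 3]]):
        print(line)
```

Selecting per cluster rather than globally is what lets the clustering step control coverage: each cluster contributes to the summary, so near-duplicate sentences grouped together do not crowd out other content.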

Ethics declarations
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Saini, N., Reddy, S.M., Saha, S. et al. Multi-view multi-objective clustering-based framework for scientific document summarization using citation context. Appl Intell 53, 18002–18026 (2023). https://doi.org/10.1007/s10489-022-04166-z
DOI: https://doi.org/10.1007/s10489-022-04166-z