Multi-view multi-objective clustering-based framework for scientific document summarization using citation context

Saini, Naveen; Reddy, Saichethan Miriyala; Saha, Sriparna; Moreno, Jose G.; Doucet, Antoine

doi:10.1007/s10489-022-04166-z


Published: 20 January 2023

Volume 53, pages 18002–18026 (2023)

Applied Intelligence

Naveen Saini (ORCID: orcid.org/0000-0002-2421-1457)¹,²,
Saichethan Miriyala Reddy³,
Sriparna Saha⁴,
Jose G. Moreno¹ &
…
Antoine Doucet⁵


Abstract

Because scientific publications are appearing at an ever-increasing rate, summarizing scientific documents has become necessary to help researchers keep track of recent developments. In this paper, we formulate scientific document summarization within a multi-view clustering (MVC) framework. Two views of the scientific documents, semantic and syntactic, are considered jointly in the MVC framework. To obtain an improved partitioning across the different views, a differential evolution algorithm is used as the underlying optimization strategy. Once the final optimal partitioning is obtained, various sentence-level features, such as sentence length and sentence position, are used to extract high-scoring sentences, which then form the summary. We also investigate (a) two embedding spaces for representing the sentences of the documents in the semantic space, and (b) the use of citation contexts to capture their importance in summarizing a given scientific document. The proposed multi-view clustering approach is purely unsupervised and does not use any labelled data to generate the summary. To demonstrate the potential of multi-view clustering, a single-view clustering framework is also developed for comparison. The multi-view system is first compared with the single-view system on randomly selected articles from the recently released SciSummNet 2019 dataset using the ROUGE measure. Two further datasets, CL-SciSumm 2016 and CL-SciSumm 2017, are then used for evaluation. Our experiments show that the proposed approach outperforms existing supervised and unsupervised methods, including a transformer-based deep learning model, with statistically significant improvements.
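The abstract describes the pipeline only at a high level, so the following is a minimal sketch under stated assumptions. It is not the authors' implementation.

- **Views.** Each view (for example, a syntactic and a semantic one) is assumed to be given as one precomputed vector per sentence.
- **Individuals.** A single differential evolution individual (the DE/rand/1/bin variant) encodes K cluster centers per view.
- **Assignment.** Each sentence joins the cluster whose centers are nearest when squared distances are summed over the views.
- **Objective.** The per-view compactness objectives are collapsed into one equally weighted sum. The paper treats them in a multi-objective setting, so this scalarization is a simplification.
- **Not covered.** Citation contexts and the two embedding spaces are not modelled.
- **Hypothetical names.** All function names are hypothetical, as is the sentence-scoring rule in `pick_summary` (inverse position plus normalized length).

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
View = List[Vector]  # one vector per sentence (hypothetical precomputed features)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode(ind: Vector, k: int, dims: Tuple[int, ...]) -> List[List[Vector]]:
    """Split a flat DE individual into centers[view][cluster]."""
    centers, pos = [], 0
    for d in dims:
        view_centers = []
        for _ in range(k):
            view_centers.append(ind[pos:pos + d])
            pos += d
        centers.append(view_centers)
    return centers


def assign(views: List[View], centers: List[List[Vector]], k: int) -> List[int]:
    """Consensus assignment: a sentence joins the cluster whose centers are
    nearest when squared distances are summed over all views."""
    labels = []
    for i in range(len(views[0])):
        costs = [sum(sq_dist(views[v][i], centers[v][c]) for v in range(len(views)))
                 for c in range(k)]
        labels.append(min(range(k), key=costs.__getitem__))
    return labels


def fitness(ind: Vector, views: List[View], k: int, dims: Tuple[int, ...]) -> float:
    """Within-cluster squared error of each view, summed with equal weights
    (a scalarization of the per-view compactness objectives). Lower is better."""
    centers = decode(ind, k, dims)
    labels = assign(views, centers, k)
    return sum(sq_dist(views[v][i], centers[v][labels[i]])
               for v in range(len(views)) for i in range(len(labels)))


def de_multiview_cluster(views: List[View], k: int, pop_size: int = 20, gens: int = 100,
                         f: float = 0.5, cr: float = 0.9, seed: int = 0) -> List[int]:
    """DE/rand/1/bin search over per-view cluster centers; returns one label per sentence."""
    rng = random.Random(seed)
    dims = tuple(len(view[0]) for view in views)
    n = len(views[0])

    def random_individual() -> Vector:
        # Seed cluster c with one randomly chosen sentence's vectors in every view.
        picks = [rng.randrange(n) for _ in range(k)]
        ind: Vector = []
        for v in range(len(views)):
            for c in range(k):
                ind.extend(views[v][picks[c]])
        return ind

    pop = [random_individual() for _ in range(pop_size)]
    scores = [fitness(p, views, k, dims) for p in pop]
    dim = len(pop[0])
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: three distinct partners, all different from the target i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one gene from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (d == j_rand or rng.random() < cr) else pop[i][d]
                     for d in range(dim)]
            # Greedy selection: keep the trial only if it is no worse than the target.
            s = fitness(trial, views, k, dims)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return assign(views, decode(pop[best], k, dims), k)


def pick_summary(sentences: List[str], labels: List[int], k: int) -> List[str]:
    """Keep the top-scoring sentence of each cluster, in document order.
    Hypothetical score: inverse position plus length normalized by the longest sentence."""
    max_len = max(len(s.split()) for s in sentences)

    def score(i: int) -> float:
        return 1.0 / (i + 1) + len(sentences[i].split()) / max_len

    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(max(members, key=score))
    return [sentences[i] for i in sorted(chosen)]
```

Consider two toy views in which sentences 0 to 2 and 3 to 5 form well-separated groups. On such data, `de_multiview_cluster([syntactic, semantic], k=2)` places each group in its own cluster, and `pick_summary` then returns one representative sentence per cluster. Seeding centers from actual sentence vectors keeps early individuals close to plausible partitions. Greedy selection means the best fitness in the population never gets worse from one generation to the next.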




Author information

Authors and Affiliations

Université Toulouse III - Paul Sabatier, IRIT, Toulouse, France
Naveen Saini & Jose G. Moreno
Department of Computer Science, Indian Institute of Information Technology Lucknow, Uttar Pradesh, India
Naveen Saini
The University of Texas at Dallas, Richardson, TX, USA
Saichethan Miriyala Reddy
Department of Computer Science and Engineering, Indian Institute of Technology, Patna, Bihar, India
Sriparna Saha
L3i Laboratory, University of La Rochelle, La Rochelle, France
Antoine Doucet

Corresponding authors

Correspondence to Naveen Saini or Saichethan Miriyala Reddy.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saini, N., Reddy, S.M., Saha, S. et al. Multi-view multi-objective clustering-based framework for scientific document summarization using citation context. Appl Intell 53, 18002–18026 (2023). https://doi.org/10.1007/s10489-022-04166-z

Accepted: 07 September 2022
Published: 20 January 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s10489-022-04166-z
