Abstract
The exponential growth in the number of scientific articles has made it difficult for researchers to keep up with new developments. Scientific document summarization addresses this problem by providing a summary of an article's essential contributions. In this paper, we present a novel method for scientific document summarization based on multi-objective differential evolution. First, distinct important sentences are extracted using citation contextualization. These sentences are then clustered using multi-objective clustering: two objective functions, the PBM index and the XB index, which measure the compactness and separation of sentence clusters, are optimized simultaneously using the search capability of multi-objective differential evolution. We conducted experiments on the CL-SciSumm 2016, CL-SciSumm 2017, CL-SciSumm 2018, and CL-SciSumm 2019 datasets. The results on CL-SciSumm 2016 and CL-SciSumm 2017 are compared with state-of-the-art methods. The evaluation shows that our method outperforms the others in terms of ROUGE-2, ROUGE-3, and ROUGE-SU4 scores.
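To make the two clustering objectives concrete, the following is a minimal sketch of the PBM and XB indices for a crisp (hard) partition, using their commonly cited definitions. It is an illustration under stated assumptions, not the implementation used in the paper: the Euclidean distance, the toy 2-D points, and the helper names (`euclidean`, `centroid`, `pbm_index`, `xb_index`, `dominates`) are all hypothetical, and the paper may use a different sentence representation or distance measure. A larger PBM value and a smaller XB value both indicate compact, well-separated clusters.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def pbm_index(points, labels, centers):
    """PBM = ((1/K) * (E_1 / E_K) * D_K)^2; larger is better.

    E_1: sum of distances of all points to the global centroid.
    E_K: sum of distances of each point to its assigned cluster center.
    D_K: maximum distance between any two cluster centers.
    """
    k = len(centers)
    grand = centroid(points)
    e_1 = sum(euclidean(p, grand) for p in points)
    e_k = sum(euclidean(p, centers[l]) for p, l in zip(points, labels))
    d_k = max(euclidean(a, b) for a, b in combinations(centers, 2))
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** 2


def xb_index(points, labels, centers):
    """Crisp Xie-Beni index; smaller is better.

    Ratio of the mean squared point-to-center distance to the
    minimum squared distance between any two cluster centers.
    """
    n = len(points)
    scatter = sum(euclidean(p, centers[l]) ** 2 for p, l in zip(points, labels))
    min_sep = min(euclidean(a, b) ** 2 for a, b in combinations(centers, 2))
    return scatter / (n * min_sep)


def dominates(a, b):
    """Pareto dominance for (pbm, xb) pairs: maximize PBM, minimize XB."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


if __name__ == "__main__":
    # Toy data: two tight groups far apart along the x-axis.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    good = ([0, 0, 1, 1], [(0, 0.5), (10, 0.5)])  # groups nearby points
    bad = ([0, 1, 0, 1], [(5, 0), (5, 1)])        # mixes the two groups
    for name, (labels, centers) in (("good", good), ("bad", bad)):
        print(name, pbm_index(pts, labels, centers), xb_index(pts, labels, centers))
```

In a multi-objective setting, each candidate clustering is scored on both indices and compared with others by Pareto dominance, as in `dominates`; a differential evolution search would then evolve candidate sets of cluster centers and retain the non-dominated ones.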





Acknowledgements
This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Mishra, S.K., Saini, N., Saha, S. et al. Scientific document summarization in multi-objective clustering framework. Appl Intell 52, 1520–1543 (2022). https://doi.org/10.1007/s10489-021-02376-5
DOI: https://doi.org/10.1007/s10489-021-02376-5