Skip to main content
Log in

Scientific document summarization in multi-objective clustering framework

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

The exponential growth in the number of scientific articles has made it difficult for the researchers to keep themselves updated with the new developments. Scientific document summarization solves this problem by providing a summary of essential contributions. In this paper, we have presented a novel method of scientific document summarization using a multi-objective differential evolution technique. Here, firstly distinct important sentences are extracted by using citation contextualization. These sentences are further clustered using the concept of multi-objective clustering. Two objective functions, PBM index, and XB index, measuring the compactness and separation of sentence clusters, are simultaneously optimized utilizing the search capability of multi-objective differential evolution. We have conducted our experiments on CL-SciSumm 2016, CL-SciSumm 2017, CL-SciSumm 2018, and CL-SciSumm 2019 datasets. Obtained results of CL-SciSumm 2016 and CL-SciSumm 2017 are compared with the state-of-the-art methods. Evaluation results demonstrate that our method outperforms others in terms of ROUGE-2, ROUGE-3, and ROUGE-SU4 scores.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Notes

  1. https://tac.nist.gov/2014/BiomedSumm/

  2. https://github.com/WING-NUS/scisumm-corpus/

References

  1. Bornmann L, Mutz R (2015) Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology 66(11):2215

    Article  Google Scholar 

  2. Atanassova I, Bertin M, Larivière V (2016) On the composition of scientific abstracts. Journal of Documentation

  3. Cohan A, Goharian N (2018) Scientific document summarization via citation contextualization and scientific discourse. Int J Digit Libr 19(2-3):287

    Article  Google Scholar 

  4. Saha S, Bandyopadhyay S (2010) A symmetry based multiobjective clustering technique for automatic evolution of clusters. Pattern Recognition 43(3):738

    Article  Google Scholar 

  5. Bandyopadhyay S, Saha S, Maulik U, Deb K (2008) A simulated annealing-based multiobjective optimization algorithm: amosa. IEEE Transactions on Evolutionary Computation 12(3):269

    Article  Google Scholar 

  6. Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4):341

    Article  MathSciNet  Google Scholar 

  7. Pakhira MK, Bandyopadhyay S, Maulik U (2004) Validity index for crisp and fuzzy clusters. Pattern Recognition 37(3):487

    Article  Google Scholar 

  8. Kusner M, Sun Y, Kolkin N, Weinberger K (2015) From word embeddings to document distances. In: International conference on machine learning, pp 957–966

  9. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781

  10. Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 19–25

  11. Vanderwende L, Suzuki H, Brockett C, Nenkova A (2007) Beyond sumbasic: task-focused summarization with sentence simplification and lexical expansion. Information Processing & Management 43(6):1606

    Article  Google Scholar 

  12. Ma T, Nakagawa H (2013) Automatically Determining a Proper Length for Multi-document Summarization: A Bayesian Nonparametric Approach. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 736–746

  13. Saini N, Saha S, Chakraborty D, Bhattacharyya P (2019) Extractive single document summarization using binary differential evolution: optimization of different sentence quality measures. PloS One 14(11)

  14. Clarke J, Lapata M (2008) Global inference for sentence compression: an integer linear programming approach. J Artif Intell Res 31:399

    Article  Google Scholar 

  15. Louis A, Joshi A, Nenkova A (2010) Discourse Indicators for Content Selection in Summaization. In: Proceedings of the 11th annual meeting of the special interest group on discourse and dialogue. Association for Computational Linguistics, pp 147– 156

  16. Erkan G, Radev DR (2004) Lexrank: graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research 22:457

    Article  Google Scholar 

  17. Mihalcea R, Tarau P (2004) Textrank: Bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing, pp 404–411

  18. Teufel S, Moens M (2002) Summarizing scientific articles: experiments with relevance and rhetorical status. Computational Linguistics 28(4):409

    Article  Google Scholar 

  19. Elkiss A, Shen S, Fader A, Erkan G, States D, Radev D (2008) Blind men and elephants: what do citation summaries tell us about a research article?. Journal of the American Society for Information Science and Technology 59(1):51

    Article  Google Scholar 

  20. Hernández-Alvarez M, Gomez JM (2016) Survey about citation context analysis: tasks, techniques, and resources. Nat Lang Eng 22(3):327

    Article  Google Scholar 

  21. Hoang CDV, Kan MY (2010) Towards automated related work summarization. In: Proceedings of the 23rd international conference on computational linguistics: posters. Association for Computational Linguistics, pp 427–435

  22. Cohan A, Soldaini L, Goharian N (2015) Matching citation text and cited spans in biomedical literature: a search-oriented approach. In: Proceedings of the 2015 conference of the North American Chapter of the association for computational linguistics: human language technologies, pp 1042–1048

  23. Jaidka K, Chandrasekaran MK, Rustagi S, Kan MY (2016) Overview of the CL-SciSumm 2016 shared task. In: Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pp 93–102

  24. Qazvinian V, Radev DR (2008) Scientific paper summarization using citation summary networks. In: Proceedings of the 22nd international conference on computational linguistics, vol 1. Association for Computational Linguistics, pp 689–696

  25. Li L, Mao L, Zhang Y, Chi J, Huang T, Cong X, Peng H (2016) Cist system for cl-scisumm 2016 shared task. In: Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pp 156–167

  26. Nomoto T (2016) NEAL: A neurally enhanced approach to linking citation and reference. In: Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pp 168–174

  27. Klampfl S, Rexha A, Kern R (2016) Identifying Referenced Text in ScientificPublications by Summarisation and Classification Techniques. In: Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pp 122–131

  28. Moraes L, Baki S, Verma R, Lee D (2016) Identifying referenced text in scientific publications by summarisation and classification techniques. In: Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pp 113–121

  29. Cortes C, Vapnik V (1995) Support-vector networks. Machine learning 20(3):273

    MATH  Google Scholar 

  30. Conroy J, Davis S (2015) Vector space models for scientific document summarization. In: Proceedings of the 1st workshop on vector space modeling for natural language processing, pp 186–191

  31. Malenfant B, Lapalme G (2016) RALI system description for CL-SciSumm 2016 shared task. In: Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pp 146–155

  32. Cao Z, Li W, Wu D (2016) Polyu at cl-scisumm 2016. In: Proceedings of the joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL), pp 132–138

  33. Wan X, Yang J, Xiao J (2007) Manifold-Ranking Based Topic-Focused Multi-Document Summarization. In: IJCAI, vol 7, pp 2903–2908

  34. Li L, Zhang Y, Mao L, Chi J, Chen M, Huang Z (2017) Cist@ clscisumm-17: multiple features based citation linkage classification and summarization

  35. Abura’ed A, Chiruzzo L, Saggion H, Accuosto P, Bravo Serrano À (2017) Lastus/taln@ Clscisumm-17: cross-document sentence matching and scientific text summarization systems

  36. Bird S, Dale R, Dorr BJ, Gibson B, Joseph MT, Kan MY, Lee D, Powley B, Radev DR, Tan YF (2008) The acl anthology reference corpus: a reference dataset for bibliographic research in computational linguistics

  37. Jaccard P (1901) ÉTude comparative de la distribution florale dans une portion des alpes et des jura. Bull Soc Vaudoise Sci Nat 37:547

    Google Scholar 

  38. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. Journal of machine Learning research 3(Jan):993

    MATH  Google Scholar 

  39. Lauscher A, Glavas G, Eckert K (2017) University of Mannheim@ CLSciSumm-17: Citation-based summarization of scientific articles using semantic textual similarity. CEUR workshop proceedings 2002:33–42. RWTH

    Google Scholar 

  40. Dipankar Das S, Pramanick A (2017) .. In: Proc. of the 2nd joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL2017). Tokyo, Japan (August 2017)

  41. Karimi S, Moraes L, Das A, Shakery A, Verma R (2018) Citance-based retrieval and summarization using ir and machine learning. Scientometrics 116(2):1331

    Article  Google Scholar 

  42. Lv Y, Zhai C (2009) Positional language models for information retrieval. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval, pp 299–306

  43. Tian R, Miyao Y, Matsuzaki T (2014) Logical inference on dependency-based compositional semantics. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: Long Papers), pp 79–89

  44. Blitzer J, McDonald R, Pereira F (2006) Domain adaptation with structural correspondence learning. In: Proceedings of the 2006 conference on empirical methods in natural language processing. https://www.aclweb.org/anthology/W06-1615. Association for Computational Linguistics, Sydney, pp 120–128

  45. Cohan A, Goharian N (2017) Scientific article summarization using citation-context and article’s discourse structure. arXiv:1704.06619

  46. AbuRa’ed A, Bravo Serrano À, Chiruzzo L, Saggion H (2019) LaSTUS-TALN+ INCO@ CL-SciSumm 2019. BIRNDL@ SIGIR, 224–232

  47. Ma S, Xu J, Wang J, Zhang C (2017) NJUST @ CLSciSumm-17. BIRNDL@SIGIR

  48. Ma S, Zhang H, Xu J, Zhang C (2018) NJUST @ CLSciSumm-18. BIRNDL@SIGIR

  49. Debnath D, Achom A, Pakray P (2018) NLP-NITMZ@ CLScisumm-18. BIRNDL@ SIGIR. pp 164–171

  50. Zerva C, Nghiem MQ, Nguyen NT, Ananiadou S (2020) Cited text span identification for scientific summarisation using pre-trained encoders. Scientometrics 125(3):3109–3137. Springer

    Article  Google Scholar 

  51. Li L, Zhu Y, Xie Y, Huang Z, Liu W, Li X, Liu Y (2019) CIST@ CLSciSumm-19: Automatic Scientific Paper Summarization with Citances and Facets. In: BIRNDL@ SIGIR, pp 196–207

  52. La Quatra M, Cagliero L, Baralis E (2019) Poli2Sum@ CL-SciSumm-19: Identify, Classify, and Summarize Cited Text Spans by means of Ensembles of Supervised Models. In: BIRNDL@ SIGIR , pp 233–246

  53. Ma S, Zhang H, Xu T, Xu J, Hu S, Zhang C (2019) IR&TM-NJUST @ CLSciSumm-19. In: BIRNDL@ SIGIR, pp 181–195

  54. Chiruzzo L, AbuRa’ed A, Bravo À, Saggion H (2019) LaSTUS-TALN+ INCO@ CL-SciSumm 2019. BIRNDL@ SIGIR, pp 224–232

  55. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: nsga-ii. IEEE Trans Evolutionary Computation 6(2):182

    Article  Google Scholar 

  56. Saini N, Chourasia S, Saha S, Bhattacharyya P (2017) A self organizing map based multi-objective framework for automatic evolution of clusters. In: International conference on neural information processing. Springer, pp 672–682

  57. Suresh K, Kundu D, Ghosh S, Das S, Abraham A (2009) Data clustering using multi-objective differential evolution algorithms. Fundamenta Informaticae 97(4):381

    Article  MathSciNet  Google Scholar 

  58. Das S, Abraham A, Konar A (2007) Automatic clustering using an improved differential evolution algorithm. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 38(1):218

    Article  Google Scholar 

  59. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95-international conference on neural networks, vol 4. IEEE, pp 1942–1948

  60. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc.

  61. Saha S, Bandyopadhyay S (2013) A generalized automatic clustering algorithm in a multiobjective framework. Appl Soft Comput 13(1):89

    Article  Google Scholar 

  62. Handl J, Knowles J (2007) An evolutionary approach to multiobjective clustering. IEEE transactions on Evolutionary Computation 11(1):56

    Article  Google Scholar 

  63. Saini N, Saha S, Bhattacharyya P (2019) Automatic scientific document clustering using self-organized multi-objective differential evolution. Cognitive Computation 11(2):271

    Article  Google Scholar 

  64. Hancer E (2020) A new multi-objective differential evolution approach for simultaneous clustering and feature selection. Eng Appl Artif Intell 87:103307

    Article  Google Scholar 

  65. Liu C, Li Y, Zhao Q, Liu C (2019) Reference vector-based multi-objective clustering for high-dimensional data. Appl Soft Comput 78:614

    Article  Google Scholar 

  66. Mukhopadhyay A, Maulik U, Bandyopadhyay S (2009) .. In: 2009 seventh international conference on advances in pattern recognition. IEEE, pp 236–239

  67. Zhang H, Zhou A, Song S, Zhang Q, Gao XZ, Zhang J (2016) A self-organizing multiobjective evolutionary algorithm. IEEE Trans Evol Comput 20(5):792

    Article  Google Scholar 

  68. Kupiec J, Pedersen J, Chen F (1995) A trainable document summarizer. In: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, pp 68–73

  69. Lin CY (2004) Rouge: A package for automatic evaluation of summaries. In: Text summarization branches out. https://www.aclweb.org/anthology/W04-1013. Association for Computational Linguistics, Barcelona, pp 74–81

  70. Heng W, Yu J, Li L, Liu Y (2013) Research on key factors in multi-document topic modelling application with hlda. Journal of Chinese Information Processing 27(6):117

    Google Scholar 

  71. Robertson S, Zaragoza H (2009) The probabilistic relevance framework: BM25 and beyond Now Publishers Inc

  72. Bendersky M, Croft WB (2008) Discovering key concepts in verbose queries. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 491–498

  73. Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy E, Smith NA (2014) Retrofitting word vectors to semantic lexicons. arXiv:1411.4166

  74. Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems (TOIS) 22(2):179

    Article  Google Scholar 

  75. Shutian M, Xu J, Wang J, Zhang C (2017) NJUST @ CLSciSumm-17. BIRNDL@SIGIR

  76. Saini N, Saha S, Bhattacharyya P (2019) Multiobjective-based approach for microblog summarization. IEEE Transactions on Computational Social Systems 6(6):1219–1231

    Article  Google Scholar 

  77. Welch BL (1947) The generalization ofstudent’s’ problem when several different population variances are involved. Biometrika 34(1/2):28

    Article  MathSciNet  Google Scholar 

  78. Alguliev RM, Aliguliyev RM, Isazade NR (2013) Multiple documents summarization based on evolutionary optimization algorithm. Expert Syst Appl 40(5):1675

    Article  Google Scholar 

  79. Nomoto T, Matsumoto Y (2001) A new approach to unsupervised text summarization. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, pp 26–34

  80. Saha S, Mitra S, Kramer S (2018) Exploring multiobjective optimization for multiview clustering. ACM Transactions on Knowledge Discovery from Data (TKDD) 12(4):1

    Article  Google Scholar 

  81. Jaidka K, Yasunaga M, Chandrasekaran MK, Radev D, Kan MY (2019) The cl-scisumm shared task 2018: results and key insights. arXiv:1909.00764

  82. Chandrasekaran MK, Yasunaga M, Radev D, Freitag D, Kan MY (2019) Overview and results: Cl-scisumm shared task 2019. arXiv:1907.09854

  83. Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research 9(Dec):2677

    MATH  Google Scholar 

Download references

Acknowledgements

This publication is an outcome of the R&D work undertaken in the project under the Visvesvaraya Ph.D. Scheme of Ministry of Electronics & Information Technology, Government of India, being implemented by Digital India Corporation (Formerly Media Lab Asia),

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Santosh Kumar Mishra.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mishra, S.K., Saini, N., Saha, S. et al. Scientific document summarization in multi-objective clustering framework. Appl Intell 52, 1520–1543 (2022). https://doi.org/10.1007/s10489-021-02376-5

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-021-02376-5

Keywords

Navigation