Using K-Means and Variable Neighborhood Search for Automatic Summarization of Scientific Articles

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12559)

Abstract

This work presents a method for summarizing scientific articles from the arXiv dataset. First, we use Variable Neighborhood Search (VNS) heuristics to automatically find the summary with the best ROUGE-1 score that can be assembled from the sentences of each scientific article. Then, we vectorize the sentences with the pre-trained BERT language model and augment the vectors with topic embeddings obtained by applying the K-means algorithm. Finally, we train a Random Forest classification model to find sentences suitable for the summary and compile a summary from the selected sentences. The described algorithm produced summaries with high ROUGE-1 scores (0.45 on average), so we plan to develop it further on a larger dataset.
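The first stage, building a best-possible extractive summary with VNS, can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' code: ROUGE-1 is simplified to a clipped unigram-overlap F1 on lowercase whitespace tokens (no stemming), the article's abstract is assumed to be the reference summary, and the neighborhood structure N_k (toggling k randomly chosen sentences in or out of the summary), the first-improvement local search, and the helper names `rouge1_f1`, `shake`, `local_search` and `vns_oracle` are all hypothetical choices.

```python
import random
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercase whitespace tokens.

    Assumption: no stemming or stopword removal, unlike some ROUGE configurations.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each unigram counted at most min(cand, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score(selection: set, sentences: list, reference: str) -> float:
    # Join the selected sentences in document order and score them against the reference.
    text = " ".join(sentences[i] for i in sorted(selection))
    return rouge1_f1(text, reference)


def local_search(selection: set, sentences: list, reference: str):
    """First-improvement local search: toggle one sentence in or out at a time."""
    best, best_score = set(selection), score(selection, sentences, reference)
    improved = True
    while improved:
        improved = False
        for i in range(len(sentences)):
            candidate = best ^ {i}  # add sentence i if absent, drop it if present
            if not candidate:
                continue  # never return an empty summary
            s = score(candidate, sentences, reference)
            if s > best_score:
                best, best_score, improved = candidate, s, True
    return best, best_score


def shake(selection: set, k: int, n: int, rng: random.Random) -> set:
    """Hypothetical neighborhood N_k: toggle k distinct, randomly chosen sentences."""
    candidate = set(selection)
    for i in rng.sample(range(n), k):
        candidate ^= {i}
    return candidate or {rng.randrange(n)}


def vns_oracle(sentences: list, reference: str, k_max: int = 3, iterations: int = 50, seed: int = 0):
    """Basic VNS: shake in N_k, run local search, recenter and reset k on improvement."""
    rng = random.Random(seed)
    n = len(sentences)
    best, best_score = local_search({0}, sentences, reference)
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            start = shake(best, min(k, n), n, rng)
            candidate, cand_score = local_search(start, sentences, reference)
            if cand_score > best_score:
                best, best_score, k = candidate, cand_score, 1
            else:
                k += 1  # move to a larger neighborhood
    return sorted(best), best_score
```

For example, `vns_oracle(["the cat sat on the mat", "dogs bark loudly", "the mat was red"], "the cat sat on the red mat")` keeps only the first sentence, because adding either of the other sentences lowers the F1 score. VNS is used here to escape the local optima of the single-toggle search by shaking in progressively larger neighborhoods and returning to k = 1 after every improvement.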

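The second stage, augmenting sentence vectors with topic information from K-means, might look roughly like the sketch below. It rests on assumptions the abstract does not confirm: the "topic embedding" is taken to be a one-hot cluster indicator appended to each sentence vector, k-means is a plain Lloyd's implementation written out only to keep the example self-contained, and toy 2-D vectors stand in for BERT sentence embeddings. The function names `kmeans` and `augment_with_topics` are hypothetical.

```python
import random


def kmeans(vectors: list, k: int, iterations: int = 100, seed: int = 0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centroids
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return centroids, labels


def augment_with_topics(vectors: list, k: int, seed: int = 0):
    """Append a one-hot cluster ("topic") indicator to every sentence vector.

    Assumption: one-hot cluster ids as topic embeddings; the paper's encoding may differ.
    """
    _, labels = kmeans(vectors, k, seed=seed)
    augmented = [
        list(v) + [1.0 if c == lab else 0.0 for c in range(k)]
        for v, lab in zip(vectors, labels)
    ]
    return augmented, labels
```

In a full pipeline, the augmented vectors would serve as features for a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`), presumably trained with the VNS-selected sentences as positive examples; the sentences it predicts as summary-worthy are then combined, for instance in document order, into the final summary.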
Notes

  1. Automatic text summarization is the process of extracting the most important information from a text.

  2. The source code is available on GitHub at https://github.com/iskander-akhmetov/Using-k-means-and-Variable-Neighborhood-Search-for-automatic-summarization-of-scientific-articles/.

Acknowledgement

This work was supported by the Science Committee of the Republic of Kazakhstan (RK) under grants AP08856034, AP09058174, and BR05236839.

Author information

Corresponding author

Correspondence to I. Akhmetov.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Akhmetov, I., Mladenovic, N., Mussabayev, R. (2021). Using K-Means and Variable Neighborhood Search for Automatic Summarization of Scientific Articles. In: Mladenovic, N., Sleptchenko, A., Sifaleras, A., Omar, M. (eds) Variable Neighborhood Search. ICVNS 2021. Lecture Notes in Computer Science, vol 12559. Springer, Cham. https://doi.org/10.1007/978-3-030-69625-2_13

  • DOI: https://doi.org/10.1007/978-3-030-69625-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69624-5

  • Online ISBN: 978-3-030-69625-2

  • eBook Packages: Computer Science, Computer Science (R0)