Abstract
This work presents a method for summarizing scientific articles from the arXiv dataset. First, a Variable Neighborhood Search (VNS) heuristic automatically assembles, from the sentences of each article, the summary with the highest ROUGE-1 score we could attain. Next, the sentences are vectorized with the pre-trained BERT language model, and the vectors are augmented with topic embeddings obtained by applying the K-means algorithm. Finally, a Random Forest classification model is trained to identify sentences suitable for the summary, and a summary is compiled from the selected sentences. The described algorithm produced summaries with high ROUGE-1 scores (0.45 on average), so we are heading for further development on a larger dataset.
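The first stage, a VNS search for the sentence subset that maximizes ROUGE-1 against a reference, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring function is a simplified unigram-F1 stand-in for the official ROUGE-1 package, and all function names, neighborhood moves, and parameters (`k`, `k_max`, `iters`) are hypothetical choices for the sketch.

```python
import random

def rouge1_f(candidate, reference):
    """Unigram-overlap F1: a simplified stand-in for the official ROUGE-1."""
    c, r = candidate.lower().split(), reference.lower().split()
    overlap = sum(min(c.count(w), r.count(w)) for w in set(c))
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(c), overlap / len(r)
    return 2 * prec * rec / (prec + rec)

def vns_select(sentences, reference, k=2, k_max=3, iters=50, seed=0):
    """Basic VNS over k-sentence subsets: shaking swaps up to k_n selected
    sentences for unselected ones, then best-improvement 1-swap local search."""
    rng = random.Random(seed)
    def score(sel):
        return rouge1_f(" ".join(sentences[i] for i in sorted(sel)), reference)
    best = set(rng.sample(range(len(sentences)), k))
    best_f = score(best)
    for _ in range(iters):
        k_n = 1
        while k_n <= k_max:
            # shaking: randomly replace min(k_n, k) selected sentences
            cand = set(best)
            pool = [i for i in range(len(sentences)) if i not in cand]
            for out in rng.sample(sorted(cand), min(k_n, k)):
                if pool:
                    cand.remove(out)
                    cand.add(pool.pop(rng.randrange(len(pool))))
            # local search: apply the best single swap while it improves
            improved = True
            while improved:
                improved = False
                base, best_trial = score(cand), None
                for i in sorted(cand):
                    for j in range(len(sentences)):
                        if j in cand:
                            continue
                        trial = (cand - {i}) | {j}
                        f = score(trial)
                        if f > base:
                            base, best_trial, improved = f, trial, True
                if improved:
                    cand = best_trial
            if score(cand) > best_f:
                best, best_f = cand, score(cand)
                k_n = 1   # improvement found: restart from first neighborhood
            else:
                k_n += 1
    return best_f, best
```

On a real article, `sentences` would be the article's sentence list and `reference` its abstract; the subset returned serves as the gold extractive summary for the later classification stage.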
Notes
1. Automatic text summarization is the process of extracting the most important information from a text.
2. The source code is available on GitHub at https://github.com/iskander-akhmetov/Using-k-means-and-Variable-Neighborhood-Search-for-automatic-summarization-of-scientific-articles/.
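The second stage described in the abstract, feature augmentation and classification, can be sketched as follows. A random matrix stands in for BERT sentence embeddings and synthetic labels stand in for the VNS-derived "in summary" targets; the dimensions, cluster count, and hyperparameters are illustrative, not those of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# stand-in for BERT sentence embeddings: 200 sentences x 32 dims
X = rng.normal(size=(200, 32))
# stand-in labels: 1 means the sentence belongs in the VNS-found summary
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# topic embedding: one-hot K-means cluster membership, appended to each vector
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
topics = np.eye(5)[km.labels_]          # (200, 5) one-hot topic features
X_aug = np.hstack([X, topics])          # augmented sentence vectors

# Random Forest scores each sentence's suitability for the summary
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y)
probs = clf.predict_proba(X_aug)[:, 1]

# compile a summary from the top-scoring sentences (indices, in this sketch)
summary_idx = np.argsort(probs)[::-1][:3]
```

In the paper's setting the classifier would be trained on one set of articles and applied to unseen ones; here everything runs on the same synthetic matrix purely to show the data flow.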
Acknowledgement
This work was supported by the Science Committee of RK under grants AP08856034, AP09058174, and BR05236839.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Akhmetov, I., Mladenovic, N., Mussabayev, R. (2021). Using K-Means and Variable Neighborhood Search for Automatic Summarization of Scientific Articles. In: Mladenovic, N., Sleptchenko, A., Sifaleras, A., Omar, M. (eds) Variable Neighborhood Search. ICVNS 2021. Lecture Notes in Computer Science, vol 12559. Springer, Cham. https://doi.org/10.1007/978-3-030-69625-2_13
DOI: https://doi.org/10.1007/978-3-030-69625-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69624-5
Online ISBN: 978-3-030-69625-2