Abstract
This work presents a method for summarizing scientific articles from the arXiv dataset. First, a Variable Neighborhood Search (VNS) heuristic automatically assembles, from the sentences of each article, the summary with the highest ROUGE-1 score we could attain. Next, the sentences are vectorized with the pre-trained BERT language model, and the vectors are augmented with topic embeddings obtained by applying the K-means algorithm. Finally, a Random Forest classification model is trained to identify sentences suitable for the summary, and a summary is compiled from the selected sentences. The described algorithm produced summaries with high ROUGE-1 scores (0.45 on average), so we are heading for further development on a larger dataset.
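The first stage, a VNS search for the sentence subset that maximizes ROUGE-1 against a reference, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring function is a simplified unigram-F1 stand-in for the official ROUGE-1 package, and all function names, neighborhood moves, and parameters (`k`, `k_max`, `iters`) are hypothetical choices for the sketch.

```python
import random

def rouge1_f(candidate, reference):
    """Unigram-overlap F1: a simplified stand-in for the official ROUGE-1."""
    c, r = candidate.lower().split(), reference.lower().split()
    overlap = sum(min(c.count(w), r.count(w)) for w in set(c))
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(c), overlap / len(r)
    return 2 * prec * rec / (prec + rec)

def vns_select(sentences, reference, k=2, k_max=3, iters=50, seed=0):
    """Basic VNS over k-sentence subsets: shaking swaps up to k_n selected
    sentences for unselected ones, then best-improvement 1-swap local search."""
    rng = random.Random(seed)
    def score(sel):
        return rouge1_f(" ".join(sentences[i] for i in sorted(sel)), reference)
    best = set(rng.sample(range(len(sentences)), k))
    best_f = score(best)
    for _ in range(iters):
        k_n = 1
        while k_n <= k_max:
            # shaking: randomly replace min(k_n, k) selected sentences
            cand = set(best)
            pool = [i for i in range(len(sentences)) if i not in cand]
            for out in rng.sample(sorted(cand), min(k_n, k)):
                if pool:
                    cand.remove(out)
                    cand.add(pool.pop(rng.randrange(len(pool))))
            # local search: apply the best single swap while it improves
            improved = True
            while improved:
                improved = False
                base, best_trial = score(cand), None
                for i in sorted(cand):
                    for j in range(len(sentences)):
                        if j in cand:
                            continue
                        trial = (cand - {i}) | {j}
                        f = score(trial)
                        if f > base:
                            base, best_trial, improved = f, trial, True
                if improved:
                    cand = best_trial
            if score(cand) > best_f:
                best, best_f = cand, score(cand)
                k_n = 1   # improvement found: restart from first neighborhood
            else:
                k_n += 1
    return best_f, best
```

On a real article, `sentences` would be the article's sentence list and `reference` its abstract; the subset returned serves as the gold extractive summary for the later classification stage.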
Notes
1. Automatic text summarization is the process of extracting the most important information from a text.
2. The source code is available on GitHub at https://github.com/iskander-akhmetov/Using-k-means-and-Variable-Neighborhood-Search-for-automatic-summarization-of-scientific-articles/.
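The second stage described in the abstract, feature augmentation and classification, can be sketched as follows. A random matrix stands in for BERT sentence embeddings and synthetic labels stand in for the VNS-derived "in summary" targets; the dimensions, cluster count, and hyperparameters are illustrative, not those of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# stand-in for BERT sentence embeddings: 200 sentences x 32 dims
X = rng.normal(size=(200, 32))
# stand-in labels: 1 means the sentence belongs in the VNS-found summary
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# topic embedding: one-hot K-means cluster membership, appended to each vector
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
topics = np.eye(5)[km.labels_]          # (200, 5) one-hot topic features
X_aug = np.hstack([X, topics])          # augmented sentence vectors

# Random Forest scores each sentence's suitability for the summary
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y)
probs = clf.predict_proba(X_aug)[:, 1]

# compile a summary from the top-scoring sentences (indices, in this sketch)
summary_idx = np.argsort(probs)[::-1][:3]
```

In the paper's setting the classifier would be trained on one set of articles and applied to unseen ones; here everything runs on the same synthetic matrix purely to show the data flow.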
Acknowledgement
This work was supported by the Science Committee of RK under grants AP08856034, AP09058174, and BR05236839.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Akhmetov, I., Mladenovic, N., Mussabayev, R. (2021). Using K-Means and Variable Neighborhood Search for Automatic Summarization of Scientific Articles. In: Mladenovic, N., Sleptchenko, A., Sifaleras, A., Omar, M. (eds) Variable Neighborhood Search. ICVNS 2021. Lecture Notes in Computer Science, vol 12559. Springer, Cham. https://doi.org/10.1007/978-3-030-69625-2_13
DOI: https://doi.org/10.1007/978-3-030-69625-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69624-5
Online ISBN: 978-3-030-69625-2