Keyword Extraction Approach Based on Probabilistic-Entropy, Graph, and Neural Network Methods

Selivanov, Anton A.; Moloshnikov, Ivan A.; Rybka, Roman B.; Sboev, Alexandr G.

doi:10.1007/978-3-030-59535-7_21

Conference paper
First Online: 22 September 2020

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12412)

Abstract

Current methods of automatic keyword extraction rely on statistical and graph features of texts. Transfer learning makes it possible to add word features obtained from deep neural network models trained for other tasks. This paper proposes an integrated approach to keyword extraction based on a classification model that aggregates the results of probabilistic-entropy and graph methods with word features extracted from a neural network for text title generation. To validate the method, a dataset of news texts was gathered, with keywords manually selected through crowdsourcing. The proposed approach reaches a class-weighted F1-measure of 72% for keyword extraction, approximately 5% better than the existing methods.
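As an illustration of the reported metric, the sketch below computes a class-weighted F1-measure for a binary keyword/non-keyword labelling. It assumes that "weighted by classes" means averaging the per-class F1 scores with weights proportional to class support; the abstract does not spell out the exact definition, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of true labels.

    Assumption: this support-weighted average is what "F1-measure weighted
    by classes" refers to; the abstract does not define it explicitly.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = n_cls - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


# Toy labels (hypothetical): 1 = keyword, 0 = not a keyword.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
print(round(weighted_f1(y_true, y_pred), 3))  # prints 0.75
```

In the toy example the keyword class scores F1 = 2/3 with support 3 and the non-keyword class scores F1 = 0.8 with support 5, giving (3/8)(2/3) + (5/8)(0.8) = 0.75. Because non-keywords usually dominate, a weighted score of this kind can stay high even when keyword recall is modest.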

Notes

1. A pre-trained model is available at https://github.com/google-research/bert.
2. https://github.com/RossiyaSegodnya/ria_news_dataset.
3. Access to the dataset can be requested through the contacts available at https://sagteam.ru.

Acknowledgements.

The reported study was funded by RFBR (project 18-29-10084). This work has been carried out using computing resources of the federal collective user center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.

Author information

Authors and Affiliations

NRC “Kurchatov Institute”, Moscow, Russia
Anton A. Selivanov, Ivan A. Moloshnikov, Roman B. Rybka & Alexandr G. Sboev

Corresponding author

Correspondence to Anton A. Selivanov.

Editor information

Editors and Affiliations

National Research University Higher School, Moscow, Russia
Sergei O. Kuznetsov
Moscow Institute of Physics and Technology, Dolgoprudny, Russia
Aleksandr I. Panov
Federal Research Center Computer Science and Control, Moscow, Russia
Konstantin S. Yakovlev

About this paper

Cite this paper

Selivanov, A.A., Moloshnikov, I.A., Rybka, R.B., Sboev, A.G. (2020). Keyword Extraction Approach Based on Probabilistic-Entropy, Graph, and Neural Network Methods. In: Kuznetsov, S.O., Panov, A.I., Yakovlev, K.S. (eds) Artificial Intelligence. RCAI 2020. Lecture Notes in Computer Science, vol 12412. Springer, Cham. https://doi.org/10.1007/978-3-030-59535-7_21

DOI: https://doi.org/10.1007/978-3-030-59535-7_21
Published: 22 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59534-0
Online ISBN: 978-3-030-59535-7
eBook Packages: Computer Science, Computer Science (R0)
