Abstract
Current methods of automatic keyword extraction rely on statistical and graph-based features of texts. Transfer learning makes it possible to use additional word features obtained from deep neural network models trained for other tasks. This paper proposes an integrated approach to keyword extraction based on a classification model that aggregates the outputs of probabilistic-entropy and graph methods together with word features extracted from a neural network trained to generate text titles. To validate the method, a dataset of news texts was gathered, with keywords selected manually through crowdsourcing. The proposed approach achieves a class-weighted F1-measure of 72% for keyword extraction, approximately 5% higher than existing methods.
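The abstract does not specify the exact features, the classifier, or the averaging used for the F1-measure. The sketch below only illustrates the general shape of such a pipeline under our own assumptions. A TextRank-style PageRank over a word co-occurrence graph supplies a graph score. Relative frequency and first position stand in for the statistical (probabilistic-entropy) features. These are aggregated into one feature vector per word. A toy threshold rule replaces the trained classifier, and the neural title-generation features are omitted. We also assume "class-weighted F1" means the support-weighted mean of per-class F1 scores, as computed by scikit-learn's `f1_score(..., average="weighted")`. The function names, window size, damping factor 0.85, and the gold labels are hypothetical; this is not the authors' implementation.

```python
from collections import Counter, defaultdict


def textrank_scores(tokens, window=2, damping=0.85, iterations=50):
    """TextRank-style PageRank over an undirected word co-occurrence graph.

    Two distinct words are linked when they occur within `window` consecutive tokens.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)
    vocab = sorted(set(tokens))
    scores = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # Each node shares its current score equally among its neighbors.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in vocab
        }
    return scores


def word_features(tokens, graph_scores):
    """Aggregate per-word features: [relative frequency, first position, graph score]."""
    counts = Counter(tokens)
    first_pos = {}
    for i, w in enumerate(tokens):
        first_pos.setdefault(w, i)
    n = len(tokens)
    return {w: [counts[w] / n, first_pos[w] / n, graph_scores[w]] for w in counts}


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    total = 0.0
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * y_true.count(c)
    return total / len(y_true)


if __name__ == "__main__":
    tokens = "keyword extraction uses graph features and keyword statistics".split()
    feats = word_features(tokens, textrank_scores(tokens))
    # Toy stand-in for the trained classifier: label a word as a keyword (1)
    # when its graph score is above the document mean.
    mean_score = sum(f[2] for f in feats.values()) / len(feats)
    words = sorted(feats)
    y_pred = [int(feats[w][2] > mean_score) for w in words]
    gold = {"keyword", "extraction", "graph"}  # hypothetical manual labels
    y_true = [int(w in gold) for w in words]
    print(weighted_f1(y_true, y_pred))
```

In a real system the toy threshold would be replaced by a classifier trained on the aggregated vectors. That is the point of the integrated approach: no single score decides alone.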
Notes
1. The pre-trained model is available at https://github.com/google-research/bert.
2.
3. Access to the dataset can be requested through the contacts listed at https://sagteam.ru.
Acknowledgements.
The reported study was funded by RFBR (project 18-29-10084). This work has been carried out using computing resources of the federal collective user center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Selivanov, A.A., Moloshnikov, I.A., Rybka, R.B., Sboev, A.G. (2020). Keyword Extraction Approach Based on Probabilistic-Entropy, Graph, and Neural Network Methods. In: Kuznetsov, S.O., Panov, A.I., Yakovlev, K.S. (eds) Artificial Intelligence. RCAI 2020. Lecture Notes in Computer Science, vol 12412. Springer, Cham. https://doi.org/10.1007/978-3-030-59535-7_21
DOI: https://doi.org/10.1007/978-3-030-59535-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59534-0
Online ISBN: 978-3-030-59535-7
eBook Packages: Computer Science, Computer Science (R0)