Abstract
Bibliographic databases store the abstracts and citations of scientific articles, and the abstract is the most consulted section of an article. Keywords are used to classify a manuscript and to address its entries in the indexing and information retrieval systems of these databases, yet in many cases this information does not help the manuscript achieve greater dissemination. This paper presents an evaluation of the semantic relatedness between the abstracts of scientific papers and their keywords. The analysis uses word2vec, a predictive model, to find the nearest words. The study thus focuses on metadata quality assessment, using the semantic similarity between two words as an indication of the accuracy of metadata in scientific databases.
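The idea of scoring keywords against an abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors in `TOY_VECTORS` are hypothetical stand-ins for embeddings that a trained word2vec model would supply, and the helper names (`keyword_relatedness`, `nearest_words`), the stop-word list, and the choice of averaging word vectors before taking cosine similarity are all assumptions made for illustration.

```python
import math

# Hypothetical toy embeddings (assumption): a real study would load
# vectors produced by a trained word2vec model instead.
TOY_VECTORS = {
    "metadata": [0.9, 0.1, 0.0],
    "quality": [0.8, 0.3, 0.1],
    "keywords": [0.7, 0.2, 0.2],
    "semantic": [0.2, 0.9, 0.1],
    "similarity": [0.1, 0.8, 0.3],
    "protein": [0.0, 0.1, 0.9],
}

# Small illustrative stop-word list (assumption), i.e. common words
# such as "the", "at", "which".
STOP_WORDS = {"the", "at", "which", "of", "and", "a", "is", "in"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stop words."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def mean_vector(words):
    """Average the vectors of the known words; None if none are known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def keyword_relatedness(abstract, keywords):
    """Map each keyword to its cosine similarity with the abstract."""
    abstract_vec = mean_vector(tokenize(abstract))
    scores = {}
    for keyword in keywords:
        keyword_vec = mean_vector(tokenize(keyword))
        if abstract_vec is None or keyword_vec is None:
            scores[keyword] = None  # no vocabulary overlap, no score
        else:
            scores[keyword] = cosine(abstract_vec, keyword_vec)
    return scores


def nearest_words(word, k=2):
    """Return the k vocabulary words closest to `word` by cosine similarity."""
    target = TOY_VECTORS[word]
    others = [(w, cosine(target, v)) for w, v in TOY_VECTORS.items() if w != word]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    print(keyword_relatedness("The quality of metadata and keywords",
                              ["metadata", "protein"]))
    print(nearest_words("metadata", k=2))
```

Under these assumptions, a keyword that scores low against its own abstract would be a candidate for a metadata quality problem. Where to set the threshold is left open here.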
Notes
- 1.
- 2. Common words like the, at, which, and others.
Acknowledgments
The research team would like to thank Universidad Técnica Particular de Loja, especially the Tecnologías Avanzadas de la Web y SBC Group.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Segarra-Faggioni, V., Romero-Pelaez, A. (2019). Analyzing Scientific Corpora Using Word Embedding. In: Rocha, Á., Ferrás, C., Paredes, M. (eds) Information Technology and Systems. ICITS 2019. Advances in Intelligent Systems and Computing, vol 918. Springer, Cham. https://doi.org/10.1007/978-3-030-11890-7_7
DOI: https://doi.org/10.1007/978-3-030-11890-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11889-1
Online ISBN: 978-3-030-11890-7
eBook Packages: Intelligent Technologies and Robotics (R0)