Abstract
Information technologies have produced large volumes of documents, which form the basis of information systems designed to speed up the processes they support. These systems produce results that try to imitate human knowledge, but complementary techniques are still needed to make those results more precise. In automatic topic extraction in particular, additional techniques are needed so that the extracted topics are more coherent. In this work, the behavior of the Latent Dirichlet Allocation (LDA) algorithm is studied when the hypernymy semantic relation extracted from WordNet is incorporated, with the aim of improving on the results obtained by applying LDA to a set of documents without an external knowledge source. The experimental results showed an improvement when hypernyms were incorporated, yielding a topic coherence of 1.23 on the GoogleNews corpus.
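The idea of enriching documents with hypernyms before topic modeling can be illustrated with a minimal sketch. The abstract does not say how the hypernyms are combined with the original text, so this sketch assumes the simplest option: each token is kept and its hypernym, when one is known, is appended right after it. The names `TOY_HYPERNYMS`, `toy_hypernym`, and `enrich_with_hypernyms` are hypothetical, and the small dictionary only stands in for a real WordNet lookup.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lookup standing in for WordNet; entries are illustrative only.
TOY_HYPERNYMS: Dict[str, str] = {
    "dog": "canine",
    "cat": "feline",
    "car": "vehicle",
    "truck": "vehicle",
}


def toy_hypernym(word: str) -> Optional[str]:
    """Return a hypernym for `word`, or None when the toy lookup has none."""
    return TOY_HYPERNYMS.get(word)


def enrich_with_hypernyms(
    tokens: List[str],
    lookup: Callable[[str], Optional[str]] = toy_hypernym,
) -> List[str]:
    """Keep every token and append its hypernym when the lookup finds one.

    Assumption: hypernyms are added alongside the original tokens rather
    than replacing them; the paper may combine them differently.
    """
    enriched: List[str] = []
    for tok in tokens:
        enriched.append(tok)
        hyper = lookup(tok)
        if hyper is not None:
            enriched.append(hyper)
    return enriched


# Two tiny tokenized documents; the enriched versions would then be passed
# to any LDA implementation in place of the raw token lists.
docs = [["dog", "chases", "cat"], ["car", "passes", "truck"]]
enriched_docs = [enrich_with_hypernyms(d) for d in docs]
print(enriched_docs)
```

In the second document, "car" and "truck" both contribute the shared term "vehicle", which is the kind of added co-occurrence that could help a topic model group related words. Passing a different `lookup` function is how a real lexical resource would be plugged in.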
Acknowledgements
The authors would like to thank Universidad Autónoma Metropolitana, Azcapotzalco. The present work has been funded by the research project SI001-20 at UAM Azcapotzalco, partly supported by project VIEP 2021 at BUAP and by the Consejo Nacional de Ciencia y Tecnología (CONACYT) with the scholarship number 788155.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sánchez, A.L.L., Vidal, M.T., Reyes-Ortiz, J.A. (2021). Hypernyms-Based Topic Discovery Using LDA. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_6
DOI: https://doi.org/10.1007/978-3-030-89820-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89819-9
Online ISBN: 978-3-030-89820-5
eBook Packages: Computer Science, Computer Science (R0)