Hypernyms-Based Topic Discovery Using LDA

Sánchez, Ana Laura Lezama; Vidal, Mireya Tovar; Reyes-Ortiz, José A.

doi:10.1007/978-3-030-89820-5_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13068)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Information technologies have produced large numbers of documents, which form the basis of information systems designed to speed up the processes they support. These systems produce results that attempt to imitate human knowledge, but complementary techniques are still needed to make those results more precise. In automatic topic extraction in particular, such techniques are needed to yield topics that are more coherent with respect to the subject matter. This work studies the behavior of the Latent Dirichlet Allocation (LDA) algorithm when the hypernymy semantic relation extracted from WordNet is incorporated, with the aim of improving on the results obtained by applying LDA to a set of documents without an external knowledge source. The experimental results showed an improvement when hypernyms were incorporated, with a topic coherence of 1.23 on the GoogleNews corpus.
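
The abstract does not say how the hypernyms are fed into LDA, so the following is only a minimal sketch of one plausible reading: each document is enriched by appending the hypernym of every token that has one, and the enriched documents are then passed to a small collapsed Gibbs sampler for LDA. The hypernym table TOY_HYPERNYMS, the function names, the toy documents, and all parameter values are illustrative assumptions, not the authors' setup; a real pipeline would look hypernyms up in WordNet and would likely use an established LDA library.

```python
import random
from collections import Counter

# Hypothetical toy hypernym table standing in for a WordNet lookup (assumption).
TOY_HYPERNYMS = {
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bus": "vehicle",
}


def enrich_with_hypernyms(tokens, hypernyms=TOY_HYPERNYMS):
    """Return the tokens followed by the hypernym of each token that has one.

    Appending (rather than replacing) is an assumed enrichment scheme.
    """
    enriched = list(tokens)
    for tok in tokens:
        if tok in hypernyms:
            enriched.append(hypernyms[tok])
    return enriched


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


# Toy documents (invented for illustration only).
docs = [
    ["dog", "cat", "pet"],
    ["car", "bus", "road"],
    ["dog", "park"],
    ["bus", "ticket"],
]
enriched_docs = [enrich_with_hypernyms(doc) for doc in docs]
topics = lda_gibbs(enriched_docs)
for k, counts in enumerate(topics):
    print(k, [w for w, _ in counts.most_common(3)])
```

The intuition behind this kind of enrichment is that documents mentioning different hyponyms ("dog", "cat") come to share a token ("animal"), which gives the sampler extra co-occurrence evidence for grouping them. Whether that raises topic coherence on a real corpus has to be measured, as the paper does.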

Acknowledgements

The authors would like to thank Universidad Autónoma Metropolitana, Azcapotzalco. The present work has been funded by the research project SI001-20 at UAM Azcapotzalco, partly supported by project VIEP 2021 at BUAP and by the Consejo Nacional de Ciencia y Tecnología (CONACYT) with the scholarship number 788155.

Author information

Authors and Affiliations

Facultad de Ciencias de la Computación, Benemérita Universidad Autónoma de Puebla, 72590, Puebla, Mexico
Ana Laura Lezama Sánchez & Mireya Tovar Vidal
Departamento de Sistemas, Universidad Autónoma Metropolitana, Ciudad de México, Azcapotzalco, 02200, Mexico
José A. Reyes-Ortiz

Corresponding author

Correspondence to Ana Laura Lezama Sánchez.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Ildar Batyrshin
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Alexander Gelbukh
Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
Grigori Sidorov

Cite this paper

Sánchez, A.L.L., Vidal, M.T., Reyes-Ortiz, J.A. (2021). Hypernyms-Based Topic Discovery Using LDA. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_6

DOI: https://doi.org/10.1007/978-3-030-89820-5_6
Published: 21 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89819-9
Online ISBN: 978-3-030-89820-5
eBook Packages: Computer Science, Computer Science (R0)
