Hypernyms-Based Topic Discovery Using LDA

  • Conference paper
Advances in Soft Computing (MICAI 2021)

Abstract

Information technologies have produced large numbers of documents, which underpin information systems designed to speed up the processes they support. These systems produce results that try to imitate human knowledge, but complementary techniques are still needed to make those results more precise. In automatic topic extraction in particular, additional techniques are needed so that the extracted topics are more coherent. In this work, the behavior of the Latent Dirichlet Allocation (LDA) algorithm is studied when the hypernymy semantic relation extracted from WordNet is incorporated, with the aim of improving on the results obtained when LDA is applied to a set of documents without any external source of knowledge. The experimental results showed an improvement when hypernyms were incorporated, with a topic coherence of 1.23 on the GoogleNews corpus.
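The abstract does not say how the hypernyms enter the LDA input, so the following is only a minimal sketch of one plausible preprocessing step, not the authors' method. It assumes each token is either kept alongside its direct hypernym ("append") or substituted by it ("replace"). The dictionary `TOY_HYPERNYMS` and the function `enrich_with_hypernyms` are hypothetical names introduced here for illustration; the dictionary stands in for lookups against WordNet.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to one direct hypernym.
# A real pipeline would look these relations up in WordNet instead.
TOY_HYPERNYMS = {
    "dog": "canine",
    "cat": "feline",
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
}


def enrich_with_hypernyms(tokens, hypernyms, mode="append"):
    """Return a new token list with hypernym information added.

    mode="append": keep each token and add its hypernym right after it.
    mode="replace": substitute each token by its hypernym when one exists.
    Tokens without an entry in `hypernyms` are kept unchanged.
    """
    if mode not in ("append", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")
    enriched = []
    for token in tokens:
        hypernym = hypernyms.get(token)
        if hypernym is None:
            enriched.append(token)
        elif mode == "append":
            enriched.extend([token, hypernym])
        else:
            enriched.append(hypernym)
    return enriched


docs = [
    ["dog", "chases", "cat"],
    ["truck", "hits", "car"],
]
enriched_docs = [enrich_with_hypernyms(d, TOY_HYPERNYMS) for d in docs]
replaced_docs = [
    enrich_with_hypernyms(d, TOY_HYPERNYMS, mode="replace") for d in docs
]

# Bag-of-words counts: the kind of input a topic model such as LDA consumes.
bow = [Counter(d) for d in enriched_docs]
```

In this toy example, "truck" and "car" both contribute the shared term "motor_vehicle" to the second document's counts, which illustrates how hypernyms could tie related words together before topics are inferred. Whether this resembles the procedure evaluated in the paper is an assumption.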

Acknowledgements

The authors would like to thank Universidad Autónoma Metropolitana, Azcapotzalco. The present work has been funded by the research project SI001-20 at UAM Azcapotzalco, partly supported by project VIEP 2021 at BUAP and by the Consejo Nacional de Ciencia y Tecnología (CONACYT) with the scholarship number 788155.

Author information

Corresponding author

Correspondence to Ana Laura Lezama Sánchez.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Sánchez, A.L.L., Vidal, M.T., Reyes-Ortiz, J.A. (2021). Hypernyms-Based Topic Discovery Using LDA. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_6

  • DOI: https://doi.org/10.1007/978-3-030-89820-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89819-9

  • Online ISBN: 978-3-030-89820-5

  • eBook Packages: Computer Science, Computer Science (R0)
