An Article-Oriented Framework for Automatic Semantic Analysis of COVID-19 Researches

  • Conference paper
Computational Science and Its Applications – ICCSA 2021 (ICCSA 2021)

Abstract

In this work, we propose an article-oriented framework for automatically extracting semantic topics from scientific articles related to COVID-19 research. Our framework has four key building blocks: (i) pre-processing, (ii) topic modeling, (iii) correlating topics, authors, and institutions, and (iv) a summarization interface. The first block applies traditional textual pre-processing strategies to the texts extracted from the articles and constructs their data representations. The topic modeling block finds semantic topics in the articles, characterizing each discovered topic by its most relevant words. The third block correlates the discovered topics with the articles, authors (researchers), and institutions. The summarization interface provides an intuitive visualization of these results. Our evaluation shows that the framework can automatically extract relevant features from the articles, identifying the key topics they cover as well as the contributions of researchers, institutions, and countries to those topics. The framework can help research institutions and companies form multidisciplinary teams, and help funding agencies identify more promising research approaches regarding COVID-19.
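As a rough illustration of the pre-processing block (i) and the correlation block (iii) described in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation. The function names (`preprocess`, `tfidf`, `aggregate_by`), the tiny stop-word set, the TF-IDF weighting, and the article field names are all assumptions made for illustration. The document-topic weights are assumed to come from some topic model (block ii), which is not implemented here.

```python
import math
import re
from collections import Counter, defaultdict

# Placeholder stop-word list (an assumption, kept tiny for illustration).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "on", "with"}


def preprocess(text):
    """Lowercase, tokenize on letters/digits/hyphens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs_tokens):
    """Return one {term: TF-IDF weight} dict per tokenized document."""
    n_docs = len(docs_tokens)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def aggregate_by(field, articles, doc_topic):
    """Sum each article's topic weights over the values of `field`.

    `doc_topic[i][k]` is the (assumed) weight of topic k in article i.
    """
    n_topics = len(doc_topic[0])
    totals = defaultdict(lambda: [0.0] * n_topics)
    for article, weights in zip(articles, doc_topic):
        for value in article[field]:
            for k, weight in enumerate(weights):
                totals[value][k] += weight
    return dict(totals)


# Hypothetical toy data: two articles and made-up topic weights for two topics.
articles = [
    {"text": "Vaccine trials for the SARS-CoV-2 virus",
     "authors": ["Author A", "Author B"], "institutions": ["Institution X"]},
    {"text": "Mask usage and the spread of the virus in cities",
     "authors": ["Author A"], "institutions": ["Institution Y"]},
]
vectors = tfidf([preprocess(a["text"]) for a in articles])
doc_topic = [[0.9, 0.1], [0.2, 0.8]]  # assumed output of a topic model
per_author = aggregate_by("authors", articles, doc_topic)
per_institution = aggregate_by("institutions", articles, doc_topic)
```

Summing topic weights per author or institution is only one possible way to express their contribution to a topic; averaging or normalizing by the number of co-authors would be equally plausible choices under these assumptions.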

Supported by CAPES, CNPq, Finep, Fapesp and Fapemig.

Notes

  1. https://scholar.google.com.

  2. username: user, password: covid-19.

  3. https://github.com/explosion/spaCy/blob/master/spacy/lang/en/stop_words.py.

  4. https://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize.casual.

  5. labpi.ufsj.edu.br/covpapers/ (username: user, password: covid-19).

  6. https://www.erasmusmc.nl/en/research/researchers/koopmans-marion.

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by the National Council for Scientific and Technological Development (CNPq).

Corresponding author

Correspondence to Antonio Pedro.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Pedro, A. et al. (2021). An Article-Oriented Framework for Automatic Semantic Analysis of COVID-19 Researches. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12951. Springer, Cham. https://doi.org/10.1007/978-3-030-86970-0_13

  • DOI: https://doi.org/10.1007/978-3-030-86970-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86969-4

  • Online ISBN: 978-3-030-86970-0

  • eBook Packages: Computer Science, Computer Science (R0)
