Abstract
In this work, we propose an article-oriented framework for automatically extracting semantic topics from scientific articles related to COVID-19 research. Our framework has four key building blocks: (i) pre-processing, (ii) topic modeling, (iii) correlating topics, authors, and institutions, and (iv) a summarization interface. The first block applies traditional textual pre-processing strategies to the texts extracted from the articles and constructs their data representations. The topic modeling block finds semantic topics in the articles based on the most relevant words of each discovered topic. The third block correlates the discovered topics with the articles, authors (researchers), and institutions. The summarization interface provides an intuitive visualization of these results. Our evaluation shows that the framework can automatically extract relevant features from the articles, identifying the key topics they cover as well as the contributions of researchers, institutions, and countries to those topics. The framework can help research institutions and companies form multidisciplinary teams, and help funding agencies identify more promising research approaches regarding COVID-19.
Supported by CAPES, CNPq, Finep, Fapesp and Fapemig.
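The third building block described in the abstract (correlating topics with authors and institutions) can be illustrated with a minimal sketch. It assumes that some topic model has already produced per-article topic weights; the aggregation rule (summing weights per entity), the function names aggregate and top_entities, and all sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical per-article topic weights, as any topic model might produce.
doc_topics = {
    "a1": {"vaccines": 0.75, "epidemiology": 0.25},
    "a2": {"vaccines": 0.25, "epidemiology": 0.75},
    "a3": {"vaccines": 0.5, "epidemiology": 0.5},
}

# Hypothetical article metadata (placeholder names, not real entities).
doc_meta = {
    "a1": {"authors": ["Author A", "Author B"], "institutions": ["Institution X"]},
    "a2": {"authors": ["Author B"], "institutions": ["Institution Y"]},
    "a3": {"authors": ["Author C"], "institutions": ["Institution X"]},
}


def aggregate(doc_topics, doc_meta, field):
    """Sum each article's topic weights over the entities listed under `field`."""
    totals = defaultdict(lambda: defaultdict(float))
    for doc_id, weights in doc_topics.items():
        for entity in doc_meta[doc_id][field]:
            for topic, weight in weights.items():
                totals[entity][topic] += weight
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {entity: dict(topics) for entity, topics in totals.items()}


def top_entities(totals, topic, k=3):
    """Return the k entities with the largest accumulated weight for `topic`."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1].get(topic, 0.0), reverse=True)
    return [entity for entity, _ in ranked[:k]]


by_author = aggregate(doc_topics, doc_meta, "authors")
by_institution = aggregate(doc_topics, doc_meta, "institutions")

print(top_entities(by_author, "vaccines", k=2))
print(by_institution)
```

Summing weights is only one possible design choice; averaging per entity or normalizing by the number of co-authors would give different rankings.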
Notes
labpi.ufsj.edu.br/covpapers/ (username: user, password: covid-19)
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by the National Council for Scientific and Technological Development (CNPq).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Pedro, A. et al. (2021). An Article-Oriented Framework for Automatic Semantic Analysis of COVID-19 Researches. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. Lecture Notes in Computer Science, vol 12951. Springer, Cham. https://doi.org/10.1007/978-3-030-86970-0_13
DOI: https://doi.org/10.1007/978-3-030-86970-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86969-4
Online ISBN: 978-3-030-86970-0
eBook Packages: Computer Science, Computer Science (R0)