DOI: 10.1145/3653081.3653109
research-article

Extraction and Analysis of Multi-Source COVID-19 Data based on Deep Learning

Published: 03 May 2024

ABSTRACT

To address the low efficiency of key-information extraction, the single type of data that existing models extract, and the poor correlation among multi-source data generated during COVID-19 transmission, this paper proposes a method for extracting and analyzing multi-source COVID-19 data. Epidemic transmission data from Yangzhou, Shijiazhuang and Chongqing are taken as examples. Basic patient-flow information is extracted with Chinese natural language processing techniques, and patients' complex activity tracks are extracted with a BERT-BiLSTM-CRF model; a city-scale COVID-19 risk index is then constructed by combining patients' basic information and behavior tracks. Compared with the HMM, CRF, BiLSTM and BiLSTM-CRF models, the BERT-BiLSTM-CRF model achieves the highest accuracy, 96.08%. Pearson correlation coefficient and Cox proportional hazards model analyses show that the number of people over 59 years old, the floating population, the total population and the number of medical institutions are significantly correlated with the spread of the urban epidemic. Regions with a high epidemic risk index are characterized by large population mobility and relatively dense population. In summary, unstructured and semi-structured multi-source COVID-19 data are extracted and their correlation with the spread of the epidemic is analyzed; finally, the reasons why the risk factors affect the epidemic risk level, and the risk level of each city, are analyzed. The results can serve as a data extraction and analysis reference for departments making epidemic prevention and control decisions and for similar research.
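
The abstract names the Pearson correlation coefficient as one of the tools for relating city-level factors to epidemic spread. The following is a minimal sketch of that statistic only, not the authors' analysis pipeline. The function name `pearson_r`, the variable names, and the numbers are all hypothetical; the values are made-up toy data, not figures from the paper.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (NOT data from the paper): one factor and one
# outcome per region, in arbitrary units.
floating_population = [1.0, 2.0, 3.0, 4.0, 5.0]
confirmed_cases = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(floating_population, confirmed_cases)
print(f"Pearson r = {r:.4f}")
```

A value of r close to 1 or -1 indicates a strong linear relationship. Whether it is statistically significant depends on the sample size and needs a separate test, which this sketch does not perform.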


    • Published in

      ACM Other conferences
      IoTAAI '23: Proceedings of the 2023 5th International Conference on Internet of Things, Automation and Artificial Intelligence
      November 2023
      902 pages
      ISBN: 9798400716485
      DOI: 10.1145/3653081

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited