ABSTRACT
To address the low efficiency of key-information extraction, the narrow range of data types that existing models extract, and the weak correlation among the multi-source data generated during COVID-19 transmission, this paper proposes a method for extracting and analyzing multi-source COVID-19 data. Epidemic transmission data from Yangzhou, Shijiazhuang and Chongqing were used as case studies. Basic patient-flow information was extracted with Chinese natural language processing techniques, and patients' complex activity tracks were extracted with a BERT-BiLSTM-CRF model. The basic information and behavioral tracks were then combined to construct a city-scale COVID-19 risk index. Compared with HMM, CRF, BiLSTM and BiLSTM-CRF, the BERT-BiLSTM-CRF model achieved the highest accuracy, 96.08%. Pearson correlation and Cox proportional hazards analyses showed that the number of people over 59 years old, the size of the floating population, the total population and the number of medical institutions were significantly correlated with the spread of the epidemic in cities. Regions with a high epidemic risk index were characterized by high population mobility and relatively dense population. In short, unstructured and semi-structured multi-source COVID-19 data were extracted and their correlation with epidemic spread was analyzed. Finally, the paper examines why these risk factors affect the epidemic risk level and assesses the risk level of each city. The results can support data extraction and analysis for the epidemic prevention and control decisions of relevant departments, and can serve as a reference for similar research.
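The abstract names a BERT-BiLSTM-CRF model for extracting activity tracks but does not describe how its output becomes a track. The sketch below shows the standard final step of such a model under stated assumptions. A linear-chain CRF is decoded with the Viterbi algorithm, which turns per-character scores into a consistent BIO tag sequence, and the tags are then grouped into entity spans. This is not the authors' code. The tag set (`LOC`, `TIME`), the emission scores that stand in for BERT-BiLSTM outputs, and the hand-set transition scores (a trained CRF learns these) are all hypothetical. The toy sentence "12日到超市" means "went to the supermarket on the 12th".

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical BIO tag set for activity-track entities; the paper's real label set is not given.
TAGS = ["O", "B-LOC", "I-LOC", "B-TIME", "I-TIME"]


def bio_transitions(tags: Sequence[str], invalid: float = -1e4) -> List[List[float]]:
    """Hand-set CRF transition scores: I-X may only follow B-X or I-X.

    A trained CRF layer learns these scores; fixing them here is an assumption
    made only to keep the sketch self-contained.
    """
    return [
        [0.0 if not nxt.startswith("I-") or prev[2:] == nxt[2:] else invalid for nxt in tags]
        for prev in tags
    ]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the tag-index path with the highest total emission + transition score."""
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, back = [], []
        for j in range(n_tags):
            # best previous tag i for moving into tag j at token t
            candidates = [score[i] + transitions[i][j] for i in range(n_tags)]
            best_i = max(range(n_tags), key=candidates.__getitem__)
            new_score.append(candidates[best_i] + emissions[t][j])
            back.append(best_i)
        score = new_score
        backpointers.append(back)
    # trace back from the best final tag
    path = [max(range(n_tags), key=score.__getitem__)]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    path.reverse()
    return path


def bio_to_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged characters into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_chars: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and cur_type == tag[2:]:
            cur_chars.append(tok)  # continue the open span
            continue
        if cur_type is not None:
            spans.append((cur_type, "".join(cur_chars)))  # close the open span
        # start a new span on B-X; O or a stray I-X leaves no span open
        cur_type, cur_chars = (tag[2:], [tok]) if tag.startswith("B-") else (None, [])
    if cur_type is not None:
        spans.append((cur_type, "".join(cur_chars)))
    return spans


# Toy input, one token per Chinese character.
TOKENS = list("12日到超市")
# Made-up emission scores standing in for BERT-BiLSTM outputs (columns follow TAGS).
EMISSIONS = [
    [0, 0, 0, 5, 0],      # 1  -> B-TIME
    [0, 0, 0, 0, 5],      # 2  -> I-TIME
    [0, 0, 0, 0, 5],      # 日 -> I-TIME
    [5, 0, 0, 0, 0],      # 到 -> O
    [0, 1.5, 2.0, 0, 0],  # 超: I-LOC scores slightly higher than B-LOC
    [0, 0, 5, 0, 0],      # 市 -> I-LOC
]

if __name__ == "__main__":
    greedy = [TAGS[max(range(len(TAGS)), key=row.__getitem__)] for row in EMISSIONS]
    best = [TAGS[i] for i in viterbi_decode(EMISSIONS, bio_transitions(TAGS))]
    print(greedy)  # per-token argmax puts I-LOC right after O, an invalid BIO sequence
    print(best)    # the transition constraint repairs it to B-LOC, I-LOC
    print(bio_to_spans(TOKENS, best))  # → [('TIME', '12日'), ('LOC', '超市')]
```

The example shows why a CRF layer sits on top of the BiLSTM. Choosing the best tag for each character independently would label "超" as `I-LOC` directly after an `O`. Joint decoding over the transition scores instead recovers the well-formed span "超市" (supermarket).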
Index Terms
- Extraction and Analysis of Multi-Source COVID-19 Data based on Deep Learning