Abstract
Analysis of real-world clinical data, which include multiple heterogeneous categories of attributes, requires a well-designed data-mining process to extract meaningful information. We explored the application of several data-mining methods to a real-world dataset of patients diagnosed with COVID-19, with emphasis on cohort selection that maximizes data availability and on feature selection based on the features' correlation with clinical data. Two data-mining platforms, the Orange data-mining platform coupled with the LRNet method and the Waikato Environment for Knowledge Analysis (WEKA), were used to identify important attributes associated with post-COVID-19 symptoms across the multiple modalities of this cohort and to evaluate how well these attributes separate patients into clusters. We introduced a dynamic method of inclusion and exclusion, as well as of outlier selection, which maximized the knowledge extracted from this real-world dataset. We also demonstrated that a comprehensive first view of this dataset was only possible by applying multiple methods for dimensionality reduction and feature selection.
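The abstract does not state how the cohort was selected for maximum data availability. One simple way to frame the underlying trade-off between keeping attributes and keeping patients is sketched below. It is a hypothetical illustration, not the authors' procedure. Attributes are added in order of decreasing availability. After each addition, only patients observed on every chosen attribute remain, and the subset that maximizes the number of observed values in that complete sub-table (patients × attributes) is kept. The function name `select_cohort`, the scoring rule, and the toy attributes (`age`, `crp`, `d_dimer`) are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical record format: attribute name -> value, with None marking a missing value.
Record = dict[str, Optional[float]]


def select_cohort(records: list[Record], attributes: list[str]) -> tuple[list[str], list[int]]:
    """Illustrative greedy trade-off between attributes and complete patients (an assumption,
    not the method used in the paper)."""

    def availability(attr: str) -> int:
        # Number of patients with an observed value for this attribute.
        return sum(1 for r in records if r.get(attr) is not None)

    # Best-covered attributes first; the stable sort keeps input order on ties.
    ranked = sorted(attributes, key=availability, reverse=True)

    chosen: list[str] = []
    best_attrs: list[str] = []
    best_rows: list[int] = []
    best_score = -1
    for attr in ranked:
        chosen.append(attr)
        # Complete-case cohort: indices of patients observed on every chosen attribute.
        rows = [i for i, r in enumerate(records)
                if all(r.get(a) is not None for a in chosen)]
        # Score = number of observed cells in the resulting complete sub-table.
        score = len(rows) * len(chosen)
        if score > best_score:
            best_score = score
            best_attrs, best_rows = list(chosen), rows
    return best_attrs, best_rows


if __name__ == "__main__":
    # Toy, made-up patient records (not data from the study).
    patients = [
        {"age": 50.0, "crp": 3.1, "d_dimer": None},
        {"age": 61.0, "crp": None, "d_dimer": 0.4},
        {"age": 45.0, "crp": 2.2, "d_dimer": 0.9},
        {"age": 70.0, "crp": 5.0, "d_dimer": None},
    ]
    print(select_cohort(patients, ["age", "crp", "d_dimer"]))
    # -> (['age', 'crp'], [0, 2, 3])
```

Here adding `d_dimer` would shrink the complete cohort to a single patient, so the sketch stops at two attributes and three patients. A dynamic, analyst-driven inclusion and exclusion process like the one the abstract describes would presumably review such exclusions rather than apply a fixed score automatically.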
Notes
1. Can be accessed at: https://homel.vsb.cz/~kud007/lrnet_files (last accessed 23 June 2023).
Acknowledgments
The study was performed in accordance with the ethical standards of the institutional or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards, and it was approved by the Institutional Ethics Committee of Palacký University Olomouc and University Hospital Olomouc.
This research was funded by SGS, VSB-Technical University of Ostrava (grant number SP2023/076) and the Ministry of Health of the Czech Republic (grant number NU22-A-105), and in part by IGA-LFUP-2023-010 and FNOL-00098892.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gharibian, A. et al. (2023). A Real-World Clinical Data Mining of Post COVID-19 Patients. In: Barolli, L. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 182. Springer, Cham. https://doi.org/10.1007/978-3-031-40971-4_41
DOI: https://doi.org/10.1007/978-3-031-40971-4_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40970-7
Online ISBN: 978-3-031-40971-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)