Abstract
In the medical domain, huge amounts of data are generated continuously. Data exploration is essential for helping physicians and medical researchers find the datasets they need. However, storing and processing large-scale data exceeds the performance limits of traditional relational databases, which are prone to bottlenecks and whose storage capacity and computational power are difficult to expand. To address these problems, this paper studies distributed migration, storage, and computing methods for medical data and proposes a distributed computing and storage strategy that enables efficient medical data exploration. We use open-source tools to migrate the MIMIC-IV medical database and optimize sepsis data exploration. The experimental results show that, compared with the traditional single-node method, our optimization method based on Hive-ORC, its indexes, and partitioned tables reduces storage space by 85% and query time by 86%. This mechanism manages medical data more efficiently. In addition, the optimization effect improves as the data volume and the number of cluster nodes increase.
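As a rough illustration of the kind of table definition that the Hive-ORC approach relies on, the Python sketch below assembles a HiveQL statement for a partitioned ORC table with row-group indexes and Bloom filters enabled. The helper `build_orc_partitioned_ddl`, the table name `chartevents_orc`, the partition column `chart_year`, and the column selection are all assumptions made for this example and are not taken from the paper; only the ORC table properties (`orc.compress`, `orc.create.index`, `orc.bloom.filter.columns`) are standard Hive settings.

```python
def build_orc_partitioned_ddl(table, columns, partition_column,
                              partition_type, bloom_columns):
    """Return a HiveQL CREATE TABLE statement for a partitioned ORC table.

    Hypothetical helper for illustration only; it is not the paper's code.
    `columns` is a list of (name, hive_type) pairs and must not include
    the partition column, which Hive declares separately.
    """
    column_lines = ",\n".join(
        f"  {name} {hive_type}" for name, hive_type in columns
    )
    # Standard ORC table properties: compression codec, row-group
    # indexes, and the columns for which Bloom filters are built.
    properties = {
        "orc.compress": "ZLIB",
        "orc.create.index": "true",
        "orc.bloom.filter.columns": ",".join(bloom_columns),
    }
    property_lines = ",\n".join(
        f"  '{key}' = '{value}'" for key, value in properties.items()
    )
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"PARTITIONED BY ({partition_column} {partition_type})\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES (\n{property_lines}\n)"
    )


# Assumed schema: a small subset of columns resembling a MIMIC-IV
# chart events table, partitioned by a hypothetical chart_year column.
EXAMPLE_DDL = build_orc_partitioned_ddl(
    table="chartevents_orc",
    columns=[
        ("subject_id", "INT"),
        ("stay_id", "INT"),
        ("itemid", "INT"),
        ("charttime", "TIMESTAMP"),
        ("valuenum", "DOUBLE"),
    ],
    partition_column="chart_year",
    partition_type="INT",
    bloom_columns=["subject_id", "itemid"],
)

if __name__ == "__main__":
    print(EXAMPLE_DDL)
```

With a layout like this, a query that filters on the partition column lets Hive skip whole partitions, and the ORC indexes and Bloom filters can then skip stripes and row groups within the files that remain. Whether this matches the partitioning key used in the paper's sepsis queries is not stated in the abstract.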
Acknowledgements
This work was supported in part by the China Postdoctoral Science Foundation under Grant 2020M672217, and in part by the Science and Technology Research Key Project of Henan Province Science and Technology Department under Grant 222102210133.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, S., Mao, C., Zheng, W., Xiao, Q., Wu, Y. (2022). Data Exploration Optimization for Medical Big Data. In: Traina, A., Wang, H., Zhang, Y., Siuly, S., Zhou, R., Chen, L. (eds) Health Information Science. HIS 2022. Lecture Notes in Computer Science, vol 13705. Springer, Cham. https://doi.org/10.1007/978-3-031-20627-6_14
DOI: https://doi.org/10.1007/978-3-031-20627-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20626-9
Online ISBN: 978-3-031-20627-6
eBook Packages: Computer Science, Computer Science (R0)