Data Exploration Optimization for Medical Big Data

  • Conference paper
Health Information Science (HIS 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13705))

Abstract

In the medical domain, huge amounts of data are generated continuously. Data exploration helps physicians and medical researchers find the datasets they need. However, storing and processing data at this scale exceeds the performance limits of traditional relational databases, which are prone to bottlenecks and difficult to scale in storage capacity and computing power. To address these problems, this paper studies distributed migration, storage, and computing methods for medical data, and proposes a distributed computing and storage strategy that enables efficient medical data exploration. We use open-source tools to migrate the MIMIC-IV medical database and optimize sepsis data exploration. The experimental results show that, compared with the traditional single-node method, our optimization method, based on Hive-ORC with its indexes and partitioned tables, reduces storage space by 85% and query time by 86%. The mechanism manages medical data more efficiently. In addition, the strategy yields better optimization results as the data volume and the number of cluster nodes grow.
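
As an illustration of the storage layout the abstract names (a partitioned Hive table stored as ORC with indexes enabled), the following is a minimal sketch and not the authors' implementation. The table name, column names, partition key, and the helper `build_orc_table_ddl` are hypothetical; only the HiveQL clauses `PARTITIONED BY` and `STORED AS ORC` and the table properties `orc.compress` and `orc.create.index` are standard Hive features. The sketch only generates the statement text and does not connect to a cluster.

```python
from typing import Dict


def build_orc_table_ddl(
    table: str,
    columns: Dict[str, str],
    partition_columns: Dict[str, str],
    compression: str = "ZLIB",
) -> str:
    """Return a HiveQL CREATE TABLE statement for a partitioned ORC table.

    All names passed in are illustrative; nothing here reflects the
    schema actually used in the paper.
    """
    if not columns:
        raise ValueError("at least one data column is required")
    # Hive rejects a partition column that is also declared as a data column.
    overlap = set(columns) & set(partition_columns)
    if overlap:
        raise ValueError(f"partition columns repeat data columns: {sorted(overlap)}")

    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    lines = [f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)"]
    if partition_columns:
        parts = ", ".join(f"{n} {t}" for n, t in partition_columns.items())
        lines.append(f"PARTITIONED BY ({parts})")
    lines.append("STORED AS ORC")
    # orc.create.index keeps ORC's built-in row-group indexes enabled.
    lines.append(
        f"TBLPROPERTIES ('orc.compress'='{compression}', 'orc.create.index'='true')"
    )
    return "\n".join(lines) + ";"


# Hypothetical table for sepsis-related lab events, partitioned by admission year.
ddl = build_orc_table_ddl(
    table="sepsis_events",
    columns={"subject_id": "INT", "charttime": "TIMESTAMP", "lactate": "DOUBLE"},
    partition_columns={"admit_year": "INT"},
)
print(ddl)
```

Partitioning on a column that queries commonly filter by lets Hive skip whole partitions, and ORC's columnar compression is the kind of mechanism that could account for storage savings like those the abstract reports; which columns the authors actually partitioned on is not stated here.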

Acknowledgements

This work was supported in part by the China Postdoctoral Science Foundation under Grant 2020M672217, and in part by the Science and Technology Research Key Project of Henan Province Science and Technology Department under Grant 222102210133.

Author information

Corresponding author

Correspondence to Wenkui Zheng.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Ding, S., Mao, C., Zheng, W., Xiao, Q., Wu, Y. (2022). Data Exploration Optimization for Medical Big Data. In: Traina, A., Wang, H., Zhang, Y., Siuly, S., Zhou, R., Chen, L. (eds) Health Information Science. HIS 2022. Lecture Notes in Computer Science, vol 13705. Springer, Cham. https://doi.org/10.1007/978-3-031-20627-6_14

  • DOI: https://doi.org/10.1007/978-3-031-20627-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20626-9

  • Online ISBN: 978-3-031-20627-6

  • eBook Packages: Computer Science, Computer Science (R0)
