A cloud-based framework for Home-diagnosis service over big medical data

https://doi.org/10.1016/j.jss.2014.05.068

Highlights

  • We design a cloud-based framework to implement a Home-diagnosis service.

  • A disease-symptom lattice is computed to help users judge what their illness is.

  • Similar medical records are provided as references for Home-diagnosis service.

  • The cloud-based framework can achieve good scalability.

Abstract

Self-caring services are becoming increasingly important in daily life, especially given the urgent situation of global aging. Big data, such as massive collections of historical medical records, makes it possible for users to access self-caring services, for example to obtain a diagnosis by themselves from similar patients' records. Developing such a self-caring service gives rise to challenges including highly concurrent and scalable medical record retrieval, data analysis, and privacy protection. In this paper, we propose a cloud-based framework for implementing a self-caring service named Home-diagnosis to address these challenges. Concretely, a Lucene-based distributed search cluster is designed to support highly concurrent and scalable medical record retrieval, data analysis and privacy protection. Moreover, to speed up medical record retrieval, a Hadoop cluster is adopted for offline data storage and index building. The implementation of the Home-diagnosis service is discussed, in which similar historical medical records and a disease-symptom lattice are obtained to help users figure out which kind of disease they are probably infected with. Finally, a prototype system is designed and a running example is presented to demonstrate the scalability and efficiency of our proposal.
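The abstract splits the work between an offline Hadoop stage that builds indexes and an online Lucene-based cluster that answers queries. The sketch below simulates that division of labour in plain Python, under stated assumptions: a map/reduce pass builds one inverted index (symptom to record ids) per shard, and a symptom query is scattered to every shard and the hits are gathered. It uses neither the Hadoop nor the Lucene API; the shard count, the id-modulo partitioning and the names `build_shard_indexes` and `search` are hypothetical illustrations, not the paper's design.

```python
from collections import defaultdict


def map_phase(record_id, symptoms):
    """Map step: emit one (symptom, record_id) pair per symptom term."""
    for term in symptoms:
        yield term.lower(), record_id


def reduce_phase(pairs):
    """Reduce step: group record ids by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, record_id in pairs:
        postings[term].add(record_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_shard_indexes(records, num_shards=3):
    """Offline job: partition records across shards (hypothetical id-modulo
    scheme) and build one inverted index per shard."""
    shard_pairs = [[] for _ in range(num_shards)]
    for record_id, symptoms in records.items():
        shard_pairs[record_id % num_shards].extend(map_phase(record_id, symptoms))
    return [reduce_phase(pairs) for pairs in shard_pairs]


def search(shard_indexes, query_symptoms):
    """Online query: scatter to every shard, intersect posting lists locally
    (a hit must contain all query symptoms), then gather the hits."""
    hits = []
    for index in shard_indexes:  # a real cluster would query shards in parallel
        lists = [set(index.get(t.lower(), [])) for t in query_symptoms]
        if lists:
            hits.extend(set.intersection(*lists))
    return sorted(hits)


# Toy records (id -> symptoms), for illustration only.
records = {
    1: ["fever", "dyspnea", "cough"],
    2: ["fever", "headache"],
    3: ["dyspnea", "chest pain"],
    4: ["Fever", "Dyspnea"],
    5: ["cough"],
}
indexes = build_shard_indexes(records)
print(search(indexes, ["fever", "dyspnea"]))  # [1, 4]
```

Because each shard only intersects its own posting lists, adding shards spreads both storage and query work; the merge step stays a simple concatenation of per-shard hits.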

Introduction

According to a World Health Organization report, people in Suboptimal Health Status (SHS), also known as “the third state” (between being healthy and falling sick), account for 75% of the world population (He et al., 2013). In China, the number of people in this status has reached 900 million (Ding et al., 2009). A considerable part of them pay close attention to their health, hoping to obtain preventive health examinations or to educate themselves with similar patients' medical records. Moreover, as the aging population grows, measures for chronic disease surveillance in the daily life of elderly people are needed. Therefore, to satisfy the demands of SHS groups and the aging population, on-demand self-caring services should be developed to help people conveniently obtain disease precaution knowledge at home (Rashidi and Cook, 2009, Cook et al., 2003, Doctor et al., 2005).

On the other hand, the ever-increasing amount of medical and diagnostic data produced in daily clinical activities makes it possible to develop self-caring services that satisfy the requirements of SHS groups and elderly people. However, the massive amount of medical data, together with its various formats, poses challenges for large-scale data management and efficient knowledge mining, which is known as the “big data” issue (Chaudhuri, 2012). Owing to salient characteristics of cloud computing such as elastic computing power and a pervasive service-oriented nature (Shang et al., 2013, Xu et al., 2012), cloud computing technologies have been widely researched and used in the big data area (Canny and Zhao, 2013, Cheng et al., 2012), and many healthcare services have been migrated into the cloud environment.

More specifically, the research presented in this paper is based on a large research project to build a Healthcare Information Cloud Platform for the Health Bureau in Lianyungang, a city in Jiangsu Province, China, near Shanghai. The Health Bureau plans to build a platform that gathers all medical information, such as medical records, from each local healthcare practitioner (e.g., hospitals and clinics). In this way, various healthcare services can be developed to satisfy real healthcare requirements. For example, each patient can have a personal health profile consisting of all his/her clinical records associated with each clinical visit.

The following motivating example highlights the problem we address.

Suppose a patient named Lee falls ill one day, and he knows that his symptoms include “fever” and “dyspnea”. Before going to a hospital for diagnosis, he wants a preliminary diagnosis through the Internet to learn which kind of disease he probably suffers from, so that he can make a proper appointment through a hospital's homepage in advance.

In the above example, a self-caring service that provides diagnosis assistance with similar historical medical records based on Lee's symptoms would help Lee make a proper appointment. Moreover, with similar historical medical records, Lee could gain more detailed knowledge of his condition. Therefore, both Lee and his physicians would need less time to reach the right treatment when Lee goes to the hospital, thereby improving diagnosis efficiency.

The question, however, is how to provide such a self-caring service. Concretely, the problem involves three issues. The first is how to provide online, real-time retrieval over massive and ever-increasing medical records according to Lee's symptoms. The second is how to extract useful diagnosis knowledge from the large number of retrieved medical records to help Lee figure out which kind of disease he is probably infected with. The last is how to avoid exposing private information, since medical records are privacy sensitive.

Motivated by these observations, this paper addresses these challenges through the following contributions. (1) We propose a cloud-based framework for implementing a self-caring service named Home-diagnosis. Concretely, a distributed Lucene-based search cluster is designed to provide highly concurrent and scalable online medical record retrieval, data analysis and privacy protection functions. To speed up medical record retrieval, a Hadoop cluster is adopted for offline data storage and index building. (2) The implementation of the Home-diagnosis service consists of four steps. In Step 1, a user submits a query describing his/her disease information. In Step 2, medical records matching the user's disease symptoms, gender and age are retrieved. In Step 3, data analysis is conducted on the retrieved medical records to compute a disease-symptom lattice, which discloses the relations among diseases with common symptoms. Finally, in Step 4, privacy-sensitive information in the medical records is filtered according to an access control policy. The disease-symptom lattice, together with the privacy-processed medical records, is then returned to the user, providing a detailed diagnosis basis for a preliminary self-diagnosis.
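The text does not spell out how the disease-symptom lattice of Step 3 is constructed. Since it relates diseases through shared symptoms, one natural reading is a concept lattice from formal concept analysis (FCA), with diseases as objects and symptoms as attributes; the sketch below assumes that reading and is not the paper's algorithm. `concept_lattice` enumerates every (extent, intent) pair by intersecting symptom sets, and `hasse_edges` recovers the lattice order; both names and the toy context are hypothetical.

```python
def concept_lattice(context):
    """Enumerate every formal concept (extent, intent) of a context that maps
    each disease to its set of symptoms.

    Closed intents are exactly the intersections of disease intents, plus the
    full symptom set (the intent of the empty extent)."""
    context = {d: frozenset(s) for d, s in context.items()}
    all_symptoms = frozenset().union(*context.values())
    intents = {all_symptoms}
    for symptoms in context.values():
        # Intersect this disease's symptoms with every intent found so far.
        intents |= {symptoms & b for b in intents}
    concepts = []
    for intent in intents:
        # Extent: all diseases that exhibit every symptom in the intent.
        extent = frozenset(d for d, s in context.items() if intent <= s)
        concepts.append((extent, intent))
    return concepts


def hasse_edges(concepts):
    """Cover relation of the lattice: (lower, upper) pairs ordered by extent
    inclusion with no concept strictly in between."""
    edges = []
    for low in concepts:
        for high in concepts:
            if low[0] < high[0] and not any(
                low[0] < mid[0] < high[0] for mid in concepts
            ):
                edges.append((low, high))
    return edges


# Toy context for illustration only, not clinical data.
context = {
    "pneumonia": {"fever", "dyspnea", "cough"},
    "influenza": {"fever", "cough", "headache"},
    "asthma": {"dyspnea", "wheezing"},
}
concepts = concept_lattice(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "share", sorted(intent))
print(len(concepts), "concepts,", len(hasse_edges(concepts)), "cover edges")
```

In this toy context the concept ({asthma, pneumonia}, {dyspnea}) shows that dyspnea alone does not separate those two diseases, while adding fever narrows the extent to pneumonia. Enumerating intents by repeated intersection is easy to follow but can grow exponentially; dedicated FCA algorithms such as NextClosure scale better.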

The remainder of this paper is organized as follows. Section 2 discusses preliminary knowledge of medical records and key technologies adopted in the cloud-based framework, such as the Hadoop computing framework and the Lucene library. The cloud-based framework for the Home-diagnosis service is presented in Section 3. Section 4 details how the Home-diagnosis service provides diagnosis assistance for users. Section 5 evaluates the cloud-based framework, where a prototype system is designed and a running example is presented to demonstrate the scalability and efficiency of our proposal. Section 6 discusses related work on big medical data applications in both industry and academia. Section 7 concludes the paper and discusses future work.


Preliminary knowledge

This section introduces medical records and some of the technologies applied in the cloud-based framework.

Application scenarios

In this paper, we propose a cloud-based framework for implementing the Home-diagnosis service, which provides users with diagnosis assistance mined from historical medical records. More specifically, the Home-diagnosis service enables symptom-based medical record retrieval according to a target user's query. Also, to help users distinguish among the diseases in the retrieved medical records, data analysis is conducted to build a disease-symptom lattice. The disease-symptom lattice discloses relations among

Implementation of Home-diagnosis service

In this section, we discuss the implementation of the Home-diagnosis service supported by the cloud-based framework. The implementation consists of four steps (see Fig. 7): query submission, medical record retrieval, data analysis, and privacy information filtering on the returned results.
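The four steps can be sketched end to end as below. This is a minimal, single-process illustration under assumptions of our own, not the paper's implementation: the record fields, the ±5-year `age_window` used to match age, and the role-based `POLICY` table standing in for the access control policy are all hypothetical, and Step 3 is reduced to a disease frequency count rather than the disease-symptom lattice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Query:
    """Step 1: a user query (field names are hypothetical)."""
    symptoms: frozenset
    gender: str
    age: int


# Hypothetical access-control policy: record fields each role may see.
POLICY = {
    "public_user": {"disease", "symptoms", "gender", "age", "treatment"},
    "physician": {"patient_id", "disease", "symptoms", "gender", "age", "treatment"},
}


def retrieve(records, query, age_window=5):
    """Step 2: keep records containing every query symptom whose gender
    matches and whose age is within age_window years of the user's."""
    return [
        r for r in records
        if query.symptoms <= r["symptoms"]
        and r["gender"] == query.gender
        and abs(r["age"] - query.age) <= age_window
    ]


def analyze(matched):
    """Step 3, simplified: count how often each disease occurs. The service
    described in the paper builds a disease-symptom lattice instead."""
    return Counter(r["disease"] for r in matched)


def filter_privacy(matched, role):
    """Step 4: drop every field the role is not permitted to see."""
    allowed = POLICY[role]
    return [{k: v for k, v in r.items() if k in allowed} for r in matched]


def home_diagnosis(records, query, role="public_user"):
    matched = retrieve(records, query)
    return analyze(matched), filter_privacy(matched, role)


def rec(pid, disease, symptoms, gender, age, treatment):
    return {"patient_id": pid, "disease": disease, "symptoms": frozenset(symptoms),
            "gender": gender, "age": age, "treatment": treatment}


# Toy records for illustration only, not clinical data.
records = [
    rec("p-001", "pneumonia", {"fever", "dyspnea", "cough"}, "M", 62, "antibiotics"),
    rec("p-002", "pneumonia", {"fever", "dyspnea"}, "M", 58, "antibiotics"),
    rec("p-003", "acute bronchitis", {"fever", "dyspnea", "wheezing"}, "M", 64, "bronchodilators"),
    rec("p-004", "influenza", {"fever", "cough"}, "M", 60, "rest"),
    rec("p-005", "pneumonia", {"fever", "dyspnea"}, "F", 61, "antibiotics"),
]
query = Query(frozenset({"fever", "dyspnea"}), "M", 60)
counts, results = home_diagnosis(records, query)
print(counts.most_common())  # [('pneumonia', 2), ('acute bronchitis', 1)]
```

Filtering fields after retrieval, as in Step 4, keeps the index and the analysis independent of who is asking; only the final projection depends on the caller's role.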

Step 1. Query submission

In this step, a target user submits a query describing his/her disease information. On receiving a query associated with a set of

Evaluation

In this section, a prototype system is designed and a running example is presented to demonstrate the scalability and efficiency of our proposal. Specifically, the scalability of the Lucene-based distributed search cluster is evaluated through a set of experiments. Moreover, to better illustrate how the Home-diagnosis service provides a diagnosis basis for a target user, a running example is discussed.

Related work and comparison analysis

With the wide adoption of information technologies, big data is produced in multiple areas, including aeronautics, medicine, and biology, to name a few. Mining useful knowledge from big data has attracted great attention from both academia and industry. The main challenges in processing big data are large-scale data storage and efficient knowledge mining. Due to the salient nature of cloud computing such as elastic storage and computing power, and pervasive service-oriented

Conclusion and future work

In this paper, we have proposed a cloud-based framework for a self-caring service named Home-diagnosis. Enabled by the cloud-based framework, the Home-diagnosis service allows users to obtain diagnosis assistance from historical medical records at home. Moreover, a disease-symptom lattice together with similar historical medical records provides a detailed diagnosis basis to help users figure out which kind of disease they are probably infected with. To sum up, the cloud-based framework could

Acknowledgement

This paper is partly supported by the National Science Foundation of China under Grants 91318301 and 61321491, and by the National Key Technology R&D Program of the Ministry of Science and Technology under Grant 2011BAK21B06.

Wenmin Lin is currently working towards the Ph.D. degree at the Department of Computer Science and Technology, Nanjing University, China. She received her Bachelor's degree in Software Engineering from Nanjing University of Science and Technology. Her research interests include cloud computing, service computing, big data and medical applications.

References (39)

  • Shu-Hsien Liao

    Expert system methodologies and applications – a decade review from 1995 to 2004

    Exp. Syst. Appl.

    (2005)
  • X. Zhang et al.

    A hybrid approach for scalable sub-tree anonymization over big data using mapreduce on cloud

    J. Comput. Syst. Sci.

    (2014)
  • N. Ao et al.

    Efficient parallel lists intersection and index compression algorithms using graphics processing units

    Proc. VLDB Endowment

    (2011)
  • Apache Hadoop, http://hadoop.apache.org (accessed...
  • Apache Lucene, http://en.wikipedia.org/wiki/Lucene (accessed...
  • D. Arroyuelo et al.

    To index or not to index: time-space trade-offs in search engines with positional ranking functions

  • A. Bahga et al.

    Analyzing massive machine maintenance data in a computing cloud

    IEEE Trans. Parallel Distributed Syst.

    (2012)
  • R. Belohlavek et al.

    Formal concept analysis with background knowledge: attribute priorities

    IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev.

    (2009)
  • B.H. Bloom

    Space/time trade-offs in hash coding with allowable errors

    Commun. ACM

    (1970)
  • J. Canny et al.

    Big data analytics with small footprint: squaring the cloud

  • B. Chandramouli et al.

    Scalable progressive analytics on big data in the cloud

    Proc. VLDB Endowment

    (2013)
  • S. Chaudhuri

    What next? A half-dozen data management research goals for big data and the cloud

  • Y. Cheng et al.

    Glade: big data analytics made easy

  • D.J. Cook et al.

    MavHome: an agent-based smart home

  • M. Crampes et al.

    Visualizing social photos on a Hasse diagram for eliciting relations and indexing new photos

    IEEE Trans. Visual. Comput. Graph.

    (2009)
  • H. Ding et al.

    The sub-health evaluation based on the modern diagnostic technique of traditional Chinese medicine

  • F. Doctor et al.

    A fuzzy embedded agent-based approach for realizing ambient intelligence in intelligent inhabited environments

    IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum.

    (2005)
  • J. Ekanayake et al.

    Cloud technologies for bioinformatics applications

    IEEE Trans. Parallel Distributed Syst.

    (2011)
  • Expert system, http://en.wikipedia.org/wiki/Expert_system (accessed...

Wanchun Dou received his Ph.D. degree in Mechanical and Electronic Engineering from Nanjing University of Science and Technology, China, in 2001. He is now a full professor at the State Key Laboratory for Novel Software Technology, Nanjing University. To date, he has chaired three NSFC projects and published more than 60 research papers in international journals and conferences. His research interests include workflow, cloud computing and service computing.

Zuojian Zhou is a Senior Engineer and Senior Project Manager. He has worked as a senior engineer in health information technology, including regional medical and health information technology and electronic medical record systems. Since September 2012, he has been a Ph.D. candidate at the Department of Computer Science and Technology, Nanjing University, China. His research interests include cloud computing and medical and health services.

Chang Liu is currently working towards the Ph.D. degree at the University of Technology, Sydney, Australia. His research interests include cloud computing, resource management, cryptography, and data security.
