Can the development of a patient’s condition be predicted through intelligent inquiry under the e-health business mode? Sequential feature map-based disease risk prediction upon features selected from cognitive diagnosis big data

https://doi.org/10.1016/j.ijinfomgt.2019.05.006

Highlights

  • Online intelligent medical inquiry-based physician’s cognitive diagnosis big data was fused with offline EMR to obtain VEMR.

  • A sequential feature map-based model for disease risk prediction was presented to obtain online users’ medical conditions.

  • The optimized TRApriori method is used to mine a frequent feature map on the basis of the temporal graph.

  • The frequent feature graph is used to obtain the online user’s reconstruction coefficient and to realize disease risk prediction.

  • A neighborhood-based collaborative prediction model was presented for prediction of an online user’s possible diseases.

Abstract

Data-driven methods have advanced research in preventive medicine. In disease risk prediction, physicians’ clinical cognitive diagnosis data can support early prevention and thereby reduce medical costs, improve access to medical services, and lower medical risk. However, existing research has not considered physicians’ cognition of patients’ conditions during intelligent inquiry under the e-health business mode, has not offered diagnosis big data, has neglected the value of the fused text information generated jointly by online and offline medical data, and has not thoroughly analyzed, from the perspective of online inquiry data, the redundancy-complementarity dispersion caused by the loss of high-order information. Moreover, risk prediction based only on offline clinical cognitive diagnosis data undoubtedly reduces prediction precision. Importantly, existing research has rarely considered the temporal relationships among medical events, has not analyzed the practical problem of pattern explosion in detail, has not proposed an intelligent portrait map, and has not carried out risk prediction based on sub-maps obtained from such a map. This paper therefore presents a disease risk prediction method. A feature selection model based on redundancy-complementarity dispersion selects features from the physicians’ cognitive diagnosis big data of online intelligent inquiry; the selected features are ranked intelligently to compensate for the missing high-dimensional information; and the compensated key feature information is fused with the offline electronic medical record (EMR) to form a virtual electronic medical record (VEMR).
The VEMR is then modelled with a sequential feature map, yielding a sequential feature map-based disease risk prediction model that captures online users’ medical conditions. A neighborhood-based collaborative prediction model predicts the diseases that an online intelligent medical inquiry user may develop in the future and intelligently ranks their risk probabilities. In simulation experiments built on online intelligent medical inquiry users’ VEMRs, disease risks were predicted for a chronic obstructive pulmonary disease (COPD) population and a rheumatic heart disease (RHD) population. The experiments showed that the presented method performed relatively well on the evaluation metrics for the VEMR and improved disease risk prediction.

Introduction

As living standards rise, people are paying increasing attention to health. The rapid development of the internet industry has given rise to many online medical communities and to the business mode of information platform-based intelligent inquiry, which give users multiple ways to access medical information. These e-health business platforms mostly focus on health knowledge, disease information and medical news, and also offer users online medical consultation. Well-known commercial e-health websites in China include Sina Health, QIUYI.CN, haoyisheng.com and ask.39.net, while well-known foreign websites include PatientsLikeMe, DailyStrength, Wellsphere and MDJunction. Our investigation found that XYWY.com alone contains disease-related Q&A data spanning more than a decade, starting from November 24, 2004, with thousands of new questions raised every day. Over time, such disease-related information accumulates into big data. This big data is a product of broad public participation, includes a substantial number of real cases, and has high medical value. However, on commercial medical platforms it tends to be highly complex and high-dimensional. A large number of feature dimensions increases the complexity of learning algorithms for online medical data, reduces classification performance, and leaves the feature sparsity of intelligent online medical inquiry-based physicians’ cognitive diagnosis big data unresolved; consequently, the offline EMR cannot be fused for sequential feature map-based analysis, and the disease risks of online intelligent medical inquiry users cannot be obtained.

Feature selection is a powerful tool for overcoming the “curse of dimensionality” that learning algorithms face (Pascucci, 2002). Specifically, feature selection can pick out specific inquiry data on an online medical platform, and the selected feature subsets are then used to construct and train a learning model. Against the background of e-health commercial platforms and big data-driven science, feature selection has therefore become an important research direction for network economic decision-making and smart business decision-making. For this reason, this paper studies in depth the feature selection methods suited to physicians’ cognitive diagnosis big data from intelligent inquiry on e-health commercial platforms. Feature selection in such data is generally based on feature evaluation, which has long been a focus of feature selection research and has produced many evaluation measures, standards and methods. Unfortunately, most of these methods rely on a presupposed assumption or a given constant; more importantly, they cannot identify the weights of relevance and redundancy in physicians’ cognitive diagnosis big data of intelligent online inquiry. A more effective feature evaluation method that requires no prior parameters is therefore needed for cognitive diagnosis big data. Building on an in-depth exploration of the redundancy and complementarity dispersions caused by high-order information loss in physicians’ cognitive diagnosis big data of intelligent online inquiry, the possible state of high-order information is inferred from low-order inter-feature relationships, which are viewed as the “projection” of high-order mutual information onto a low-order domain, and the parameters are determined from the relationships among low-order features (Bolón-Canedo, Sánchez-Maroño, & Alonso-Betanzos, 2016). On this basis, a feature selection method based on the redundancy-complementarity dispersion of intelligent online inquiry-based physicians’ cognitive diagnosis big data is presented.
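To make the trade-off between relevance and redundancy concrete, the following is a minimal sketch of a generic mutual information-based greedy selector in the mRMR style. It is a stand-in for illustration only and is not the redundancy-complementarity dispersion method of this paper; in particular, it uses a fixed "relevance minus mean redundancy" score, whereas the paper argues for a criterion without preset parameters. The function names `mutual_information` and `greedy_select` and the feature names in the usage note are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def greedy_select(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of discrete class labels, aligned with the feature lists.
    This is a generic mRMR-style score, assumed here for illustration.
    """
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        # sorted() makes tie-breaking deterministic for a given set of scores
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With three hypothetical binary features, where `cough_dup` is an exact copy of `cough` and `fever` is independent of both, calling `greedy_select(features, labels, 2)` returns `fever` together with only one of the two duplicated features, because the copy adds redundancy but no new information.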

Based on the above feature selection in physicians’ cognitive diagnosis big data, the key feature information is fused with the offline EMR to predict the disease risks of online intelligent medical inquiry users, which holds promise for altering current medical risks. With the offline EMR as its principal part, the VEMR is a digital medical record centered on online intelligent medical inquiry-based physicians’ cognitive diagnosis big data; it collects the physicians’ cognitive diagnosis information about one user at different times, integrates the offline EMR in the form of online text, and serves as a main carrier for big data-driven medical research (Nambiar, Bhardwaj, Sethi, & Vargheese, 2013; Wu, Roy, & Stewart, 2010). First, key feature information about an online user over time is obtained through feature selection from online intelligent medical inquiry-based physicians’ cognitive diagnosis big data. Second, this key feature information includes typical heterogeneous medical data, such as demographics (e.g. age and gender), diseases (e.g. disease name and symptoms), and therapeutic approaches and tests (e.g. drugs, lab results and physicians’ instructions) (Jensen, Jensen, & Brunak, 2012); data heterogeneity requires different modeling technologies for effective analysis and offers multiple options for combining them. However, applying the key information directly is highly challenging because it is heterogeneous and sparse, so before medical risks are analyzed, the key feature information of every online intelligent medical inquiry user must undergo consistency processing (Hripcsak & Albers, 2012; Pathak, Kho, & Denny, 2013); that is, the key feature information obtained by feature selection is fused with the offline EMR and transformed into clinically relevant VEMR feature information. Some studies have represented offline medical big data after consistency processing and explored medical applications of the resulting data, but few have considered the temporal relationships between online and offline medical events in the form of a map. A map is simple, direct and easy to understand, and it helps a physician understand a patient’s health condition. Temporal relationships are a key factor in medical risk prediction: they provide important information about possible future diseases and support the timely adoption of effective preventive measures. When a map is integrated with temporal relationships, the fusion of the offline EMR with the key feature information selected from online users’ intelligent medical inquiry-based physicians’ cognitive diagnosis big data can be expressed as a sequential feature map. Integrating the time-dimension information carried by the map’s evolution into ordinary map information effectively addresses the sparsity of online physicians’ cognitive diagnosis big data and therefore enables sequential feature map-based disease risk prediction for online users.
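As an illustration of how time-stamped medical events might be turned into a map with temporal relationships, the sketch below builds a small directed graph in which an edge (a, b) means that event b followed event a within a time window, and the edge weight is the mean gap in days. The representation, the function name `build_temporal_graph`, the `max_gap_days` parameter and the sample events are all assumptions made for this example, not the paper's exact construction.

```python
from collections import defaultdict


def build_temporal_graph(events, max_gap_days=30):
    """Build a directed temporal graph from one user's medical events.

    events: list of (day, event_name) tuples, in any order.
    Returns a dict {(a, b): mean_gap_in_days} for every ordered pair of
    distinct events where b occurs strictly after a, at most max_gap_days later.
    """
    ordered = sorted(events)
    gaps = defaultdict(list)
    for i, (t_a, a) in enumerate(ordered):
        for t_b, b in ordered[i + 1:]:
            gap = t_b - t_a
            if gap > max_gap_days:
                break  # later events are even further away in time
            if gap > 0 and a != b:
                gaps[(a, b)].append(gap)
    return {edge: sum(g) / len(g) for edge, g in gaps.items()}


# Hypothetical record: a symptom, a test, a drug, and a much later recurrence.
record = [(0, "cough"), (5, "chest X-ray"), (7, "bronchodilator"), (60, "cough")]
graph = build_temporal_graph(record)
```

For the hypothetical record above, the graph contains the edges cough → chest X-ray (5 days), cough → bronchodilator (7 days) and chest X-ray → bronchodilator (2 days); the recurrence on day 60 falls outside the 30-day window and adds no edge.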

Therefore, in this paper the key feature information obtained by feature selection from online intelligent medical inquiry-based physicians’ cognitive diagnosis big data is fused with the offline EMR to form the VEMR. A map is integrated with temporal relationships to form a sequential feature map, which describes online intelligent medical inquiry users’ VEMR data and serves as the basis for disease risk prediction. The key feature information is obtained through redundancy-complementarity dispersion-based, data-driven feature selection. Based on the map representation and the temporal relationships in the VEMR, an optimized TRApriori algorithm mines a frequent feature map from the temporal profile graph. Relying on graph reconstruction theory, the frequent feature map is then used to reconstruct online intelligent medical inquiry users’ portraits, to obtain their reconstruction coefficients, and to predict disease risks. In addition, a neighborhood-based collaborative prediction model over the sequential feature maps of online intelligent medical inquiry users is proposed. It produces a ranked table of disease probabilities, together with the possible onset time of each disease, for an online intelligent medical inquiry user’s disease risk portrait.
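The two ideas in this paragraph, mining frequent sub-structures and ranking diseases from similar users, can be sketched in a much simplified form. The code below counts single edges by support (only the first level of an Apriori-style search, not the optimized TRApriori algorithm) and ranks diseases by Jaccard similarity-weighted voting among the nearest neighbors (not the paper's reconstruction coefficients, and without onset times). All function names, parameters and sample data are hypothetical.

```python
from collections import Counter


def frequent_edges(user_graphs, min_support):
    """Return the edges present in at least min_support (a fraction) of users."""
    counts = Counter(e for g in user_graphs.values() for e in set(g))
    n = len(user_graphs)
    return {e for e, c in counts.items() if c / n >= min_support}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_disease_risks(target_edges, user_graphs, user_diseases, frequent, k=2):
    """Rank diseases for a target user by similarity-weighted neighbor votes.

    Only frequent edges are compared. The k most similar users vote for
    their known diseases, weighted by normalized similarity.
    """
    target = set(target_edges) & frequent
    neighbors = sorted(
        ((jaccard(target, set(g) & frequent), u) for u, g in user_graphs.items()),
        reverse=True,
    )[:k]
    total = sum(s for s, _ in neighbors)
    scores = Counter()
    for s, u in neighbors:
        for disease in user_diseases[u]:
            scores[disease] += s / total if total else 0.0
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: each user is a set of temporal edges plus known diseases.
A = ("cough", "dyspnea")
B = ("dyspnea", "bronchodilator")
C = ("fever", "antibiotic")
D = ("chest pain", "echocardiogram")
user_graphs = {"u1": {A, B}, "u2": {A, B, C}, "u3": {D}}
user_diseases = {"u1": ["COPD"], "u2": ["COPD"], "u3": ["RHD"]}

frequent = frequent_edges(user_graphs, min_support=0.5)
ranking = rank_disease_risks({A, C}, user_graphs, user_diseases, frequent, k=2)
```

In this toy example only edges A and B reach the 50% support threshold, so the target user is compared on edge A alone, the two nearest neighbors are u1 and u2, and COPD is the only disease ranked. A fuller treatment would extend the mining step to multi-edge subgraphs and attach an estimated onset time to each ranked disease.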

Section snippets

Cognitive diagnosis

In the process of doctor modeling and doctor evaluation, the cognitive diagnosis model can better model the cognitive state of the doctor, and the modeling results play a favorable role in the doctor’s cognition on a patient’s condition, disease diagnosis and other applications.

In recent years, medical experts have proposed various cognitive diagnosis models for doctor modeling, while different modeling methods are usually used in different doctor modeling scenarios with different doctor’s

Redundancy-complementarity dispersion-based method for feature acquisition in online intelligent medical inquiry big data

The redundancy-complementarity dispersion-based method for feature acquisition in cognitive diagnosis big data of intelligent medical inquiry is employed to obtain key feature information in cognitive diagnosis big data of intelligent medical inquiry, which is then fused with offline EMR to form VEMR. Relationships of cognitive diagnosis big data in intelligent medical inquiry are determined according to the redundancy-complementarity dispersion; the problem of redundancy-complementarity is

Disease risk prediction method based on a sequential feature map of online intelligent medical inquiry users

Regarding traditional methods, EMR without sorting treatment and processing of medical data features is used for statistical analysis of specific features, while temporal relationship between/among different medical events (diseases, drugs and medical tests) in online intelligent medical inquiry is not comprehensively analyzed (Davis, Chawla, Christakis, & Barabási, 2010). Therefore, modeling in the paper is based on the constructed VEMR and the sequential feature map. Since, the constructed

Experiment I

Based on the user inquiry big data of an online medical platform and TCGA EMR, the experiment verifies whether the sequential feature map-based disease risk prediction method, applied after feature selection on the cognitive diagnosis big data of online intelligent medical inquiry, is useful. The data set selected for the experiment is as follows.

Measure standard

The following three traditional disease prediction methods and the method presented in the paper are compared in the experiment.

  • (1) Mean Vector Representation (MVR). When this method is used to predict disease risk, the average time of each online medical event across a user’s different inquiry sequences is computed. The denominators used for different online medical events differ; for example, a denominator may be the total number of administered drugs or the total number of tests. Based on

Theoretical contribution

The research results have yielded some key theoretical implications.

First, a review of the existing literature shows that many researchers have carried out varied research on doctors’ cognitive diagnosis and on the operation of offline intelligent diagnosis and treatment auxiliary systems. However, as the scarce research on cognitive diagnosis models makes clear, they have focused on design models, technical improvements and

Implications on practice

Regarding the contributions of the research to practical application, the online key feature information in the online intelligent medical inquiry environment is integrated with the offline EMR and converted into clinically relevant VEMR feature information, and the VEMR feature information is used to predict disease risk.

  • (1) Enhance the accuracy of feature selection in the cognitive diagnosis big data under online intelligent medical inquiry.

Based on the in-depth analysis of the

Research results

From the perspective of cognitive diagnosis big data in online intelligent medical inquiry, the neglect of high-order mutual information in existing methods is an effective means of avoiding the big data “curse of dimensionality”, but it objectively biases the measurement of inter-feature relevance. The redundancy-complementarity dispersion phenomenon caused by high-dimensional information deficiency in cognitive diagnosis big data of online intelligent medical inquiry is analyzed

Acknowledgments

The research described in this paper was substantially supported by Grants from the National Natural Science Foundation of China (nos. 71471178, 71871232, 71371194, and 71171201) and the State Key Program of National Natural Science Foundation of China (nos. 71431006, 71631008) and Major Project for National Natural Science Foundation of China (91846301, 71790615) and Projects of International Cooperation and Exchanges NSFC (no. 71210003) and the Fundamental Research Funds for the Central


References (59)

  • A.M. Beltz et al.

    Bridging the nomothetic and idiographic approaches to the analysis of clinical data

    Assessment

    (2016)
  • V. Bolón-Canedo et al.

    Feature selection for high-dimensional data

    Progress in Artificial Intelligence

    (2016)
  • G. Brown et al.

    Conditional likelihood maximisation: A unifying framework for information theoretic feature selection

    Journal of Machine Learning Research

    (2012)
  • C.L.P. Chen et al.

    Data-intensive applications, challenges, techniques and technologies: A survey on big data

    Information Sciences

    (2014)
  • C.Y. Chiu et al.

    Consistency of cluster analysis for cognitive diagnosis: The DINO model and the DINA model revisited

    Applied Psychological Measurement

    (2015)
  • S.A. Chun et al.

    Collaborative and trajectory prediction models of medical conditions by mining patients’ social data

    IEEE International Conference on Bioinformatics and Biomedicine

    (2015)
  • M.R. Cowie et al.

    Adaptive servo-ventilation for central sleep apnea in systolic heart failure

    The New England Journal of Medicine

    (2015)
  • D.A. Davis et al.

    Time to care: A collaborative engine for practical disease prediction

    Data Mining and Knowledge Discovery

    (2010)
  • H. Ekbia et al.

    Big data, bigger dilemmas: A critical review

    Journal of the Association for Information Science and Technology

    (2015)
  • F. Fleuret

    Fast binary feature selection with conditional mutual information

    Journal of Machine Learning Research

    (2004)
  • M. Ghassemi et al.

    State of the art review: The data revolution in critical care

    Critical Care

    (2015)
  • K. Gu et al.

    The analysis of image contrast: From quality assessment to automatic enhancement

    IEEE Transactions on Cybernetics

    (2016)
  • M.A. Hall

    Correlation-based feature selection for discrete and numeric class machine learning

    Proceedings of the Seventeenth International Conference on Machine Learning

    (2000)
  • G. Hripcsak et al.

    Next-generation phenotyping of electronic health records

    Journal of the American Medical Informatics Association

    (2012)
  • P.B. Jensen et al.

    Mining electronic health records: Towards better research applications and clinical care

    Nature Reviews Genetics

    (2012)
  • D. Kale et al.

    Computational discovery of physiomes in critically ill children using deep learning

    DMMI Workshop

    (2014)
  • N. Kwak et al.

    Input feature selection for classification problems

    IEEE Transactions on Neural Networks

    (2002)
  • T.A. Lasko et al.

    Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data

    PloS One

    (2013)

    Xin Liu graduated from Xiangtan University in 2006 with a Bachelor of Science in Management, from Wuhan University in 2009 with a master’s degree in Software Engineering, and from the Business School of Hunan University of Technology in 2014 with a master’s degree in Business Administration. He is currently a PhD student in Management Science and Engineering at the School of Business, Central South University. He is mainly engaged in the application of big data and natural language processing in medical fields. His research interests include data analysis, classification, feature selection, deep learning and transfer learning.

    Yanju Zhou is a Professor and Doctoral Tutor. She is an expert in decision science and supply chain management. She has published about 30 articles in academic journals including International Journal of Production Economics, Knowledge-Based Systems, Human and Ecological Risk Assessment: An International Journal, and Journal of Intelligent & Fuzzy Systems. She obtained her bachelor’s degree in Economics and her M.S. in Management Science from Central South University in 1996 and 2002, respectively. She received her Ph.D. in Management Science from Beihang University in 2007.

    Zongrun Wang received his Ph.D. in Management from Central South University in 2004. He has been Vice Dean of the School of Business, Central South University, China, since 2014, and a Professor and Doctoral Supervisor since 2010. From 2008 to 2009 he was a Visiting Scholar at California State University, Northridge. He has published some 40 articles in academic journals including The Australian Economic Review, Economic Modelling, Journal of Applied Statistics, Physica A: Statistical Mechanics and its Applications, and International Journal of Production, among others, as well as three academic works. Much of his research is substantially supported by grants from the National Natural Science Foundation of China. He is a peer reviewer for Journal of Financial Stability, Journal of Banking and Finance, and International Journal of Production Economics.
