Hypoplastic left heart syndrome: knowledge discovery with a data mining approach

https://doi.org/10.1016/j.compbiomed.2004.07.007

Abstract

Hypoplastic left heart syndrome (HLHS) affects infants and is uniformly fatal without surgical palliation. Post-surgery mortality rates are highly variable and dependent on postoperative management. A data acquisition system was developed for collection of 73 physiologic, laboratory, and nurse-assessed parameters. The acquisition system was designed to collect data on numerous patients. Data records were created at 30 s intervals. An expert-validated wellness score was computed for each data record. To efficiently analyze the data, a new metric for assessment of data utility, the combined classification quality measure, was developed. This measure assesses the impact of a feature on classification accuracy without performing computationally expensive cross-validation. The proposed measure can also be used to derive new features that enhance classification accuracy. The knowledge discovery approach allows for instantaneous prediction of interventions for the patient in an intensive care unit. The discovered knowledge can improve care of complex-to-manage infants through the development of an intelligent bedside advisory system.

Introduction

Hypoplastic left heart syndrome (HLHS) is a heart disease of newborn infants. The occurrence of HLHS is rare, affecting between 0.16 and 0.36 in every 1000 births, but it is inevitably fatal without surgical intervention [1]. The Norwood procedure has emerged in the last decade as the most common treatment [2] and consists of a three-stage surgical intervention [3]. Stage I of the Norwood procedure includes three main components: an atrial septectomy, an anastomosis of the proximal pulmonary artery to the aorta with homograft augmentation of the aortic arch, and an aortopulmonary shunt. As a result of this procedure, the patient's right ventricle is connected to the aorta so that it can force the delivery of oxygenated blood through the branches of the aorta.

Although the procedure is lifesaving, a consequence of this reconstruction is the creation of a “balanced circulation” which implies a precarious metastable balance between the pulmonary and systemic circulations. The most critical time for the neonate is the surgery itself and the time immediately following surgery spent in the pediatric intensive care unit (PICU) [3]. Typically, complications are attributed to the unstable balance between the pulmonary and systemic circulation. There are rapid and massive shifts in the cardiac output, pulmonary resistance, and systemic resistance for the first 3–4 days after surgery. An experienced team of physicians, nurses, and therapists is required to successfully navigate the changes in this period. However, even the most experienced teams report significant mortality due to the extremely complex relationships among physiologic parameters in a given patient. The mortality rate for the three-stage procedure is highest following the first stage [4] and can reach as high as 42% [2].

Specifically, the inability to directly measure crucial parameters in the postoperative infant results in the need for physicians to infer the value of crucial, but immeasurable, parameters from a group of obtainable parameters used to monitor the infant. Obtainable postoperative parameters include: pulse, heart rhythm, systemic blood pressure, common atrial filling pressure, urine output, physical exam, and systemic and mixed venous oxygen saturations. Based on these values, inferences are made as to the value of crucial life-saving parameters (e.g., pulmonary and systemic blood flow). These parameters change rapidly in the postoperative period, and subtle constellations of changes in the obtainable parameters are often unnoticed by the inexperienced caregiver but lead to a “sudden” postoperative death. Closer analysis of the medical record often reveals clusters of changes that should have signaled a modification of the direction of postoperative therapy.

An experienced team can assimilate the numerous measurable parameters and, using experience-based intuition, infer the value of crucial control parameters, thereby managing a neonate with more success during the critical postoperative period.

There are two categories of problems involved in caring for the infants after surgery. The first issue involves the nature of the decision-making required for the care. Decisions made even by the most experienced physician are far from being ideal as the relationships between the measurable and inferred parameters are highly complex, nonlinear, and frequently not known.

The second problem involves the difficulty in communicating correct response patterns. These response patterns are commonly referred to as “wisdom”. In this setting, wisdom is the ability to successfully interpret the multiple, complex, and unknown relationships between the parameters available and to generate a successful therapeutic plan. Because these relationships are complex, the transfer of this wisdom from experienced to inexperienced personnel is extremely difficult, even within a single institution. Furthermore, the transfer of wisdom from a high-volume health center to a low-volume center is even more difficult. Consequently, infants treated in low-volume centers are denied the benefit of the wisdom available in high-volume centers. Furthermore, even within a high-volume institution, the lack of continuous (24 h a day) supervision by experienced personnel at the bedside can deny a critically-ill infant the benefit of available wisdom.

An essential step to understanding the complex relationships between parameters is being able to define and predict the health of a patient. In this paper, a data mining approach is proposed to capture the complex interactions among physiological variables and therapeutic interventions. The approach yields a set of rules that are both easily interpretable and highly accurate. These rules will be used to predict the health status of a postoperative neonate, interventions, and other user-defined outcomes. The data mining approach is aided by a new metric for the selection of features (parameters) to be transformed, which in turn leads to higher accuracy in predicting the “wellness score” of a patient. Every percent increase in prediction accuracy is significant because it improves the overall understanding of postoperative management.

The next section details the collection of data and the development of the “wellness score”. The score is essential to the data mining algorithms and it is used as an indicator of the patient's health.

Section snippets

Data collection

The patients who were subjects in this study had been diagnosed with HLHS and had undergone the first stage Norwood procedure for palliation of HLHS. There were no other selection criteria.

Data collection began upon admission of the patient to the PICU immediately after Norwood surgery and lasted between 18 and 36 h. Three categories of data were collected: continuously monitored physiologic parameters, intermittently monitored physiologic parameters, and interventions.

The continuously monitored

Methods

In this paper, a machine learning algorithm based on rough set theory [6] is used. This algorithm represents a large class of algorithms that generate decision rules from data. The use of learning algorithms for extraction of explicit knowledge from the HLHS data is novel. The literature on data analysis of HLHS is limited. Alonso–Betanzo [7] discussed the application of a neural network (NN) to predict fetal outcome based on the nonstress test (NST), which is used to evaluate intrauterine

Extension of the classification accuracy measure

Classification quality (CQ) is the measure of association between a feature and the outcome (e.g., wellness score). Its role in rough set theory can be compared to that of the correlation coefficient in statistics. For a given feature, CQ can be loosely defined as the ratio of objects with non-conflicting feature values to the total number of objects in the data set. Formal definitions of the classification quality and its extensions are presented in the Appendix. An example calculation of CQ
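The loose definition above can be illustrated with a minimal sketch. It assumes a single categorical feature and treats an object as non-conflicting when every object sharing its feature value has the same outcome; the formal definition is in the Appendix and may differ in detail. The function name, the feature `heart_rate_band`, and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def classification_quality(feature_values, outcomes):
    """Loose CQ: share of objects whose feature value maps to one outcome only.

    Assumption: an object is 'non-conflicting' when all objects with the
    same feature value carry the same outcome label.
    """
    if len(feature_values) != len(outcomes):
        raise ValueError("feature_values and outcomes must have equal length")
    if not feature_values:
        raise ValueError("at least one object is required")

    # Collect the set of outcomes observed for each feature value.
    outcomes_by_value = defaultdict(set)
    for value, outcome in zip(feature_values, outcomes):
        outcomes_by_value[value].add(outcome)

    # Count objects whose feature value is tied to exactly one outcome.
    consistent = sum(
        1 for value in feature_values if len(outcomes_by_value[value]) == 1
    )
    return consistent / len(feature_values)


# Hypothetical toy data: a discretized feature and a wellness label.
heart_rate_band = ["low", "low", "normal", "normal", "high", "high"]
wellness = ["poor", "poor", "good", "poor", "good", "good"]

cq = classification_quality(heart_rate_band, wellness)
print(f"CQ = {cq:.3f}")
```

In this toy data the value "normal" occurs with both outcomes, so its two objects conflict and four of the six objects remain non-conflicting.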

Feature transformation

In this section, the relationship between the combined classification quality measure and classification accuracy is discussed. In addition, the combined classification quality measure is used to derive new features increasing classification accuracy of the rules extracted from a transformed data set.

To establish the relationship between the combined classification quality measure and classification accuracy, four different data scenarios (each with one feature only) of the data set in Table 8

Benefits of the combined classification quality measure

The combined classification quality measure offers the following benefits:

  • (a) Feature selection: The combined classification quality measure evaluates features in the same way as the entropy measure, Gini index, and other metrics.

  • (b) Quick and low computational cost evaluation of the newly introduced features: The computational complexity of the combined classification quality measure is much lower than the cross-validation task involving multiple runs of machine learning algorithms.

  • (c) Tool for deriving
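For context on item (a), the sketch below shows how the entropy measure and the Gini index are commonly used to score a single categorical feature against an outcome. It does not implement the combined classification quality measure, whose definition is not reproduced in this excerpt. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of outcome labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of outcome labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(feature_values, outcomes, impurity):
    """Impurity of the outcomes after splitting on the feature values."""
    groups = {}
    for value, outcome in zip(feature_values, outcomes):
        groups.setdefault(value, []).append(outcome)
    n = len(outcomes)
    return sum(len(g) / n * impurity(g) for g in groups.values())


def information_gain(feature_values, outcomes):
    """Reduction in entropy obtained by splitting on the feature."""
    return entropy(outcomes) - weighted_impurity(feature_values, outcomes, entropy)


def gini_gain(feature_values, outcomes):
    """Reduction in Gini impurity obtained by splitting on the feature."""
    return gini(outcomes) - weighted_impurity(feature_values, outcomes, gini)


# Hypothetical toy data: one informative and one constant feature.
band = ["low", "low", "normal", "normal", "high", "high"]
constant = ["x"] * 6
wellness = ["poor", "poor", "good", "poor", "good", "good"]

for name, feature in [("band", band), ("constant", constant)]:
    print(name, information_gain(feature, wellness), gini_gain(feature, wellness))
```

Both scores are computed from a single pass over the data, with no model training, which is the kind of inexpensive feature evaluation that item (b) contrasts with cross-validation.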

Computational results

To demonstrate that data mining algorithms could be applied to discover knowledge in the postoperative care of HLHS patients, we designed a five-stage experiment. The experiment demonstrates the effectiveness of the data mining approach, the use of the proposed metric, CCQ, to select features for transformation, and the benefits of using transformed features. The experiment was set up as follows:

  • Step 1: Calculate the CQ, RCQ, and CCQ for all of the collected features in the data set.

  • Step 2: Determine

Conclusions

In this paper, a data mining approach to postoperative management of infants with hypoplastic left heart syndrome was considered. To efficiently analyze the data, a new metric for assessment of data utility, called the combined classification quality (CCQ) measure, was developed. Use of the CCQ measure improved the classification accuracy of the wellness score.

The use of a data mining approach with the combined classification quality metric

Acknowledgements

The authors would like to express appreciation to A. Glick for organizing some of the data sets used in the study, C.F. Yu for designing and coding the data collection system, and Y. Gan for design and development of the user interface. The research has been partially funded by the Children's Miracle Network.

Alex Burns graduated from the University of Michigan with a BSE degree in Industrial and Operations Engineering. He is currently pursuing his Master's degree in industrial engineering at The University of Iowa. His fields of interest include data mining, feature transformations, and knowledge discovery.

References (13)

  • C.L. Tsien et al., Multiple signal integration by decision tree induction to detect artifacts in the neonatal intensive care unit, Artif. Intell. Med. (2000)

  • M. McConnell et al., The neonate suspected with congenital heart disease, Crit. Care Nurs. Q. (2002)

  • H.P. Gutgesell et al., Management of hypoplastic left heart syndrome in the 1990s, Am. J. Cardiol. (2002)

  • C. Wright, Cardiac surgery 2002: staged repair of hypoplastic left heart syndrome, Crit. Care Nurs. Q. (2002)

  • A. Tulloh et al., Outcome of staged reconstructive surgery for hypoplastic left heart syndrome following antenatal diagnosis, Arch. Dis. Childhood (2001)

  • O. Barnea et al., Estimation of oxygen delivery in newborns with a univentricular circulation, Circulation (1998)
There are more references available in the full text version of this article.

Christopher Caldarone is an Associate Professor at the University of Toronto and a Staff Cardiovascular Surgeon at the Hospital for Sick Children in Toronto. As a full-time congenital heart surgeon, one of his primary clinical interests is the care of neonates. He has published in numerous journals including the Journal of Thoracic and Cardiovascular Surgery, the Annals of Thoracic Surgery, and Circulation. He is a member of the American College of Surgeons, the American Academy of Pediatrics, and the Society of Thoracic Surgeons. His email address is [email protected].

Michael Kelleher is an Associate Professor of Pediatrics at Northwestern University and is engaged in the practice of pediatric intensive care medicine at Children's Memorial Hospital in Chicago, Illinois. He is interested in medical informatics and the application of computational intelligence to the care of critically ill children. He is a Fellow of the American Academy of Pediatrics, a member of the American Thoracic Society, the American Medical Informatics Association, and the Society for Critical Care Medicine. He has published in journals sponsored by the ATS and serves as a reviewer for Critical Care Medicine. His E-mail address is [email protected].

Andrew Kusiak is a Professor of Industrial Engineering at the University of Iowa, Iowa City. He is interested in theory and applications of computational intelligence, data mining, and optimization in healthcare, pharmaceutical industry, product development, and manufacturing. He has published research papers in journals sponsored by AAAI, IEEE, IIE, INFORMS, ESOR, IFIP, IFAC, IPE, ISPE, and SME. He speaks frequently at international meetings, conducts professional seminars, and consults for industrial corporations. He serves on the editorial boards of 18 journals and edits book series. He is the Editor-in-Chief of the Journal of Intelligent Manufacturing. His E-mail address is [email protected].

Fred S. Lamb is an Associate Professor of Pediatrics at the University of Iowa, Iowa City, Iowa. Trained as a Pediatric Cardiologist, he is the head of the Division of Pediatric Critical Care and Medical Director of the Pediatric Intensive Care Unit. His clinical interests include management of postoperative congenital heart repair patients with a particular emphasis on factors regulating vascular tone. His basic science research laboratory is funded by the National Institutes of Health and the American Heart Association to study the physiologic role of chloride ion channels in determining the contractility of vascular smooth muscle. His E-mail address is [email protected].

Thomas Persoon is a Management Engineer in the Department of Pathology and an Adjunct Instructor of Industrial Engineering at the University of Iowa, Iowa City, Iowa. He graduated from the University of Iowa with an MS degree in Industrial Engineering. He is interested in process modeling, data flow, project management, and data mining.
