Predicting the need for CT imaging in children with minor head injury using an ensemble of Naive Bayes classifiers

doi:10.1016/j.artmed.2011.11.005

Artificial Intelligence in Medicine

Volume 54, Issue 3, March 2012, Pages 163-170

https://doi.org/10.1016/j.artmed.2011.11.005

Abstract

Objective

Using an automatic data-driven approach, this paper develops a prediction model that achieves more balanced performance (in terms of sensitivity and specificity) than the Canadian Assessment of Tomography for Childhood Head Injury (CATCH) rule, when predicting the need for computed tomography (CT) imaging of children after a minor head injury.

Methods and materials

CT is widely considered an effective tool for evaluating patients with minor head trauma who have potentially suffered serious intracranial injury. However, its use poses possible harmful effects, particularly for children, due to exposure to radiation. Safety concerns, along with issues of cost and practice variability, have led to calls for the development of effective methods to decide when CT imaging is needed. Clinical decision rules represent such methods and are normally derived from the analysis of large prospectively collected patient data sets. The CATCH rule was created by a group of Canadian pediatric emergency physicians to support the decision of referring children with minor head injury to CT imaging. The goal of the CATCH rule was to maximize the sensitivity of predictions of potential intracranial lesion while keeping specificity at a reasonable level. After extensive analysis of the CATCH data set, characterized by severe class imbalance, and after a thorough evaluation of several data mining methods, we derived an ensemble of multiple Naive Bayes classifiers as the prediction model for CT imaging decisions.

Results

In the first phase of the experiment we compared the proposed ensemble model to other ensemble models employing rule-, tree- and instance-based member classifiers. Our prediction model demonstrated the best performance in terms of AUC, G-mean and sensitivity measures. In the second phase, using a bootstrapping experiment similar to that reported by the CATCH investigators, we showed that the proposed ensemble model achieved a more balanced predictive performance than the CATCH rule with an average sensitivity of 82.8% and an average specificity of 74.4% (vs. 98.1% and 50.0% for the CATCH rule respectively).

Conclusion

Automatically derived prediction models cannot replace a physician's acumen. However, they help establish reference performance indicators for the purpose of developing clinical decision rules so the trade-off between prediction sensitivity and specificity is better understood.

Introduction

Computed tomography (CT) is widely accepted as an effective diagnostic modality to detect rare but clinically significant intracranial injuries in patients suffering minor head injury. As such, it has been increasingly utilized as a routine test for these patients [1]. However, a seminal study by Brenner and Hall [2] warns against its harmful effects (particularly for children) due to the radiation exposure associated with CT. Independent CT imaging studies [1], [3], [4] advocate the adoption of a comprehensive approach that targets physicians’ education to reduce the over-reliance on CT imaging for head injury patients.

The diagnosis of a potentially serious brain injury following a minor head trauma is a well-documented challenge [5]. It is believed that clinical decision rules could help with this challenge and reduce unnecessary CT imaging. Broder [4] recommends that such decision rules rely on readily available patient data including physical examination and a patient's history. In line with these recommendations and in response to a growing need to improve the management of pediatric patients with minor head trauma in the emergency department (ED), Osmond et al. [6] developed the Canadian Assessment of Tomography for Childhood Head Injury (CATCH) clinical decision rule. The prospective cohort study was conducted in ten Canadian pediatric teaching hospitals and enrolled children brought to the ED who had blunt head trauma characterized by loss of consciousness, amnesia, disorientation or repeated vomiting along with a score of at least 13 on the Glasgow Coma Scale. Such patients have often, but not always in a consistent manner [7], been referred to CT imaging to rule out a potential intracranial lesion that might necessitate a neurologic intervention.

The CATCH data set contains 3866 patient records described by 26 clinical attributes (standardized clinical findings from a patient's medical history, general examination and neurological status). These patient records are partitioned according to two classification schemes; the primary classification distinguishes between patients who had a brain injury and those who had no injury, where “brain injury” is defined as any acute intracranial finding revealed on a CT image and attributable to acute head trauma. Because this classification corresponds directly to the need for CT imaging (patients with the suspected injury require this test, and the remaining ones do not), we label these two classes as CT = yes and CT = no respectively. The secondary classification indicates whether or not a neurologic intervention was needed, and hence, we refer to these two secondary classes as neurologic intervention = yes and neurologic intervention = no. It is important to note that records in the neurologic intervention = yes class form a subset of the CT = yes class. Retrospectively, the need for neurological intervention was defined in the CATCH data set by the death of the patient within a week after the head injury or by the need of any of the following procedures within the same time period: craniotomy, an elevation of skull fracture, intracranial pressure monitoring, or intubation for head injury (demonstrated on the CT image).

To assess physicians' perceptions of a clinical decision rule for minor head trauma patients, Osmond conducted a survey among Canadian pediatric ED physicians to determine a clinically acceptable level of prediction performance, so that ED physicians would be confident in the rule. Results of this survey (personal communication, 80% response rate) revealed that the detection of a serious intracranial lesion is important for clinicians. Consequently, the CATCH study targeted a sensitivity of 95% when predicting the need for CT imaging. For those patients who subsequently required neurologic intervention, the CATCH rule aimed for 98% sensitivity.

With these findings in mind, Osmond and colleagues set out to create a decision rule that maximized the sensitivity of prediction at an inherent cost to specificity. Using recursive partitioning, they developed a rule that is clear and intuitive for ED physicians to apply when identifying two levels of risk among children with minor head trauma. According to the derived rule, the CT decision is made through a stratified evaluation of a patient's risk factors, where the presence of any of these factors indicates the need for CT imaging to detect a serious injury. The structure of the CATCH rule is presented in Fig. 1. This rule can be interpreted as a disjunction of the risk factors as the rule's premise, and the decision to perform CT imaging as its conclusion. In the case of the high-risk factors, the conclusion also indicates that neurosurgical intervention is necessary. Osmond et al. evaluated the performance of the CATCH rule on 1000 bootstrapped tests and reported the sensitivity and the specificity of the high-risk factors (the top four factors in the CATCH rule) for neurologic intervention as 97.9% and 70.2% respectively. They also reported the sensitivity and the specificity of all risk factors for the need of CT imaging to detect any brain injury as 98.1% and 50.0% respectively.
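The disjunctive reading of the rule can be illustrated with a minimal Python sketch. The factor names are placeholders only, since the actual risk factors appear in Fig. 1 and are not listed in the text; the number of medium-risk factors and the function name `disjunctive_rule` are likewise assumptions made for illustration.

```python
from typing import Mapping

# Placeholder names only: the real risk factors are given in Fig. 1.
# The text states that the top four factors are high risk; the number of
# medium-risk factors used here is an assumption.
HIGH_RISK = ("high_risk_1", "high_risk_2", "high_risk_3", "high_risk_4")
MEDIUM_RISK = ("medium_risk_1", "medium_risk_2", "medium_risk_3")


def disjunctive_rule(findings: Mapping[str, bool]) -> dict:
    """Apply a two-level disjunctive rule to one patient's findings.

    A factor missing from `findings` is treated as absent.
    """
    high = any(findings.get(f, False) for f in HIGH_RISK)
    medium = any(findings.get(f, False) for f in MEDIUM_RISK)
    return {
        "ct": high or medium,   # any factor present -> CT indicated
        "high_risk": high,      # any high-risk factor present
    }


print(disjunctive_rule({"medium_risk_2": True}))
print(disjunctive_rule({}))
```

Because the premise is a disjunction, adding a factor can only increase the number of patients flagged for CT, which is one way to see why such a rule favors sensitivity over specificity.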

We were granted a unique opportunity to work with CATCH data to develop a prediction model that indicates the need for CT imaging. Following the CATCH study, Osmond and colleagues have initiated a prospective evaluation of the CATCH rule in selected Canadian hospitals. For this evaluation, patient information was limited to 17 out of the original 26 attributes. In order to maintain compatibility and continuity of the CATCH study, we decided to use the same 17 attributes for the construction of our prediction models from the CATCH data. In this way, the model discussed in the paper can be tested again when prospectively collected data becomes available.

The decision of whether a minor head injury patient requires CT imaging is a binary classification problem. The objective is to distinguish between patients who require a CT scan (CT = yes) and those who do not (CT = no). Thus, our research question is: can a balanced (in terms of sensitivity and specificity) and well-performing prediction model be automatically derived from the CATCH data? As a corollary to this question, we do not constrain the prediction model with respect to its interpretability and comprehensibility by non-computer science experts.

An argument for having a balanced prediction model relies on the need to mitigate the long-term effects of ionizing radiation associated with the potential overuse of CT imaging that might occur when maximization of sensitivity drives the model's development. We are aware that the CATCH rule, developed in conjunction with physicians’ expertise according to a conservative approach, is likely to outperform (in terms of sensitivity) an automatically constructed prediction model. However, we believe that such a model may help in establishing reference performance indicators for the CATCH rule and in estimating the trade-off between the sensitivity and specificity of prediction.
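As a rough illustration of this trade-off, the following Python sketch computes sensitivity, specificity and their geometric mean (G-mean, taken here in its usual definition as the square root of their product). It then applies G-mean to the average sensitivity and specificity figures reported for the ensemble model and for the CATCH rule. Applying G-mean to averaged figures is an approximation made for this example and is not necessarily how the paper's G-mean values were obtained.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of CT = yes patients that are predicted CT = yes."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of CT = no patients that are predicted CT = no."""
    return tn / (tn + fp)


def g_mean(sens: float, spec: float) -> float:
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sens * spec)


# Average figures reported in the text (as fractions).
ensemble = g_mean(0.828, 0.744)
catch_rule = g_mean(0.981, 0.500)
print(f"ensemble G-mean ~ {ensemble:.3f}, CATCH rule G-mean ~ {catch_rule:.3f}")
```

On these figures the G-mean is roughly 0.78 for the ensemble and roughly 0.70 for the CATCH rule, which reflects the penalty G-mean assigns to a large gap between sensitivity and specificity.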

Additionally, we want to show how to automatically develop a prediction model from severely imbalanced data. Class imbalance is commonly encountered when analyzing clinical data, where the population of patients with an acute health condition is usually significantly smaller than the population of relatively healthy ones. Our research demonstrates that a well-performing model can be developed by under-sampling the data when constructing an ensemble classifier composed of multiple Naive Bayes (NB) classifiers.
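The general idea of combining under-sampling with an ensemble of NB classifiers can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the model built in the paper: the number of members, the balanced one-to-one sampling ratio, the Laplace smoothing, the averaging of member probabilities, and all names (`CategoricalNB`, `train_ensemble`, `ensemble_proba`) are choices made for illustration, and the toy data are invented.

```python
import math
import random
from collections import Counter


class CategoricalNB:
    """Naive Bayes for discrete attributes with Laplace smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # Distinct values seen per attribute, used in the smoothing term.
        self.values = [set(r[j] for r in rows) for j in range(len(rows[0]))]
        self.counts = Counter()  # (class, attribute index, value) -> count
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(y, j, v)] += 1
        return self

    def predict_proba(self, row):
        total = sum(self.class_counts.values())
        log_p = {}
        for c in self.classes:
            lp = math.log(self.class_counts[c] / total)
            for j, v in enumerate(row):
                num = self.counts[(c, j, v)] + 1
                den = self.class_counts[c] + len(self.values[j])
                lp += math.log(num / den)
            log_p[c] = lp
        # Normalize in log space for numerical stability.
        m = max(log_p.values())
        exp = {c: math.exp(lp - m) for c, lp in log_p.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def train_ensemble(rows, labels, minority, n_members=10, seed=0):
    """Train each member on all minority rows plus an equal-sized random
    sample of majority rows (random under-sampling)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == minority]
    neg = [i for i, y in enumerate(labels) if y != minority]
    members = []
    for _ in range(n_members):
        sample = pos + rng.sample(neg, min(len(pos), len(neg)))
        members.append(CategoricalNB().fit([rows[i] for i in sample],
                                           [labels[i] for i in sample]))
    return members


def ensemble_proba(members, row, minority):
    """Average the members' probabilities for the minority class."""
    return sum(m.predict_proba(row)[minority] for m in members) / len(members)


# Invented toy data: 5 minority rows and 45 majority rows.
rows = [("yes", "a")] * 5 + [("no", "b")] * 40 + [("no", "a")] * 5
labels = ["CT=yes"] * 5 + ["CT=no"] * 45
members = train_ensemble(rows, labels, minority="CT=yes")
print(ensemble_proba(members, ("yes", "a"), "CT=yes"))
print(ensemble_proba(members, ("no", "b"), "CT=yes"))
```

Each member sees a balanced training set, so no single classifier is dominated by the majority class, while the different majority samples across members let the ensemble use more of the majority data than any one member does.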

While the CATCH study explicitly identifies a high-risk subgroup of patients who need neurologic intervention (the neurologic intervention = yes class), we do not make this distinction, and therefore we do not consider maximal sensitivity of prediction for this group to be a driving objective for the development of our model. However, for consistency with Osmond's study, we separately report the model's performance for patients in the neurologic intervention = yes class (i.e., the high-risk patients according to the CATCH rule).

The paper is organized as follows. In the next section, we present related research on applying data mining techniques to clinical problems. Section 3 describes the data set used in this research, briefly characterizes data mining methods selected for the study, and reviews the experimental design. Section 4 presents experimental results, and the last section concludes with a discussion.

Section snippets

Related research

Data mining techniques allow for the development of sophisticated prediction models capable of analyzing high-dimensional data [8] without relying on domain expertise during the model development process. Techniques that are suitable for medical domains are discussed and summarized in [9] – they include rule and decision tree induction, instance-based learning, Bayesian classification, and inductive logic programming.

Clinical data that describes a specific patient condition or disease poses

Data set

Attributes describing the CATCH data that were used in our analysis are listed in Table 1. We applied an automatic approach to discretizing values for Age and, with the aid of a clinical expert, discretized values for VomitNum. Both discretizations were verified and approved by physicians involved in the CATCH study. To replicate the CATCH study design, we imputed missing attribute values with clinically reasonable values (they usually corresponded to a negative answer, e.g., no or none in

Evaluation of the E-NB model

Table 3 contains evaluations of E-NB and the three other ensemble models. It shows the mean and standard deviations of the evaluation measures as well as their confidence intervals (CI) with 95% confidence.
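A summary of this kind (mean, standard deviation and a 95% confidence interval over repeated evaluation runs) could be computed as in the Python sketch below. The normal-approximation interval and the function name `summarize` are assumptions for illustration; the excerpt does not state how the intervals in Table 3 were calculated, and the input values are invented.

```python
import math
import statistics


def summarize(values, z=1.96):
    """Return (mean, sample standard deviation, 95% CI of the mean).

    The interval uses a normal approximation: mean +/- z * sd / sqrt(n).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    half_width = z * sd / math.sqrt(len(values))
    return mean, sd, (mean - half_width, mean + half_width)


# Invented per-run AUC values, for illustration only.
runs = [0.80, 0.82, 0.84]
mean, sd, (low, high) = summarize(runs)
print(f"mean={mean:.3f} sd={sd:.3f} 95% CI=({low:.3f}, {high:.3f})")
```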

The E-NB model outperformed the other models in terms of AUC values and thus demonstrated the best capability to separate decision classes; all differences between AUC values were statistically significant. E-NB was also best in terms of G-mean; only the E-TB model achieved a comparable value –

Discussion

The decision to order a diagnostic test and the timing of this test are two important facets of medical decision-making. The CATCH rule was developed to help identify children with minor head injury who require CT imaging. It was created from prospectively collected data and designed to eliminate the false negatives for critical patients who require neurologic intervention. Therefore, the rule's performance is characterized by an almost perfect sensitivity at a cost of low specificity.

Having

Conflict of interest statement

No conflicts of interest exist.

Acknowledgements

The authors would like to thank Terry P. Klassen MD, George A. Wells PhD, Rhonda Correll RN, Anna Jarvis MD, Gary Joubert MD, Benoit Bailey MD, Laurel Chauvin-Kimoff MD CM, Martin Pusic MD, Don McConnell MD, Cheri Nijssen-Jordan MD, Norm Silver MD, Brett Taylor MD, and Ian G. Stiell MD, of the Pediatric Emergency Research Canada (PERC) Head Injury Study Group, for providing access to the CATCH data.

The current version of the paper benefited from the insightful comments of the reviewers.

This research

References (45)

N. Lavrac. Selected techniques for data mining in medicine. Artif Intell Med (1999)

K.J. Cios et al. Uniqueness of medical data mining. Artif Intell Med (2002)

W.W. Cohen. Fast effective rule induction

M. Smits et al. Minor head injury: CT-based strategies for management – a cost-effectiveness analysis. Radiology (2010)

D.J. Brenner et al. Computed tomography – an increasing source of radiation exposure. N Engl J Med (2007)

P.J. Bairstow et al. Reducing inappropriate diagnostic practice through education and decision support. Int J Qual Health Care (2010)

J.S. Broder. CT utilization: the emergency department perspective. Pediatr Radiol (2008)

F. Rivara et al. Poor prediction of positive computed tomographic scans by clinical criteria in symptomatic pediatric head trauma. Pediatrics (1987)

M.H. Osmond et al. CATCH: a clinical decision rule for the use of computed tomography in children with minor head injury. CMAJ (2010)

T.P. Klassen et al. Variation in utilization of computed tomography scanning for the investigation of minor head trauma in children: a Canadian experience. Acad Emerg Med (2000)

P. Sajda. Machine learning for detection and diagnosis of disease. Annu Rev Biomed Eng (2006)

C. Drummond et al. Severe class imbalance: why better algorithms aren’t the answer

H. He et al. Learning from imbalanced data. IEEE Trans Knowl Data Eng (2009)

N.V. Chawla. Data mining for imbalanced data sets: an overview

C. Drummond et al. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling

N.V. Chawla et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res (2002)

M. Kubat et al. Addressing the curse of imbalanced training sets: one-sided selection

H. Han et al. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning

V. Garcia et al. The class imbalance problem in pattern classification and learning

R.E. Schapire. The boosting approach to machine learning: an overview

C. Drummond et al. Exploiting the cost (in)sensitivity of decision tree splitting criteria

W. Fan et al. AdaCost: misclassification cost-sensitive boosting

