Mining association rules with improved semantics in medical databases

doi:10.1016/S0933-3657(00)00092-0

Artificial Intelligence in Medicine

Volume 21, Issues 1–3, January–March 2001, Pages 241-245

https://doi.org/10.1016/S0933-3657(00)00092-0

Abstract

The discovery of new knowledge by mining medical databases is crucial for making effective use of stored data and enhancing patient management tasks. One of the main objectives of data mining methods is to provide a clear and understandable description of patterns held in data. We introduce a new approach to finding association rules among quantitative values in relational databases. The semantics of such rules are improved by introducing imprecise terms in both the antecedent and the consequent, as these terms are the most commonly used in human conversation and reasoning. The terms are modeled by means of fuzzy sets defined in the appropriate domains. However, the mining task is performed on the precise data. These “fuzzy association rules” are more informative than rules relating precise values. We also introduce a new measure of accuracy, based on Shortliffe and Buchanan’s certainty factors [Shortliffe E, Buchanan B. Math Biosci 1975;23:351–79]. In addition, the semantics of the usual measure of usefulness of an association rule, called support, are discussed and some new criteria are introduced. Our new measures have been shown to be more understandable and appropriate than ordinary ones. Several experiments on large medical databases show that our new approach can provide useful knowledge with better semantics in this field.
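The abstract names a certainty-factor-based accuracy measure but does not give its formula. The sketch below uses the form usually stated in the later literature on certainty factors for association rules, computed from the rule's confidence and the support of the consequent. It is an assumption that this matches the definition in the full text, and the function name `certainty_factor` is purely illustrative.

```python
def certainty_factor(confidence: float, consequent_support: float) -> float:
    """Certainty factor of a rule A -> C.

    Assumed form (not taken from the abstract):
      CF = (Conf - supp(C)) / (1 - supp(C))  if Conf > supp(C)
      CF = (Conf - supp(C)) / supp(C)        if Conf < supp(C)
      CF = 0                                 otherwise
    """
    if not (0.0 <= confidence <= 1.0 and 0.0 <= consequent_support <= 1.0):
        raise ValueError("confidence and support must lie in [0, 1]")
    if confidence > consequent_support:
        # supp(C) < 1 in this branch, so the denominator is positive.
        return (confidence - consequent_support) / (1.0 - consequent_support)
    if confidence < consequent_support:
        # supp(C) > 0 in this branch, so the denominator is positive.
        return (confidence - consequent_support) / consequent_support
    return 0.0
```

Under this form the measure ranges over [-1, 1]: positive values mean the antecedent raises belief in the consequent above its baseline frequency, negative values mean it lowers it, and zero means the rule carries no information beyond the consequent's own support.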

Section snippets

Application domain

Nowadays, the volume of data stored in medical databases is growing at an increasingly rapid rate. Analyzing these data is crucial for medical decision making and management. It has been widely recognized that medical data analysis can lead to an enhancement of health care by improving the performance of patient management tasks [6], [7]. There are two main aspects that define the need for medical data analysis [6].

• Support of specific knowledge-based problem solving activities through the analysis of patients

Problem statement

There is an increasing interest in finding association rules among values of quantitative attributes in relational databases [11], as this kind of attribute is rather frequent. Quantitative attributes are those whose domain contains many precise values. Medical databases store a large number of quantitative attributes. But in common conversation and reasoning, humans employ rules relating imprecise terms rather than precise values. For instance, a physician will find more

Applied methods

We have employed several techniques in order to reach our goal.

1. One of the best tools to represent linguistic imprecise terms with clear semantic content is the theory of fuzzy sets. By using this theory, the meaning of imprecise terms can be modeled by means of fuzzy sets in the appropriate domain. For example, a possible representation of imprecise terms related to the “Age”, by means of fuzzy sets, is shown in Fig. 1. Fig. 2 shows a set of imprecise terms for the “Hour”. The definition of the
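As a rough illustration of how imprecise terms over “Age” might be modeled, the sketch below defines trapezoidal fuzzy sets and evaluates the membership degree of a precise age in each of them. Fig. 1 is not reproduced in this snippet, so the labels (`young`, `middle-aged`, `old`) and all breakpoints are hypothetical and are not the ones used by the authors.

```python
def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree of x in a trapezoidal fuzzy set with a <= b <= c <= d.

    The degree is 0 outside [a, d], 1 on [b, c], and linear on the two slopes.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising slope, a <= x < b
    return (d - x) / (d - c)       # falling slope, c < x <= d


# Hypothetical linguistic terms for the attribute "Age" (breakpoints assumed).
AGE_TERMS = {
    "young": (0, 0, 25, 35),
    "middle-aged": (25, 35, 50, 60),
    "old": (50, 60, 120, 120),
}


def fuzzify_age(age: float) -> dict:
    """Return the membership degree of a precise age in each linguistic term."""
    return {label: trapezoid(age, *params) for label, params in AGE_TERMS.items()}
```

With these assumed breakpoints, an age between 25 and 35 belongs partially to both `young` and `middle-aged`, which is the gradual transition between terms that crisp intervals cannot express.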

Results

We have performed several experiments on large medical databases obtained from the University Hospital of Granada, specifically the relations URGENCY and SURGICAL OPERATIONS, containing 81,368 and 15,766 tuples, respectively. We show in [8] that fuzzy association rules allow us to (a) obtain rules with better semantics, and (b) obtain rules with enough support among quantitative attributes (otherwise, the high number of distinct and precise values makes the support of rules relating values of
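Point (b) can be illustrated on toy data. The sketch below compares the support of the most frequent pair of precise values with the support of a pair of fuzzy terms over the same tuples. The records, the attribute pairing (age, hour), and the terms `young` and `evening` are all hypothetical. Fuzzy support is computed here as the average of the minimum membership degrees, a common simplification, whereas the paper's own support definition may differ (its reference list points to quantified-sentence evaluation).

```python
from collections import Counter


def ramp(x: float, zero_at: float, one_at: float) -> float:
    """Linear ramp that is 0 at zero_at and 1 at one_at, clamped to [0, 1]."""
    t = (x - zero_at) / (one_at - zero_at)
    return max(0.0, min(1.0, t))


def young(age: float) -> float:
    # Hypothetical term: fully "young" up to 25, not at all from 35 on.
    return ramp(age, 35, 25)


def evening(hour: float) -> float:
    # Hypothetical term: not "evening" up to 19h, fully from 21h on.
    return ramp(hour, 19, 21)


def best_crisp_support(records) -> float:
    """Support of the most frequent exact (age, hour) pair."""
    return Counter(records).most_common(1)[0][1] / len(records)


def fuzzy_support(records, mu_a, mu_b) -> float:
    """Average over tuples of min(mu_a(age), mu_b(hour)) (assumed definition)."""
    return sum(min(mu_a(a), mu_b(h)) for a, h in records) / len(records)


# Made-up (age, hour) tuples; not data from the URGENCY relation.
RECORDS = [(23, 22), (31, 23), (27, 21), (64, 10),
           (19, 23), (45, 9), (29, 20), (72, 11)]

crisp = best_crisp_support(RECORDS)
fuzzy = fuzzy_support(RECORDS, young, evening)
```

Because every exact pair in the toy data occurs only once, no precise pair can exceed a support of one tuple in eight, while the fuzzy pair aggregates partial evidence from several tuples and reaches a much higher value.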

Outlook

At this moment we are about to start the analysis of new medical databases obtained from several health services of Granada. From a theoretical point of view, we are concerned with the study of fuzzy hierarchies to find association rules with different granularity (precision) levels. Also, we are studying the unification of approximate dependencies (functional dependencies with a few exceptions) and fuzzy functional dependencies, in order to obtain “almost functional dependencies” described by

References (12)

M. Delgado et al. Fuzzy cardinality based evaluation of quantified sentences. Int. J. Approx. Reasoning (2000)
E. Shortliffe et al. A model of inexact reasoning in medicine. Math. Biosci. (1975)
L.A. Zadeh. A computational approach to fuzzy quantifiers in natural languages. Comput. Math. Appl. (1983)
Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proceedings of...
S. Brin et al. Dynamic itemset counting and implication rules for market basket data. SIGMOD Record (1997)
Delgado M, Sánchez D, Vila MA. Acquisition of fuzzy association rules from medical data. In: Barro S, Marín R,...

There are more references available in the full text version of this article.

