Particle Swarm Optimisation-Support Vector Machine Optimised by Association Rules for Detecting Factors Inducing Heart Diseases

Lu Wei-Jia; Ma Liang; Chen Hao

doi:10.1515/jisys-2016-0014

Open Access Published by De Gruyter August 17, 2016

From the journal Journal of Intelligent Systems

Abstract

Existing systems for diagnosing heart diseases are time consuming, expensive, and error prone. To address this, a detection algorithm for factors inducing heart diseases is proposed, based on a particle swarm optimisation-support vector machine (PSO-SVM) optimised by association rules (ARs). First, ARs were used to select features from a disease data set so as to obtain the feature sets for training. Then, PSO-SVM was used to classify the training and testing sets, and the factors inducing heart diseases were analysed. Finally, the effectiveness and reliability of the proposed algorithm were verified by experiments on the UCI Cleveland data set with confidence as the index. The experimental results showed that females have less risk of having a heart attack than males. Irrespective of gender, people with asymptomatic chest pain and exercise-induced angina are more likely to suffer from heart disease. Moreover, compared with other advanced classification algorithms, the proposed algorithm showed better classification performance and can therefore be used as a powerful tool to help doctors diagnose and treat heart diseases.

Keywords: Association rule mining; heart disease; detection of inducing factors; Cleveland data; support vector machine; particle swarm optimisation

1 Introduction

Throughout history, humans have been significantly affected by life-threatening diseases. Of these, heart diseases have attracted much attention among medical researchers [1, 4, 11, 32]. A sound diagnosis of heart disease mainly involves the etiologic diagnosis, the pathological diagnosis, and the diagnosis of pathophysiology and cardiac function. Existing diagnostic systems for heart diseases are slow, costly, and error prone. In addition, patients with heart diseases require continuous observation, and improper treatment can be fatal. Correct diagnosis and early treatment are therefore essential, and it is important to develop an effective, reliable diagnostic system for heart diseases [13, 28].

Recently, computational intelligence has been widely applied to find relationships between diseases and patient attributes. For instance, one study [2] proposed a simple electrocardiogram (ECG) measurement approach that uses automated software to calculate ECG parameters, including the P, T, U, and QRS waves. This method aids the diagnosis of heart diseases; however, it merely calculates parameters and does not automate diagnosis or treatment. An intelligent system that uses a genetic support vector machine (SVM) to classify heart valve condition based on the signals from a Doppler echocardiogram has also been proposed [2]. A computer-aided diagnosis prototype, called the intelligent prediction system for heart diseases, was presented [20] to predict heart diseases using multiple data mining technologies. An adaptive neural network was used to classify multichannel ECG signals [3, 18]. Previous studies [15, 22] proposed cardiac resynchronisation therapy, which can substantially improve outcomes, in terms of incidence and mortality, for patients with serious heart failure. They also proposed a novel ultrasound cardiogram (UCG) method to detect dyssynchrony; however, these methods are quite complex. A neural network was used to analyse the ultrasonic images of heart diseases [15, 22], and a method for diagnosing heart diseases based on statistical analysis system software was proposed [5, 16]. The records of patients diagnosed with severe hypertension were analysed using association rule (AR) mining (ARM) technology [10, 27]; this research indicated that severe hypertension is closely related to non-insulin-dependent diabetes mellitus and cerebral infarction. Some researchers used various exclusive data sets, whereas others used the publicly available Cleveland data set [26]. For instance, databases were applied to determine the effectiveness of certain learning algorithms using examples, and the risk of coronary disease was diagnosed using probabilistic algorithms. Comparisons of globally evolutionary computations and classifiers for comprehensively improving diagnosis results were also included. Although scholars have noted the influence of different risk factors and the causes of heart diseases, the difference between the influences of these factors on males and females has not been clarified [12].

Based on the above analysis, this research proposes a detection algorithm for factors inducing heart disease based on a particle swarm optimisation (PSO)-SVM optimised by ARs. Taking factors including gender into consideration, three different ARM algorithms, namely the Apriori algorithm [30, 34], the Predictive Apriori algorithm [9, 21], and the Tertius algorithm [7, 31, 33], were used to extract rules from heart disease data sets. The PSO-SVM was then used to classify the training and testing sets, and the factors inducing heart diseases were analysed on this basis. The experimental results on the publicly available Cleveland data set showed that this algorithm is effective and reliable as a diagnostic tool.

2 Data Sets

As previously mentioned, this study used the publicly available Cleveland database on heart diseases taken from the UCI machine learning repository [26]. This database contains 76 attributes, of which the following 14 attributes, which are closely related to heart diseases, are commonly used:

1. Age.
2. Gender: male or female.
3. Type of chest pain: typical angina, non-typical angina (abnang), pain not related to angina (notang), and asymptomatic (asympt).
4. Trestbps: a number representing the resting blood pressure of a patient on admission.
5. Chol: a number showing the serum cholesterol level (mg/dl).
6. FBS: two nominal values, true and false, which show whether fasting blood glucose is >120 mg/dl or not.
7. Restecg: normal, abnormal (ABN; ST-T wave abnormality), and left ventricular hypertrophy (HYP). These values present the results of the static (resting) ECG.
8. Thalach: a number denoting the maximal heart rate.
9. Exang: two nominal values (yes or no) that show the existence of angina induced by exercise.
10. Oldpeak: ST segment depression caused by exercise compared with relaxation.
11. Slope: three nominal values (rising wave, flat wave, and downward sloping wave), which show the slope characteristics of the ST segment at peak exercise.
12. Ca: the number of major vessels (0 to 3) coloured by fluoroscopy, referred to below as the number of colour vessels.
13. Thal: three nominal values (normality, fixed defect, and reversible defect), which represent the heart state.
14. Classification attribute: an index whose value indicates whether a person is healthy (0) or has been detected as having heart disease. The types of disease are 1, 2, 3, and 4.
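As a rough illustration of how one record with these 14 attributes might be handled, the following Python sketch parses a comma-separated line and collapses the classification attribute into "health" or "illness". The column names, the comma-separated layout, and the example records are assumptions made for this sketch (the records are made up, not taken from the data set), and the sketch assumes that a class value of 0 denotes a healthy person while 1 to 4 denote disease.

```python
# Column names mirror the 14 attributes listed above; the exact spellings
# are an assumption made for this sketch.
COLUMNS = ["age", "gender", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def parse_record(line):
    """Turn one comma-separated record into a dict keyed by attribute name."""
    values = [float(v) for v in line.strip().split(",")]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(values)))
    record = dict(zip(COLUMNS, values))
    # Collapse the classification attribute: 0 -> health, 1-4 -> illness.
    record["label"] = "health" if record["num"] == 0 else "illness"
    return record


# Illustrative records (made up for this example).
healthy = parse_record("54,0,3,130,250,0,0,160,0,0.5,1,0,3,0")
ill = parse_record("60,1,4,140,290,0,2,120,1,2.6,2,2,7,3")
print(healthy["label"], ill["label"])
```

Records with missing values would need separate handling before a conversion of this kind.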

3 The Proposed Algorithm

This section describes the proposed AR-PSO-SVM model that is used to detect the factors inducing heart diseases, as shown in Figure 1. This model mainly aims to automatically optimise the accuracy of SVM classifiers by (i) reducing the number of features using AR and (ii) optimally estimating the SVM parameters C and σ using PSO.

Figure 1:

Detection of Heart Diseases Using AR-PSO-SVM. Cleveland Heart Disease Data Set.

3.1 Feature Selection Based on AR

AR was first used to select features from the 14 attributes of the Cleveland database. During the selection process, only the major itemset for each kind of disease was searched, and all the items in these major itemsets played an important role in the subsequent classification. Therefore, this research used only these items to classify all patients. The major itemsets obtained for the various diseases, given as feature numbers, are as follows:

Disease	Major itemset
Atherosclerosis	1-3-4-7-9-14
Coronary heart disease	2-4-5-7-8-9-11-14
Rheumatic heart disease	1-2-3-6-7-9-11-12-13
Congenital heart disease	2-4-6-8-10-11-13-14
Myocarditis	1-2-4-5-8-9-11-12-13
Angina	1-3-5-7-8-10-11-12
Arrhythmia	3-4-6-7-8-10-12-13

According to this list, atherosclerosis can be determined by using features 1, 3, 4, 7, 9, and 14; the other diseases can be determined similarly. Here, experiments were performed using three ARM algorithms, namely the Apriori, Predictive Apriori, and Tertius algorithms.
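To make the rule measures concrete, the following Python sketch computes the support of an itemset and the confidence of a rule LHS => RHS over a handful of transactions. It is a minimal illustration of the quantities that Apriori-style algorithms rank rules by, not the implementation used in this study; the function names and the four toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs => rhs, i.e. support(lhs and rhs) / support(lhs)."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Toy transactions (made up for this example).
transactions = [
    frozenset({"gender=female", "exang=false", "ca=0", "health"}),
    frozenset({"gender=female", "exang=false", "ca=0", "health"}),
    frozenset({"gender=male", "exang=true", "ca=2", "illness"}),
    frozenset({"gender=female", "exang=false", "ca=1", "illness"}),
]

rule_conf = confidence(transactions, {"gender=female", "exang=false"}, {"health"})
print(round(rule_conf, 3))
```

In this toy data, three transactions match the left-hand side and two of them are labelled "health", so the rule's confidence is 2/3.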

3.2 SVMs

An SVM [11] is a supervised learning algorithm based on statistical learning theory. It aims to determine a hyperplane that optimally separates two classes in the training set. It improves the generalisation ability of learning machines through structural risk minimisation, that is, by minimising both the empirical risk and the confidence interval. As a result, it exhibits good classification performance even with small sample sizes, and it can handle the classification of high-dimensional data. Suppose {(x_i, y_i)}, i=1, 2, …, N, is a training set, where x_i represents an input sample and y_i∈{+1, –1} is its class label. The hyperplane is defined as w·x+b=0, where x denotes the points located on the hyperplane, w determines the direction of the hyperplane, and b is the offset of the hyperplane from the origin. To find the optimal hyperplane for separating the classes, ‖w‖² is required to be minimal under the constraint y_i(w·x_i+b)≥1, where i=1, 2, …, N.

Therefore, the following optimisation problem must be solved:

$$\min_{w,\,b}\ \frac{1}{2}\|w\|^{2} \quad \text{s.t.}\quad y_i(w \cdot x_i + b) \ge 1,\quad i = 1, 2, \ldots, N. \tag{1}$$

By introducing non-negative slack variables ξ_i into the above optimisation problem, which permits some samples to violate the margin and allows the formulation to be extended to non-linear decision surfaces, a new optimisation problem is obtained:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad y_i(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \ldots, N. \tag{2}$$

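Formula (2) is a quadratic programming problem. Using the Lagrange multiplier method, we obtain the Lagrange function

$$L(w, b, \alpha, \xi, \mu) = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\xi_i + \sum_{i=1}^{N}\alpha_i\bigl(1 - \xi_i - y_i(w \cdot x_i + b)\bigr) - \sum_{i=1}^{N}\mu_i\xi_i, \tag{2.1}$$

where α_i≥0 and μ_i≥0 are the Lagrange multipliers. Setting the partial derivatives of L(w, b, α, ξ, μ) with respect to w, b, and ξ_i equal to 0 gives

$$w = \sum_{i=1}^{N}\alpha_i y_i x_i,\qquad 0 = \sum_{i=1}^{N}\alpha_i y_i,\qquad C = \alpha_i + \mu_i. \tag{2.2}$$

Substituting these conditions into Formula (2.1) yields the dual problem

$$\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\, x_i \cdot x_j \quad \text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C, \tag{2.3}$$

where the bound 0≤α_i≤C follows from C=α_i+μ_i together with α_i≥0 and μ_i≥0.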

We used the sequential minimal optimization (SMO) algorithm to solve this problem. With a kernel function, the SVM decision function becomes

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i y_i K(x_i, x) + b\right), \tag{3}$$

where α_i is the Lagrange multiplier and K(x_i, x_j)=φ(x_i)·φ(x_j) is a symmetric positive-definite kernel function, which maps the data to a high-dimensional space through a non-linear mapping function φ(x). In this study, the radial basis function (RBF) kernel is used, defined as K(x_i, x_j)=exp(–‖x_i–x_j‖²/(2σ²)), where σ is a positive real number that sets the width of the kernel.
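The decision function with an RBF kernel can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the support vectors, multipliers α_i, and offset b below are picked by hand for the example rather than obtained from SMO, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, labels, alphas, b, sigma):
    """Weighted kernel sum plus offset, i.e. the argument of sign() in the decision function."""
    total = sum(a * y * rbf_kernel(sv, x, sigma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return total + b


def predict(x, support_vectors, labels, alphas, b, sigma):
    """Return +1 or -1 according to the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, b, sigma)
    return 1 if value >= 0 else -1


# Hand-picked toy model (not the result of training).
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, 1]
alphas = [1.0, 1.0]

print(predict((1.9, 2.1), svs, ys, alphas, b=0.0, sigma=1.0))
print(predict((0.1, 0.0), svs, ys, alphas, b=0.0, sigma=1.0))
```

A smaller σ makes the kernel more local, so each support vector influences only nearby points; this is one reason why σ is tuned jointly with C.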

As presented above, SVMs are essentially binary classifiers. However, the classification of heart diseases generally involves identifying several diseases, which calls for a multi-class classification strategy. One-against-all is among the most popular such strategies in recent studies. Let Ω={w₁, w₂, …, w₆} be the set of heart diseases to be classified. First, for each disease w_i (i=1, 2, …, 6), a binary classifier is trained to separate w_i from the remaining diseases Ω–{w_i}. Then, during classification, patients are diagnosed based on the winner-takes-all principle: the detected disease corresponds to the SVM classifier with the highest output.
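The winner-takes-all step can be sketched as follows. The scoring functions here are hypothetical stand-ins for trained binary SVMs (each returns the real-valued classifier output for a sample), and the disease names are used only as example keys.

```python
def one_against_all_predict(x, scorers):
    """Pick the class whose binary classifier gives the highest output.

    scorers maps a class name to a function returning that classifier's
    real-valued output for the sample x.
    """
    scores = {name: scorer(x) for name, scorer in scorers.items()}
    winner = max(scores, key=scores.get)
    return winner, scores


# Hypothetical scoring functions standing in for trained one-against-all SVMs.
scorers = {
    "angina": lambda x: x[0] - x[1],
    "arrhythmia": lambda x: x[1] - x[0],
    "myocarditis": lambda x: 0.5 - abs(x[0]) - abs(x[1]),
}

winner, scores = one_against_all_predict((2.0, 0.5), scorers)
print(winner, scores)
```

One classifier is needed per class, so the cost of prediction grows linearly with the number of diseases.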

3.3 Particle Swarm Optimisation

PSO [24] is a stochastic, population-based optimisation technique introduced by Kennedy and Eberhart in 1995. It belongs to the family of swarm intelligence techniques and is inspired by social interactions in humans and animals. Compared with other heuristics, it has fewer parameters to be tuned by the user, and its underlying concepts are simple. It is also easy to code, converges quickly, and imposes a lower computational burden than most other heuristics. In addition, the behaviour of PSO is not strongly affected by increases in dimensionality, and it handles multiple objectives, multi-modality, constraints, and discrete/integer variables efficiently.

In PSO, particles move towards their personal best positions and the global best position by updating according to the following equations:

$$v_{id}^{t+1} = w\,v_{id}^{t} + C_1 \cdot \mathrm{rand} \cdot (p_{id}^{t} - x_{id}^{t}) + C_2 \cdot \mathrm{rand} \cdot (p_{gbest}^{t} - x_{id}^{t}), \tag{4}$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}, \tag{5}$$

where t represents the tth iteration; rand is a uniformly distributed random number between 0 and 1, drawn anew each time it appears; and C₁ and C₂ are positive learning factors. In the literature, setting C₁=C₂=2 has been proposed as a generally acceptable setting for most problems (Ozcan and Mohan, 1999) [19] and is widely used in practical applications of PSO. w denotes the inertia weight coefficient. A comparative study of different types of inertia weight was carried out by Han et al. (2010) [8], and the linearly decreasing and simulated annealing types were identified as the best ones. Taking both simplicity and efficiency into account, a linearly decreasing inertia weight is the most appropriate choice and is normally used in PSO applications. The values w_max=0.9 and w_min=0.4 are widely accepted in the literature. x_id and v_id denote the position and speed of particle i, respectively; p_id represents the personal best position of particle i; and p_gbest is the best position found so far among all particles.
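The update equations can be illustrated with a compact Python sketch that uses C₁=C₂=2 and an inertia weight decreasing linearly from 0.9 to 0.4. Several details are assumptions of this sketch rather than part of the description above: it minimises a simple sphere function instead of maximising classification accuracy, it clamps velocities and positions to the search bounds, and it updates the global best as soon as a particle improves on it.

```python
import random


def pso_minimise(f, bounds, swarm_size=20, iterations=60,
                 c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimise f over box bounds with a basic PSO; returns (best, value, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Velocity clamp per dimension (an assumption of this sketch).
    v_clamp = [0.2 * (hi - lo) for lo, hi in bounds]
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    v = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in x]
    pbest_val = [f(p) for p in x]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for t in range(iterations):
        # Linearly decreasing inertia weight from w_max to w_min.
        w = w_max - (w_max - w_min) * t / max(iterations - 1, 1)
        for i in range(swarm_size):
            for d in range(dim):
                lo, hi = bounds[d]
                # Velocity update: inertia + cognitive term + social term.
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (pbest[i][d] - x[i][d])
                           + c2 * rng.random() * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_clamp[d], min(v_clamp[d], v[i][d]))
                # Position update, kept inside the search bounds.
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(p):
    """Simple test objective with its minimum at the origin."""
    return sum(c * c for c in p)


best, best_val, history = pso_minimise(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

For PSO-SVM, the objective would be replaced by the classification accuracy obtained with a candidate pair (C, σ), with the comparisons reversed so that higher fitness wins.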

Existing research on PSO shows that its parameters significantly affect its computational behaviour. Uy et al. (2007) [29] investigated the efficiency of some well-known randomised low-discrepancy sequences (Halton, Sobol, and Faure) for initialising PSO and concluded that initialisation with Sobol sequences performs best [24]. Han et al. (2010) [8] studied PSO performance with different types of inertia weight, and their study indicated that the linearly decreasing and simulated annealing types are the best ones [24]. In the study by Ozcan and Mohan (1999) [19], C₁=C₂=2 was shown to be a generally acceptable setting for most problems, and Guo and Chen (2009) [6] proposed an adaptive variant that assigns a specific inertia weight and social acceleration coefficient to each particle while keeping the cognitive acceleration coefficient constant [24].

To deal with discrete problems using PSO, Zhang and Xie used the rounding-off method; that is, discrete/integer variables are rounded off to the nearest discrete/integer value during the course of optimisation, and this method does not noticeably affect accuracy [25]. Kennedy and Eberhart transformed the optimisation problem into a binary problem. They used the sigmoid function

$$s(V_{id}) = \frac{1}{1 + \exp(-V_{id})},$$

then generated a random number r in [0, 1] and updated the position of the particle as

$$X_{id}(t+1) = \begin{cases} 1, & r < s(V_{id}), \\ 0, & r \ge s(V_{id}). \end{cases}$$
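The binary position update can be sketched as below. The clamp on the velocity is an assumption added here only to keep the exponential within floating-point range; the function names are hypothetical.

```python
import math
import random


def sigmoid(v):
    """s(v) = 1 / (1 + exp(-v)), with v clamped to avoid overflow."""
    v = max(-50.0, min(50.0, v))
    return 1.0 / (1.0 + math.exp(-v))


def binary_position(velocity, rng):
    """Set each bit to 1 with probability s(v), otherwise to 0."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]


rng = random.Random(0)
bits = binary_position([-2.0, 0.0, 2.0, 50.0], rng)
print(bits)
```

A large positive velocity makes a bit almost certainly 1 and a large negative velocity makes it almost certainly 0, so the velocity acts as a per-bit probability rather than a displacement.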

Nema et al. (2008) [17] proposed a PSO variant that is hybridised with the branch and bound (BB) algorithm [25]. Through this hybridisation, the variant combines PSO’s global search capability with BB’s rapid convergence capability. In the paper of Liu et al. (2009) [14], PSO’s conventional flight equations are modified as follows:

$$\begin{aligned} X_{i1}(t) &= X_i(t) + \alpha V_i(t), \\ X_{i2}(t) &= \beta\,\mathrm{PMX}[X_{i1}(t), P_i(t)], \\ X_{i1}(t+1) &= \gamma\,\mathrm{PMX}[X_{i2}(t), P_{li}(t)], \end{aligned}$$

where PMX and P_li represent partially matched crossover and particle i’s neighbourhood best, respectively. In the paper of Rezaee Jordehi [23], a novel adaptive PSO method is proposed by adding an adaptive mutation mechanism and a dynamic inertia weight to the conventional PSO method.

3.4 The Training Process

The training process of the proposed algorithm is as follows:

1. Three AR algorithms (Apriori, Predictive Apriori, and Tertius) are each used to screen the features relevant to heart diseases from the Cleveland data set. The data sets are then extracted based on the screened features.
2. The PSO parameters for PSO-SVM are set. Based on the SVM parameters C and σ, a two-dimensional particle swarm is established, and each particle in the swarm is initialised.
3. The parameters C and σ are used to calculate the w vector under the constraint of Formula (2); the solution for the w vector is then substituted into Formula (3).
4. By comparing the predictions with the true results of the training samples, the ratio classified/total (function M) is used as the fitness function in the PSO algorithm, where total and classified represent the numbers of training samples and accurately classified samples, respectively. If M(X_i)>M(P_i), the best position of particle i is updated as P_i=X_i; if M(P_i)>M(P_g), the global best position is set as P_g=P_i. Formulae (4) and (5) are used to update the speed and position of each particle.
5. Steps 3 and 4 are repeated until the maximum number of iterations is reached, or the termination conditions are satisfied, to obtain the optimised parameters C and σ as well as the w vector.
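The fitness function M and the best-position updates in the steps above can be sketched as follows. The threshold classifier and the sample values are hypothetical stand-ins for an SVM trained with a candidate parameter pair; only the ratio classified/total and the two comparison rules come from the description above.

```python
def fitness(predict, samples, labels):
    """M = classified / total: the share of training samples predicted correctly."""
    classified = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return classified / len(samples)


def update_bests(x_i, m_x, p_i, m_p, p_g, m_g):
    """Apply the two update rules and return (p_i, m_p, p_g, m_g)."""
    if m_x > m_p:        # the particle improved on its own best position
        p_i, m_p = list(x_i), m_x
    if m_p > m_g:        # the particle's best beats the global best
        p_g, m_g = list(p_i), m_p
    return p_i, m_p, p_g, m_g


# Toy data and a stand-in classifier (made up for this example).
samples = [0.2, 0.8, 0.4, 0.9]
labels = [-1, 1, -1, 1]
good = fitness(lambda x: 1 if x > 0.5 else -1, samples, labels)
poor = fitness(lambda x: 1 if x > 0.85 else -1, samples, labels)
print(good, poor)

# A particle at (C, sigma) = (10.0, 0.5) with fitness 0.9 replaces both bests.
state = update_bests([10.0, 0.5], 0.9, [5.0, 1.0], 0.8, [1.0, 1.0], 0.85)
print(state)
```

Because M is measured on the training samples, a high value does not by itself guarantee good performance on the testing set.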

For PSO, too few particles can cause the algorithm to become trapped in local optima, whereas too many particles slow it down. We adjust the training data size based on the AR-selected features.

4 Experimental Trials

Although the Cleveland database is commonly used for classification problems, the authors believe that it is also feasible to use this database for knowledge extraction and for association rule mining. Therefore, two groups of experiments are conducted here. The first extracts ARs based on health and illness rules. The second explores ARs based on gender, as gender has been found in the medical domain to be one of the important factors influencing heart disease. The experimental data consist of 300 samples with 14 features. The features are described in Section 2, and further details are illustrated in Figure 1.

The following two sections detail these groups of experiments. In these experiments, the parameters for PSO-SVM are set as follows: the parameters C and σ of the RBF kernel function vary over the ranges (10⁻³, 200) and (10⁻³, 2), respectively; meanwhile, the standard parameters in the PSO algorithm are chosen as follows: the swarm size, inertia coefficient, and maximum iteration times were S=50, respectively. Additionally, the acceleration constants C₁ and C₂ are both 2.

4.1 AR Mining Based on Health and Illness Rules

In the first group of experiments, the data are divided into two classes, namely, patients and healthy people. On this basis, three prevalent ARM algorithms, namely, the Apriori, Predictive Apriori, and Tertius algorithms, are used to select rules with confidence of >90%, confirmation degree of >99%, and precision of confirmation degree of >79%. If there are many such rules, only those containing “illness” or “health” on their right-hand side (RHS) are considered; if no such rule is found, only the rules including “illness” or “health” on their left-hand side (LHS) are considered. The experimental results are shown in Tables 1–3.
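The selection logic described here (threshold first, then preference for rules with "health" or "illness" on the RHS, with an LHS fallback) can be sketched in Python. The rule representation, the generic "score" field standing for whichever quality measure an algorithm reports, and the example rules with their scores are all hypothetical.

```python
def select_rules(rules, threshold, targets=("health", "illness")):
    """Keep rules above the threshold, preferring those with a target on the RHS."""
    strong = [r for r in rules if r["score"] > threshold]
    rhs_rules = [r for r in strong if r["rhs"] in targets]
    if rhs_rules:
        return rhs_rules
    # Fallback: no strong rule concludes a target, so look for targets on the LHS.
    return [r for r in strong if any(t in r["lhs"] for t in targets)]


# Made-up rules for illustration only.
rules = [
    {"lhs": {"gender=female", "ca=0"}, "rhs": "health", "score": 0.97},
    {"lhs": {"cp=asympt", "thal=rev"}, "rhs": "illness", "score": 0.93},
    {"lhs": {"gender=female"}, "rhs": "health", "score": 0.70},
    {"lhs": {"health", "ca=0"}, "rhs": "exang=false", "score": 0.95},
]

selected = select_rules(rules, threshold=0.90)
print([r["rhs"] for r in selected])
```

The same function can be reused for each of the three algorithms by passing the threshold that applies to its quality measure.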

Table 1:

Health and Illness Rules Extracted Using the Apriori Algorithm.

Algorithm	Rules	Time
Apriori-PSO-SVM	Health rules:	0
	Health rule 1: IF {gender=female∩angina induced by exercise=false∩number of colour vessels=0∩thal=normal}=>health(conf, 0.98)
	Health rule 2: IF {gender=female∩fasting blood glucose=false∩angina induced by exercise=false∩number of colour vessels=0}=>health(conf, 0.98)
	Health rule 3: IF {gender=female∩angina induced by exercise=false∩number of colour vessels=0}=>health(conf, 0.98)
	Health rule 4: IF {gender=female∩fasting blood glucose=false∩angina induced by exercise=false∩thal=normal}=>health(conf, 0.95)
	Health rule 5: IF {resting blood pressure=(115.2, 136.4]∩angina induced by exercise=false∩number of colour vessels=0∩thal=normal}=>health(conf, 0.94)
	Illness rules:
	Illness rule 1: IF {chest pain=asympt∩slope=flat∩thal=rev}=>illness(conf, 0.96)
	Illness rule 2: IF {chest pain=asympt∩angina induced by exercise=true∩thal=rev}=>illness(conf, 0.94)

Table 2:

Health and Illness Rules Extracted Using the Predictive Apriori Algorithm.

Algorithm	Rules	Time
Predictive Apriori-PSO-SVM	Health rules:	2′43″
	Health rule 1: IF {gender=female∩fasting blood glucose=false∩static electrocardiogram=normal∩angina induced by exercise=false∩thal=normal}=>health(acc, 0.9938)
	Health rule 2: IF {gender=female∩chest pain=notang∩thal=normal}=>health(acc, 0.9935)
	Health rule 3: IF {age=(48.2, 57.8]∩the maximal heart rate=(149.6, 175.8]∩angina induced by exercise=false∩number of colour vessels=0}=>health(acc, 0.99314)
	Health rule 4: IF {gender=female∩chest pain=notang∩the maximal heart rate=(149.6, 175.8]}=>health(acc, 0.9921)
	Health rule 5: IF {age=(38.6, 48.2]∩resting blood pressure=(115.2, 136.4]∩thal=normal}=>health(acc, 0.9918)
	Health rule 6: IF {gender=female∩angina induced by exercise=false∩number of colour vessels=0}=>health(acc, 0.9901)
	Illness rules:
	Illness rule 1: IF {age=(48.2, 57.8]∩slope=flat∩number of colour vessels=1}=>illness(acc, 0.9902)
	Illness rule 2: IF {the maximal heart rate=(123.4, 149.6]∩angina induced by exercise=true∩thal=rev}=>illness(acc, 0.9931)
	Illness rule 3: IF {gender=male∩chest pain=asympt∩number of colour vessels=2}=>illness(acc, 0.9915)
	Illness rule 4: IF {age=(57.8, 67.4]∩gender=male∩number of colour vessels=2}=>illness(acc, 0.9902)

Table 3:

Health and Illness Rules Extracted Using the Tertius Algorithm.

Algorithm	Rules	Time
Tertius-PSO-SVM	Health rules:	2′49.672″
	Health rule 1: IF {chest pain=angina or cholesterol=(476.4, inf) or thal=normal}=>health(conf, 0.30)
	Health rule 2: IF {chest pain=angina or the maximal heart rate=(175.8, inf) or thal=normal}=>health(conf, 0.35)
	Health rule 3: IF {chest pain=abnang or the maximal heart rate=(175.8, inf) or thal=normal}=>health(conf, 0.40)
	Health rule 4: IF {cholesterol=(476.4, inf) or the maximal heart rate=(175.8, inf) or thal=normal}=>health(conf, 0.32)
	Health rule 5: IF {age=(67.4, inf) or chest pain=notang or number of colour vessels=0}=>health(conf, 0.85)
	Illness rules:
	Illness rule 1: IF {chest pain=asympt or resting blood pressure=(178.8, inf) or old peak=(2.48, 3.72]}=>illness(conf, 0.97)
	Illness rule 2: IF {angina induced by exercise=true or old peak=(2.48, 3.72] or thal=rev}=>illness(conf, 1)
	Illness rule 3: IF {chest pain=asympt or the maximal heart rate=(-inf, 97.2] or old peak=(2.48, 3.72]}=>illness(conf, 0.54)
	Illness rule 4: IF {old peak=(2.48, 3.72] or number of colour vessels=3 or thal=rev}=>illness(conf, 1)
	Illness rule 5: IF {chest pain=asympt or old peak=(2.48, 3.72] or thal=fix}=>illness(conf, 0.80)
	Illness rule 6: IF {angina induced by exercise=true or old peak=(3.72, 4.96] or thal=rev}=>illness(conf, 0.86)

As shown in Table 1, four-fifths of the “health” rules are obtained from females, indicating that females are less likely to suffer from coronary heart disease according to this database. In addition, the absence of angina induced by exercise is an index of good health regardless of gender, as this item appears in the LHS of all “health” rules with high confidence. Moreover, the number of colour vessels equalling zero and thal (a kind of heart state) being normal also represent good health. On the other hand, the mining of “illness” rules suggests that asymptomatic chest pain and thal being a reversible defect indicate that a person may be ill; both factors appear in the LHS of the two high-confidence “illness” rules.

As can be seen in Table 2, differing from the Apriori algorithm, which is based on confidence, the Predictive Apriori algorithm selects rules according to accuracy. Similar to the results obtained with the Apriori algorithm, most “health” rules are extracted from females. However, the factors in the LHS are quite different: angina induced by exercise being false and thal being normal are once again confirmed as indices of good health. Likewise, the range of the maximum heart rate being (149.6, 175.8), the number of colour vessels being zero, and chest pain not related to angina (notang) are proven to be factors indicating good health. Moreover, the “illness” rules, two of which are acquired from males, are obviously different: the slope of the ST segment being flat, age being >48 years, and the existence of colour vessels are shown to be risk indices for heart disease and these factors exist in at least two rules.

The data in Table 3 show that thal being normal is an indicator of good health, and other attributes including the maximum heart rate and cholesterol level are also factors influencing health. The results also indicate that old peak, which represents the ST segment depression caused by exercise compared with relaxation, being >2.48 can make people susceptible to disease. Old peak appears in the LHS of all “illness” rules. Meanwhile, chest pain and angina induced by exercise are also factors inducing disease.

On the whole, the rules generated by these three algorithms all show that females are less susceptible to heart disease. This discovery is further studied in the next section. Among these three algorithms, the Apriori algorithm operates the fastest, and the rules generated by this algorithm form a pattern. Therefore, it is also used to explore ARs in the second group of experiments.

4.2 Comparison and Analysis

This section compares the classification performance of the proposed algorithm with those of other advanced algorithms, including an algorithm combining an artificial neural network and fuzzy logic (ANN-FL) [15], one integrating ARs and a neural network (AR-NN) [5], and a feature selection-based SVM (FS-SVM) [27]. This work classifies seven diseases, namely, atherosclerosis, coronary heart disease, rheumatic heart disease, congenital heart disease, myocarditis, angina, and arrhythmia. The performance of the algorithms is determined by the calculated values of specificity, sensitivity, and overall classification accuracy. Here, specificity refers to the ratio of the number of true negative decisions to the number of actual negative cases; sensitivity is the ratio of the number of true positive decisions to the number of actual positive cases; and the overall classification accuracy is the ratio of the number of correct decisions to the total number of cases. The results for the proposed AR-PSO-SVM algorithm are obtained by averaging the statistical results of the three ARM algorithms. A positive prediction of the algorithm that agrees with the doctors’ diagnosis is counted as a true positive decision, and a negative prediction that agrees with the doctors’ diagnosis is counted as a true negative decision. The statistical parameters for each classifier are summarised in Table 4.
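The three measures defined above follow directly from the counts of true and false positives and negatives. The Python sketch below shows the calculation; the counts are made up for illustration and do not come from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, overall accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positives / actual positive cases
    specificity = tn / (tn + fp)   # true negatives / actual negative cases
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Made-up counts for one disease treated as the positive class.
sens, spec, acc = classification_metrics(tp=45, tn=90, fp=10, fn=5)
print(sens, spec, acc)
```

In the one-against-all setting, the counts would be collected per disease, with that disease as the positive class and all other diseases as the negative class.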

Table 4:

Statistical Parameters of Each Classifier.

Classifiers	Diseases	Sensitivity (%)	Specificity (%)	Overall classification accuracy (%)	Time cost for classification (s)
AR-PSO-SVM	Atherosclerosis	100	100	98.96	94.983
	Coronary heart disease	100	100
	Rheumatic heart disease	97.22	100
	Congenital heart disease	100	92.59
	Myocarditis	96.15	100
	Angina	100	100
	Arrhythmia	98.46	99.24
ANN-FL	Atherosclerosis	96.42	100	97.39	130.872
	Coronary heart disease	100	100
	Rheumatic heart disease	97.22	94.59
	Congenital heart disease	96.00	96.00
	Myocarditis	96.15	100
	Angina	100	100
	Arrhythmia	97.63	97.09
AR-NN	Atherosclerosis	94.64	100	94.61	156.082
	Coronary heart disease	96.77	100
	Rheumatic heart disease	94.44	89.47
	Congenital heart disease	92	88.46
	Myocarditis	92.30	100
	Angina	100	100
	Arrhythmia	96.53	95.98
FS-SVM	Atherosclerosis	95.11	100	96.02	133.241
	Coronary heart disease	98.10	100
	Rheumatic heart disease	94.90	91.29
	Congenital heart disease	93.63	90.32
	Myocarditis	91.27	99.70
	Angina	100	100
	Arrhythmia	97.09	96.12

As presented in Table 4, the four algorithms show similar sensitivity and specificity for certain diseases. For instance, their specificities for coronary heart disease, their sensitivities and specificities for angina, and their specificities for atherosclerosis are all 100%. However, on the whole, the sensitivity, specificity, and overall classification accuracy of the proposed algorithm are not lower than those of the other three algorithms. For example, for rheumatic heart disease, the sensitivity and specificity of the proposed algorithm are 2.32 and 8.71 percentage points higher, respectively, than those of the FS-SVM algorithm.

Comparing the overall classification accuracies of these algorithms shows that the accuracy of the proposed AR-PSO-SVM is 1.57 percentage points higher than that of ANN-FL, 4.35 percentage points higher than that of AR-NN, and 2.94 percentage points higher than that of FS-SVM. In addition, in terms of the total time cost for classification, the proposed algorithm takes about 36 s less than ANN-FL, 61 s less than AR-NN, and 38 s less than FS-SVM. While improving sensitivity, specificity, and overall classification accuracy, the proposed algorithm also significantly reduces the total time cost for classification; it is therefore deemed to be both reliable and effective.
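The differences can be recomputed directly from the overall accuracy and time cost columns of Table 4, as in the short Python sketch below. The dictionary layout and the function name are choices made for this sketch; the numbers are copied from Table 4.

```python
# Overall classification accuracy (%) and time cost (s) from Table 4.
accuracy = {"AR-PSO-SVM": 98.96, "ANN-FL": 97.39, "AR-NN": 94.61, "FS-SVM": 96.02}
time_cost = {"AR-PSO-SVM": 94.983, "ANN-FL": 130.872, "AR-NN": 156.082, "FS-SVM": 133.241}


def gaps(proposed="AR-PSO-SVM"):
    """Return {baseline: (accuracy gain in points, seconds saved)} for the proposed method."""
    result = {}
    for name in accuracy:
        if name == proposed:
            continue
        gain = round(accuracy[proposed] - accuracy[name], 2)
        saved = round(time_cost[name] - time_cost[proposed], 3)
        result[name] = (gain, saved)
    return result


for name, (gain, saved) in gaps().items():
    print(name, gain, saved)
```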

5 Conclusions

This research proposed a detection algorithm for factors inducing heart diseases based on PSO-SVM optimised by ARs. Experiments were then conducted on a heart disease database using three different ARM algorithms so as to analyse the extracted rules and the classification results. The “health” rule set shows that gender is one of the factors influencing heart health: males are more susceptible to coronary heart disease, which agrees with results in existing medical research. The experimental results also showed that both males and females are likely to suffer from heart disease when factors such as asymptomatic chest pain and angina induced by exercise are present.

In the future, we will devise more suitable sets of PSO parameters to improve the accuracy of heart disease prediction, and devise more efficient composite PSOs with low computational burden and high accuracy, so as to find the key factors influencing diseases and help doctors diagnose and treat patients.

Bibliography

[1] H. D. Allen, D. J. Driscoll, R. E. Shaddy and T. F. Feltes, Moss & Adams’ Heart Disease in Infants, Children, and Adolescents: Including the Fetus and Young Adult, 8th ed., Wolters Kluwer Health, Philadelphia, 2013.

[2] E. Avci, A new intelligent diagnosis system for the heart valve diseases by using genetic-SVM classifier, Expert Syst. Appl. 36 (2009), 10618–10626. doi:10.1016/j.eswa.2009.02.053

[3] R. Ceylan, Y. Ozbay and B. Karlik, A novel approach for classification of ECG arrhythmias: type-2 fuzzy clustering neural network, Expert Syst. Appl. 36 (2009), 6721–6726. doi:10.1016/j.eswa.2008.08.028

[4] C. W. Chan, V. Lopez and J. W. Y. Chung, A qualitative study of the perceptions of coronary heart disease among Hong Kong Chinese people, J. Clin. Nurs. 20 (2011), 1151–1159. doi:10.1111/j.1365-2702.2010.03526.x

[5] S. Gupta, R. Aroni and S. Lockwood, South Asians and Anglo Australians with heart disease in Australia, Aust. Health Rev. 4 (2015), 568–576. doi:10.1071/AH14254

[6] L. Guo and X. Chen, A novel particle swarm optimization based on the self-adaptation strategy of acceleration coefficients, in: Proceedings of International Conference on Computational Intelligence and Security, CIS ’09, Beijing, China, IEEE, New York, NY, USA, 2009.

[7] R. Gurgel-Goncalves and C. A. C. Cuba, Infestation of thornbird nests (Passeriformes: Furnariidae) by Psammolestes tertius (Hemiptera: Reduviidae) across Brazilian Cerrado and Caatinga ecoregions, Zoologia 28 (2011), 411–414. doi:10.1590/S1984-46702011000300017

[8] W. Han, P. Yang, H. Ren and J. Sun, Comparison study of several kinds of inertia weight for PSO, in: Proceedings of the IEEE international conference on progress in informatics and computing, pp. 280–284, IEEE Computer Society, Washington DC, 2010.

[9] M. Karabatak and M. C. Ince, An expert system for detection of breast cancer based on association rules and neural network, Expert Syst. Appl. 36 (2009), 3465–3469. doi:10.1016/j.eswa.2008.02.064

[10] S. H. Khan, F. A. Khan, A. Ijaz, A. Sattar, M. Dilawar and R. Hashim, Hypertension and metabolic syndrome: impact of clustering of hypertension in subjects with metabolic syndrome, Pak. J. Med. Sci. 23 (2007), 903–908.

[11] M. Kruse, M. Davidsen, M. Madsen, D. Gyrd-Hansen and J. Sorensen, Costs of heart disease and risk behaviour: implications for expenditure on prevention, Scand. J. Public Health 36 (2008), 850–856. doi:10.1177/1403494808095955

[12] A. V. S. Kumar, Diagnosis of heart disease using fuzzy resolution mechanism, J. Artif. Intell. 5 (2012), 47–55. doi:10.3923/jai.2012.47.55

[13] H. Liu, T. T. Huang and J. H. Lin, Risk factors and risk index of cardiac events in pregnant women with heart disease, Chinese Med. J. 125 (2008), 3410–3415.

[14] H. Liu, X. Liu, Q. Wang, et al., Routing optimization for dispatching vehicles based on an improved discrete particle swarm optimization algorithm with mutation operation, in: 3rd International Conference on Genetic and Evolutionary Computing, WGEC ’09, Guilin, China, IEEE, New York, NY, USA, 2009.

[15] N. C. Long, P. Meesad and H. Unger, A highly accurate firefly based algorithm for heart disease prediction, Expert Syst. Appl. 42 (2015), 8221–8231. doi:10.1016/j.eswa.2015.06.024

[16] L. Lorgis, M. Zeller and P. Jourdain, Heart rate distribution and predictors of increased heart rate among French hypertensive patients with stable coronary artery disease: data from the LHYCORNE cohort, Arch. Cardiovasc. Dis. 102 (2009), 541–547. doi:10.1016/j.acvd.2009.05.003

[17] S. Nema, J. Y. Goulermas, G. Sparrow, et al., A hybrid particle swarm branch-and-bound (hpb) optimizer for mixed discrete nonlinear programming, IEEE Transactions on Systems, Man, and Cybernetics 38 (2008), 1411–1424. doi:10.1109/TSMCA.2008.2003536

[18] Y. Ozbay, R. Ceylan and B. Karlik, Integration of type-2 fuzzy clustering and wavelet transform in a neural network based ECG classifier, Expert Syst. Appl. 38 (2011), 1004–1010. doi:10.1016/j.eswa.2010.07.118

[19] E. Ozcan and C. Mohan, Particle swarm optimisation: Surfing the waves, in: Proceedings of the IEEE international congress on evolutionary computation, pp. 1939–1944, IEEE Computer Society, Washington DC, 1999.

[20] S. Palaniappan and R. Awang, Intelligent heart disease prediction system using data mining techniques, in: ACS/IEEE International Conference on Computer Systems and Applications, IEEE, New York, NY, USA, 2008. doi:10.1109/AICCSA.2008.4493524

[21] B. M. Patil, R. C. Joshi and D. Toshniwal, Classification of type-2 diabetic patients by using Apriori and predictive Apriori, Int. J. Comput. Vis. Robot. 2 (2011), 254–265. doi:10.1504/IJCVR.2011.042842

[22] P. E. Puddu and A. Menotti, Artificial neural network versus multiple logistic function to predict 25-year coronary heart disease mortality in the Seven Countries Study, Eur. J. Cardiovasc. Prev. Rehabil. 16 (2009), 583–591. doi:10.1097/HJR.0b013e32832d49e1

[23] A. Rezaee Jordehi, Particle swarm optimisation for dynamic optimisation problems: a review, Neural Comput. Appl. 25 (2014), 1507–1516. doi:10.1007/s00521-014-1661-6

[24] A. Rezaee Jordehi and J. Jasni, Parameter selection in particle swarm optimisation: a survey, J. Exp. Theor. Artif. Intell. 25 (2013), 527–542. doi:10.1080/0952813X.2013.782348

[25] A. Rezaee Jordehi and J. Jasni, Particle swarm optimisation for discrete optimisation problems: a review, Artif. Intell. Rev. 43 (2015), 243–258. doi:10.1007/s10462-012-9373-8

[26] A. M. Shin, I. H. Lee, G. H. Lee, H. J. Park, H. S. Park, K. I. Yoon, J. J. Lee and Y. N. Kim, Diagnostic analysis of patients with essential hypertension using association rule mining, Healthc. Inform. Res. 16 (2010), 77–81. doi:10.4258/hir.2010.16.2.77

[27] B. A. Smoley, N. L. Smith and G. P. Runkle, Hypertension in a population of active duty service members, J. Am. Board Fam. Med. 21 (2008), 504–511. doi:10.3122/jabfm.2008.06.070182

[28] A. Suliman, The state of heart disease in Sudan, Cardiovasc. J. Afr. 22 (2011), 191–196. doi:10.5830/CVJA-2010-054

[29] N. Q. Uy, N. X. Hoai, R. I. Mckay, et al., Initialising PSO with randomised low-discrepancy sequences: the comparative results, in: Congress on Evolutionary Computation, Singapore, IEEE, New York, NY, USA, 2007.

[30] W. Q. Wang, X. W. Luo and J. K. Hu, Apriori-sort algorithm, Comput. Eng. Appl. 44 (2008), 156–159.

[31] E. Witvrouw, K. V. Borre, T. M. Willems, J. Huysmans, E. Broos and D. De Clercq, The significance of peroneus tertius muscle in ankle injuries – a prospective study, Am. J. Sport. Med. 34 (2006), 1159–1163. doi:10.1177/0363546505286021

[32] M. C. Xiao, X. Y. Wang, L. Y. Dou, X. X. Liu and Y. Tian, Complete mitochondrial genome sequence of a coronary heart disease model rat strain (Muridae; Rattus), Mitochondrial DNA 27 (2016), 1287–1288. doi:10.3109/19401736.2014.945559

[33] J. Xie and C. Wang, Using support vector machines with a novel hybrid feature selection method for diagnosis of erythemato-squamous diseases, Expert Syst. Appl. 38 (2011), 5809–5815. doi:10.1016/j.eswa.2010.10.050

[34] Y. G. Yu, Z. F. Zhong and J. Ma, Apriori optimization algorithm based on equivalence class, Comput. Eng. 36 (2010), 66–80.

Received: 2016-2-6

Published Online: 2016-8-17

Published in Print: 2017-7-26

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
