Performance analysis of classification algorithms on early detection of liver disease
Introduction
In recent years, the volume of data stored by organizations such as banks, hospitals, and universities has grown steadily, which motivates the search for ways to extract knowledge from these data and use it efficiently. Data mining is defined as a method for discovering and extracting useful, practical, and understandable knowledge from large volumes of data (Han, Kamber, & Pei, 2011). It is also defined as a semi-automated way of finding hidden patterns in data (Han & Kamber, 2001). One of its most important uses is to extract knowledge from data more accurately, in less time, and at lower cost, and possibly to obtain more comprehensive and complete results. This knowledge is used in fields such as medical applications, web mining, security, crime prevention, and many others (Witten & Frank, 2005). Medical science is one of the important areas where data mining is used. Because this branch of science deals with human life, it is highly sensitive. In recent years, a great deal of research has applied data mining to a variety of diseases. A closer look at recent work in the medical field shows many studies that use data mining for forecasting, prevention, and treatment (Das, 2010; Riganello et al., 2010; Gauthier, Alemayehu, & Berger, 2016; Kasabov & Capecci, 2015; Marateb et al., 2014; Nahar et al., 2013; Patidar et al., 2015; Souillard-Mandar et al., 2015; Tanha et al., 2015; Rodríguez-Jiménez et al., 2016; Tomczak & Zięba, 2015). In medical science, accuracy and speed are two important factors that should be given primary consideration in dealing with any disease. In this regard, data mining techniques can be of great help to physicians.
The organization of this paper is as follows. Section 2 provides background on data mining, liver disease, and classification algorithms, together with related work. Section 3 describes our implementation of the boosted C5.0 and CHAID classification algorithms for the early detection of liver disease. Finally, Section 4 concludes the paper with some discussion and suggestions for future work.
Data mining
With advances in science, many machines have entered our lives. One of the best-known areas in which computers, the most widely used of these machines, can be helpful is knowledge extraction with the help of a machine (machine learning). This approach, which can be of great help to all scientific fields, is called data mining or Knowledge Discovery in Databases (KDD). Supervised and unsupervised learning are the two main methods of machine learning (Han et al., 2011). The purpose of these methods is to
The implementation of classification algorithms
In this paper, we used the Boosted C5.0 and CHAID algorithms, both of which are decision tree methods, to discover hidden knowledge in the liver disease dataset from the UCI repository. To this end, we ran and evaluated the algorithms in IBM SPSS Modeler 14.2 (Firat University license). The data are divided into two groups: training and testing. For clarity, all stages of this research are presented in Fig. 3:
In this regard, the implementation steps
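The division of the data into training and testing groups described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in IBM SPSS Modeler: the function name `stratified_split`, the `test_fraction` value, the seed, and the toy records are all hypothetical, and the split ratio used in the study is not given in this text.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, test_fraction=0.3, seed=0):
    """Split records into training and testing sets, class by class.

    Splitting each class separately keeps the class proportions roughly
    equal in both sets. test_fraction and seed are illustrative defaults.
    """
    rng = random.Random(seed)

    # Group record indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(rows[i], labels[i]) for i in sorted(train_idx)]
    test = [(rows[i], labels[i]) for i in sorted(test_idx)]
    return train, test


# Toy data (hypothetical): ten one-feature records, six patients, four healthy.
rows = [[i] for i in range(10)]
labels = ["patient"] * 6 + ["healthy"] * 4

train_set, test_set = stratified_split(rows, labels, test_fraction=0.5, seed=42)
print(len(train_set), len(test_set))
```

A stratified split is a design choice made here for the sketch; a plain random split would also satisfy the description in the text.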
Conclusion and discussion
According to statistics published by the relevant agencies, liver disease is among the most fatal diseases and puts human life at risk. Decision trees are among the most important and best-known algorithms in data mining, and therefore in this paper we used two decision-tree-based algorithms, C5.0 and CHAID. An important feature of the C5.0 algorithm is that boosting techniques can be applied to it. Boosting in the C5.0 algorithm leads
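The boosting idea mentioned above can be illustrated with a minimal AdaBoost sketch in the spirit of the decision-theoretic boosting paper listed in the references. It is not the C5.0 implementation, whose internal boosting procedure may differ in its details: the weak learner here is a one-level decision stump rather than a full tree, and the function names and toy data are hypothetical. Labels are assumed to be +1 and -1.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] <= threshold, else `-sign`.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] <= t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best


def stump_predict(stump, row):
    _, f, t, sign = stump
    return sign if row[f] <= t else -sign


def adaboost(X, y, rounds=10):
    """Train a weighted ensemble of stumps; y must contain +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform record weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero
        if err >= 0.5:  # weak learner is no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified records, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): no single threshold separates the classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]

model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy set no single stump classifies every record correctly, which is why the weighted vote of several stumps is needed; each round concentrates weight on the records the previous stumps got wrong.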
References (73)
A data mining approach for diagnosis of coronary artery disease
Computer Methods and Programs in Biomedicine
(2013)
Clarification of the use of chi-square and likelihood functions in fits to histograms
Nuclear Instruments and Methods in Physics Research
(1984)
Detection of fraudulent financial statements using the hybrid data mining approach
SpringerPlus
(2016)
A comparison of multiple classification methods for diagnosis of Parkinson disease
Expert Systems with Applications
(2010)
Effective diagnosis of heart disease through neural networks ensembles
Expert Systems with Applications
(2009)
Diagnosis of valvular heart disease through neural networks ensembles
Computer Methods and Programs in Biomedicine
(2009)
Evaluation of ensemble methods for diagnosing of valvular heart disease
Expert Systems with Applications
(2010)
An introduction to ROC analysis
Pattern Recognition Letters
(2006)
A decision-theoretic generalization of on-line learning and an application to boosting
Journal of Computer and System Sciences
(1997)
From patterns in data to knowledge discovery: what data mining can do
Physics Procedia
(2015)
Age-related differences in reporting of drug-associated liver injury: Data-mining of WHO safety report database
Regulatory Toxicology and Pharmacology
Spiking neural network methodology for modelling, classification and understanding of EEG spatio-temporal data measuring cognitive processes
Information Sciences
An intelligent model for liver disease diagnosis
Artificial Intelligence in Medicine
A hybrid intelligent system for diagnosing microalbuminuria in type 2 diabetes patients without having to measure urinary albumin
Computers in Biology and Medicine
Comparison of three data mining models for predicting diabetes or prediabetes by risk factors
The Kaohsiung Journal of Medical Sciences
Association rule mining to detect factors which contribute to heart disease in males and females
Expert Systems with Applications
C5.0 classification algorithm and application on individual credit evaluation of banks
Systems Engineering-Theory & Practice
Automated diagnosis of coronary artery disease using tunable-Q wavelet transform applied on heart rate signals
Knowledge-Based Systems
Heart rate variability: An index of brain processing in vegetative state? An artificial intelligence, data mining study
Clinical Neurophysiology
Disease prediction with different types of neural network classifiers
Telematics and Informatics
A survey and compare the performance of IBM SPSS modeler and rapid miner software for predicting liver disease by using various data mining algorithms
Cumhuriyet Science Journal
Big data: Transforming drug development and health policy decision making
Health Services and Outcomes Research Methodology
Data mining techniques for optimization of liver disease classification
Data mining and knowledge discovery
A novel integrated model for assessing landslide susceptibility mapping using CHAID and AHP pair-wise comparison
International Journal of Remote Sensing
A pragmatic approach for detecting liver cancer using image processing and data mining techniques
Classification of liver disease diagnosis: A comparative study
A background subtraction algorithm for indoor monitoring surveillance systems
Advanced analytics methodologies: Driving business value with analytics
Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection
Classification of heart disease using K-nearest neighbor and genetic algorithm
Procedia Technology
An electric energy consumer characterization framework based on data mining techniques
IEEE Transactions on Power Systems
Round robin classification
The Journal of Machine Learning Research
Pairwise classification as an ensemble technique
Machine Learning: ECML 2002
Data mining: Concepts, models and techniques