WebMAC: A web based clinical expert system

Information Systems Frontiers

Abstract

Diagnosing a disease at an early stage enables physicians to manage complications and treat them properly. The diagnostic method therefore plays an important role in both the diagnosis and the accuracy of the subsequent treatment. A diagnostic expert system can help considerably in identifying diseases and describing the treatment to be carried out, provided it takes the user's capabilities into account so that users can interact with it easily and clearly. One effective way to improve the diagnostic accuracy of expert systems is to use ensemble classifiers. This research presents an expert system that uses multi-layer classification with enhanced bagging and optimized weighting. The proposed method, named “M2-BagWeight”, is designed to overcome the limitations of individual classifiers as well as other ensemble classifiers. The model is evaluated on two liver disease datasets, a chronic kidney disease dataset, a heart disease dataset, the Diabetic Retinopathy Debrecen dataset, a breast cancer dataset and a primary tumor dataset, all obtained from the UCI public repository. The results show that the proposed expert system achieves higher classification and prediction accuracy than the individual and ensemble classifiers it is compared with. In addition, an application named “WebMAC” has been developed for practical deployment of the proposed model in hospitals to provide diagnostic advice.
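The abstract names weighting as one ingredient of the ensemble but this preview does not give the details of M2-BagWeight. As a rough illustration of the general idea only, and not the authors' algorithm, the sketch below combines the predictions of several base classifiers by weighted voting. The function name `weighted_vote`, the example votes and the use of accuracy-like numbers as weights are all assumptions made for this example.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most total weight.

    predictions: one predicted label per base classifier
    weights:     one non-negative weight per base classifier
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Highest total weight wins; ties go to the smallest label so the
    # result is deterministic.
    return min(totals, key=lambda label: (-totals[label], label))


# Three hypothetical base classifiers vote on one patient (1 = disease present).
votes = [1, 0, 0]
accuracy_weights = [0.90, 0.40, 0.40]  # assumed weights, e.g. validation accuracy
decision = weighted_vote(votes, accuracy_weights)
```

With equal weights the same votes would yield 0 by simple majority; here the single heavily weighted classifier outweighs the other two, which is the kind of behaviour a weighting scheme is meant to allow.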

Notes

  1. http://archive.ics.uci.edu/ml/datasets.html [Last Accessed 25 Sep. 2015]

Author information

Corresponding author

Correspondence to Usman Qamar.

Appendix: Details of the datasets

1.1 Datasets

1.1.1 Liver disease datasets

Two liver disease datasets, the Bupa liver disease dataset and the ILPD liver disease dataset, are used for evaluation. Both are obtained from the UCI machine learning repository. Each dataset contains a diverse set of attributes and instances that are used to determine the presence or absence of liver disease in patients. The class labels are 0 and 1, where 0 indicates the absence of disease and 1 indicates its presence. A complete description of each dataset is given below:

  a) Bupa Liver Disease Dataset

The Bupa liver disease dataset was originally collected by BUPA Medical Research Ltd. It contains 345 instances representing both healthy individuals and liver disease patients. There are seven attributes of categorical, real and integer type, with no missing values. The first five attributes are blood tests that are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. A sample of the Bupa liver disease dataset is shown in Table 28.

  b) Indian Liver Patient Dataset (ILPD)

The Indian liver patient dataset was collected from the north east of Andhra Pradesh, India. It contains 583 instances: 416 liver patient records and 167 non-liver patient records. Of these, 441 records are from male patients and 142 are from female patients. There are 10 attributes: age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT and Alkphos. The dataset does not contain any missing values. The attributes are of integer and real type. A sample of the ILPD dataset is shown in Table 29.

1.1.2 Chronic kidney disease dataset

The chronic kidney disease dataset is used to determine the presence of chronic kidney disease in patients. It has two class labels: CKD (chronic kidney disease) and NotCKD. The dataset contains 24 diagnostic attributes and one class label attribute. There are 400 instances, 250 CKD and 150 NotCKD, and the dataset contains missing values. The class labels are replaced with 0 and 1, where 0 indicates NotCKD and 1 indicates CKD. A sample of the CKD dataset is given in Table 30.
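The label replacement described for the CKD dataset can be sketched as follows. This is a minimal illustration, assuming the raw labels are the strings "CKD" and "NotCKD" (possibly with different letter case or surrounding whitespace); the function name `encode_ckd_label` is hypothetical and not taken from the article.

```python
def encode_ckd_label(raw_label):
    """Map the textual class label to 0 (NotCKD) or 1 (CKD)."""
    mapping = {"ckd": 1, "notckd": 0}
    # Normalise case and whitespace before the lookup (an assumption about
    # how the raw labels may be written).
    return mapping[raw_label.strip().lower()]


# Reproduce the class distribution stated in the text: 250 CKD, 150 NotCKD.
raw_labels = ["CKD"] * 250 + ["NotCKD"] * 150
encoded = [encode_ckd_label(label) for label in raw_labels]
ckd_share = sum(encoded) / len(encoded)  # fraction of instances labelled 1
```

On the stated counts, `ckd_share` is 250/400, so CKD patients make up 62.5% of the instances.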

1.1.3 Cleveland heart disease dataset

The Cleveland heart disease dataset contains 303 records in total. The training set is composed of 272 instances and the test set of 31 instances. The feature space contains 14 attributes: 13 attributes represent vital signs and one attribute is the goal class (0 or 1), where 0 indicates the absence of heart disease and 1 indicates its presence. A sample of the heart disease dataset from the UCI repository is shown in Table 31.
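The 272/31 partition of the 303 Cleveland records can be sketched as a simple hold-out split. The text does not say how the test instances were chosen, so the random shuffle, the fixed seed and the helper name `holdout_split` below are assumptions for illustration only.

```python
import random


def holdout_split(records, n_test, seed=0):
    """Shuffle a copy of the records and hold out n_test of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in record identifiers for the 303 Cleveland instances.
records = list(range(303))
train, test = holdout_split(records, n_test=31)
```

Holding out 31 of 303 records leaves 272 for training, which matches the sizes given in the text and corresponds to roughly a 90/10 split.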

1.1.4 Diabetic retinopathy debrecen dataset

This dataset contains features extracted from the Messidor image set to predict whether an image contains signs of diabetic retinopathy. The dataset contains 121 instances and 20 attributes, with no missing values. The class label 1 indicates signs of disease, whereas 0 indicates their absence. A sample of the Diabetic Retinopathy Debrecen dataset is shown in Table 32.

1.1.5 Wisconsin breast cancer dataset (WBC)

The Wisconsin breast cancer dataset consists of 699 instances and 11 attributes. Ten attributes carry feature information and one attribute carries the class, where 2 = Benign and 4 = Malignant. There are 16 missing values in the dataset, denoted by “?”. The class distribution is 458 benign instances and 241 malignant instances, so the dataset is unbalanced. A sample of the WBC dataset is shown in Table 33.
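The handling of the “?” marker and the 2/4 class codes can be sketched as below. This is an illustration under stated assumptions: records are taken to be comma-separated with the class in the last position, the example row is made up for the demonstration, and recoding 2 and 4 to 0 and 1 mirrors the convention used for the other datasets rather than anything stated for WBC. The name `parse_wbc_row` is hypothetical.

```python
def parse_wbc_row(line):
    """Parse one record into (attribute_values, label).

    Missing attribute values, written as "?", become None.
    The class code 2 (benign) maps to 0 and 4 (malignant) maps to 1.
    """
    fields = line.strip().split(",")
    values = [None if field == "?" else int(field) for field in fields[:-1]]
    label = {"2": 0, "4": 1}[fields[-1]]
    return values, label


# Made-up example row: ten attribute values (one missing) followed by the class.
values, label = parse_wbc_row("1000025,5,1,1,1,2,?,3,1,1,2")
missing_count = sum(value is None for value in values)

# Class balance from the counts given in the text.
benign_share = 458 / (458 + 241)
```

About 65.5% of the instances are benign on these counts, which is the imbalance the text refers to. How the missing values are subsequently treated (dropped or imputed) is not stated in this section, so the sketch only flags them.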

1.1.6 Primary tumor dataset

The primary tumor dataset was originally obtained from the Ljubljana Oncology Institute. It contains 339 instances and 17 attributes. The class label is replaced with either 0 or 1, where 0 indicates the absence of disease and 1 indicates its presence. A sample of the primary tumor dataset is given in Table 34.

About this article

Cite this article

Bashir, S., Qamar, U. & Khan, F.H. WebMAC: A web based clinical expert system. Inf Syst Front 20, 1135–1151 (2018). https://doi.org/10.1007/s10796-016-9718-y

  • DOI: https://doi.org/10.1007/s10796-016-9718-y
