Abstract
Human liver disorders can arise from genetic factors, habitual alcohol consumption, or viral infection. If not detected at an early stage, they can lead to liver failure or liver cancer. The aim of the proposed method is to detect liver disorders at an early stage using liver function test datasets. A problem with many real-world datasets, including liver disease diagnosis data, is class imbalance: the number of observations belonging to one class is much larger or smaller than the number belonging to the other class(es). The traditional K-Nearest Neighbor (KNN) and Fuzzy KNN classifiers do not work well on imbalanced datasets because they treat all neighbors equally. To address this, the weighted variant of Fuzzy KNN assigns a large weight to neighbors belonging to the minority class and a relatively small weight to neighbors belonging to the majority class. In this paper, the Variable-Neighbor Weighted Fuzzy K Nearest Neighbor approach (Variable-NWFKNN) is proposed, which is an improved variant of Fuzzy-NWKNN. The proposed Variable-NWFKNN method is evaluated on three real-world imbalanced liver function test datasets: BUPA and ILPD from UCI, and MPRLPD. Compared with the existing NWKNN and Fuzzy-NWKNN methods, Variable-NWFKNN achieves accuracies of 73.91% (BUPA dataset), 77.59% (ILPD dataset) and 87.01% (MPRLPD dataset). Further, when the TL_RUS method is used for preprocessing, the accuracy improves to 78.46% (BUPA dataset), 78.46% (ILPD dataset) and 95.79% (MPRLPD dataset).
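The class-weighting idea described above can be illustrated with a minimal sketch. It is not the authors' Variable-NWFKNN: it assumes the class-weight form commonly attributed to neighbor-weighted KNN (a class weight that shrinks as the class grows relative to the smallest class) combined with an inverse-distance fuzzy vote. The function names, the `exponent` and `m` parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def class_weights(labels, exponent=2.0):
    """Assumed weighting: w_c = 1 / (n_c / n_min) ** (1 / exponent).

    The smallest class gets weight 1.0; larger classes get smaller weights.
    """
    counts = Counter(labels)
    n_min = min(counts.values())
    return {c: 1.0 / (n / n_min) ** (1.0 / exponent) for c, n in counts.items()}


def weighted_fuzzy_knn(train_x, train_y, query, k=3, m=2.0, exponent=2.0):
    """Return (predicted_label, memberships) for one query point.

    Each of the k nearest neighbors votes for its own class with strength
    class_weight * 1 / distance ** (2 / (m - 1)); votes are then normalized
    so the memberships sum to 1. Requires m > 1.
    """
    weights = class_weights(train_y, exponent)
    # Sort training points by Euclidean distance to the query, keep the k nearest.
    neighbors = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    scores = {c: 0.0 for c in weights}
    for distance, label in neighbors:
        # Small constant avoids division by zero when the query equals a training point.
        closeness = 1.0 / (distance ** (2.0 / (m - 1.0)) + 1e-9)
        scores[label] += weights[label] * closeness
    total = sum(scores.values())
    memberships = {c: s / total for c, s in scores.items()}
    return max(memberships, key=memberships.get), memberships


# Toy one-dimensional data (hypothetical): 8 majority points (0), 2 minority points (1).
train_x = [(0.0,), (2.0,), (5.0,), (6.0,), (7.0,), (8.0,), (9.0,), (10.0,), (1.5,), (20.0,)]
train_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

weights = class_weights(train_y)
label, memberships = weighted_fuzzy_knn(train_x, train_y, query=(1.0,), k=3)
```

In this toy setup the minority class receives weight 1.0 and the majority class a smaller weight, so a nearby minority neighbor contributes more to the vote than a majority neighbor at the same distance. A variable-neighbor scheme, as the method's name suggests, would additionally adapt the neighborhood per class, which this sketch does not attempt.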





Cite this article
Kumar, P., Thakur, R.S. Liver disorder detection using variable- neighbor weighted fuzzy K nearest neighbor approach. Multimed Tools Appl 80, 16515–16535 (2021). https://doi.org/10.1007/s11042-019-07978-3
DOI: https://doi.org/10.1007/s11042-019-07978-3