Abstract
Real-world datasets are rarely balanced. Disease datasets in particular are usually highly skewed, with a few minority classes alongside one or more prominent majority classes. In this research, we put forward a novel hybrid architecture for imbalanced binary disease datasets. It uses an evolutionary algorithm (EA), namely monarch butterfly optimization (MBO), to arrive at an efficient combination of the sensitive parameter values of a support vector machine (SVM) classifier and thereby improve SVM performance. MBO is used to optimize three objectives: prediction accuracy (PAC), sensitivity (SEN), and specificity (SPE). Additionally, we propose a totally unimodular matrix (TUM) and a limit-points-based selection of non-dominated solutions for deciding between local and global search and for generating an efficient initial population, respectively, since these two factors greatly affect the performance of EAs. The proposed hybrid architecture is tested on 18 disease datasets with binary class labels, and the results demonstrate improvements with the proposed method. For the majority of the datasets, 100% sensitivity, 100% specificity, or both were attained. Moreover, pertinent statistical tests were carried out to validate the performance obtained.
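To make the three objectives concrete, the following minimal sketch computes PAC, SEN, and SPE from binary predictions using their standard confusion-matrix definitions, and filters a set of candidate objective vectors down to the non-dominated ones in the usual Pareto sense. It is an illustration only and not the authors' implementation: the function names (`objectives`, `dominates`, `non_dominated`), the label encoding (1 for the disease class, 0 otherwise), and the sample values are assumptions introduced here.

```python
from typing import List, Sequence, Tuple

# (PAC, SEN, SPE); all three are to be maximized.
Objectives = Tuple[float, float, float]


def objectives(y_true: Sequence[int], y_pred: Sequence[int]) -> Objectives:
    """Return (accuracy, sensitivity, specificity) for labels in {0, 1}.

    Assumption: 1 marks the positive (disease) class, 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    pac = (tp + tn) / total if total else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    return pac, sen, spe


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical imbalanced sample: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(objectives(y_true, y_pred))

# Hypothetical objective vectors for three candidate SVM parameter settings.
candidates = [(0.90, 0.50, 0.95), (0.85, 0.80, 0.85), (0.80, 0.50, 0.90)]
print(non_dominated(candidates))
```

On the skewed sample, accuracy alone looks acceptable while sensitivity is only 0.5, which is why the three measures are treated as separate objectives rather than collapsed into one score.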
Acknowledgements
K. Kannan gratefully acknowledges the Tata Realty-IT city-SASTRA Srinivasa Ramanujan Research Cell of SASTRA University (India) for the financial support extended in carrying out this research work. Xiao-Zhi Gao’s research work was partially supported by the National Natural Science Foundation of China (NSFC) under Grant 51875113.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Nalluri, M.R., Kannan, K., Gao, XZ. et al. Multiobjective hybrid monarch butterfly optimization for imbalanced disease classification problem. Int. J. Mach. Learn. & Cyber. 11, 1423–1451 (2020). https://doi.org/10.1007/s13042-019-01047-9
DOI: https://doi.org/10.1007/s13042-019-01047-9