Abstract
Class imbalance and high dimensionality are two common problems in medical image datasets, and both degrade the performance of image processing procedures. Solving these two problems with traditional methods is notoriously difficult. Accordingly, this work employed metaheuristic methods to optimize the rebalancing of the imbalanced class distribution, whose output is then used in the feature selection procedure that reduces the dimensionality of medical X-ray image datasets. Different metaheuristic algorithms were used to optimize the parameter values of the rebalancing and feature selection phases that preprocess the datasets. The proposed work devised a multi-objective optimization strategy within the search process of the metaheuristic algorithms to solve the dual problem of dataset imbalance and feature selection. Afterward, the proposed optimized approach was compared with conventional methods to evaluate its performance. The results established the superiority of the proposed method in overcoming the imbalance and high-dimensionality problems. The proposed method generated a reasonable number of minority class samples and selected a sensible subset of features, so that a classification that initially showed a negative kappa value and a misleadingly high accuracy ultimately reached high accuracy with high credibility. On the practical problem of medical X-ray images, it produced more credible and more correct classification performance than the other algorithms. Feature selection with Random-SMOTE (RSMOTE) optimized by the self-adaptive Bat algorithm is superior to the same procedure optimized by particle swarm optimization. The proposed method using the Bat algorithm achieved 94.6% classification accuracy with a kappa value of 0.883 on the first lung X-ray dataset.
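The contrast between accuracy and kappa mentioned above can be illustrated with a minimal sketch. It computes both measures from a binary confusion matrix using the standard definition of Cohen's kappa. The function name `accuracy_and_kappa` and the sample counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix.

    tp/fn count the minority (positive) class, fp/tn the majority class.
    """
    total = tp + fn + fp + tn
    observed = (tp + tn) / total  # observed agreement, i.e. accuracy

    # Agreement expected by chance, from the row and column marginals.
    chance_pos = ((tp + fn) / total) * ((tp + fp) / total)
    chance_neg = ((fp + tn) / total) * ((fn + tn) / total)
    expected = chance_pos + chance_neg

    if expected == 1.0:
        # Degenerate case: only one class appears in both truth and prediction.
        return observed, 0.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical imbalanced test set: 95 majority samples, 5 minority samples.
# Case 1: the classifier labels every sample as the majority class.
acc_all_neg, kappa_all_neg = accuracy_and_kappa(tp=0, fn=5, fp=0, tn=95)

# Case 2: it also raises two false alarms and still finds no minority sample.
acc_few_fp, kappa_few_fp = accuracy_and_kappa(tp=0, fn=5, fp=2, tn=93)

print(f"all-majority: accuracy={acc_all_neg:.3f}, kappa={kappa_all_neg:.3f}")
print(f"with false alarms: accuracy={acc_few_fp:.3f}, kappa={kappa_few_fp:.3f}")
```

In both hypothetical cases the accuracy stays above 90% although no minority sample is detected, while kappa is zero or negative. This is why a high accuracy on an imbalanced dataset needs to be read together with kappa.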
Acknowledgements
The authors are thankful for the financial support from the Research Grants “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF), Grant no. MYRG2015-00128-FST”, “Improving the Protein-Ligand Scoring Function for Molecular Docking by Fuzzy Rule-based Machine Learning Approaches, Grant no. MYRG2016-00217-FST”, and “A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel, Grant no. FDCT/126/2014/A3”, offered by the University of Macau and Macau FDCT, respectively. We also thank Ms. Dantong Wang, our former MSc student, for her technical contribution to this paper, and we greatly appreciate the help of Dr. Li Tengyue (Department of Computer and Information Science, University of Macau, Taipa, Macau SAR).
Funding
The authors confirm that no funding was obtained.
Ethics declarations
Conflict of interest
The authors confirm that there is no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, J., Fong, S., Liu, Ls. et al. Dual feature selection and rebalancing strategy using metaheuristic optimization algorithms in X-ray image datasets. Multimed Tools Appl 78, 20913–20933 (2019). https://doi.org/10.1007/s11042-019-7354-5
DOI: https://doi.org/10.1007/s11042-019-7354-5