Dual feature selection and rebalancing strategy using metaheuristic optimization algorithms in X-ray image datasets

Multimedia Tools and Applications

Abstract

Class imbalance and high dimensionality are two common problems in medical image datasets, and both degrade the performance of image processing procedures. These problems are notoriously difficult to solve with traditional methods. Accordingly, this work employed metaheuristic methods to optimize the rebalancing of the imbalanced class distribution, and the rebalanced data were then used in a feature selection procedure that reduces the dimensionality of medical X-ray image datasets. Different metaheuristic algorithms were used to optimize the parameter values of the rebalancing and feature selection phases that preprocess the datasets. The proposed work devised a multi-objective optimization strategy within the metaheuristic search to solve the dual problem of dataset imbalance and feature selection. Afterward, the proposed optimized approach was compared with conventional methods to evaluate its performance. The results established the superiority of the proposed method in overcoming the imbalance and multi-dimensionality problems. The proposed method generated a reasonable number of minority class samples and selected a sensible subset of features, turning a negative kappa value and a falsely high accuracy into a high accuracy with high credibility. It produced more credible and more correct classification performance on the practical problem of medical X-ray images than the other algorithms. Feature selection with Random-SMOTE (RSMOTE) optimized by the self-adaptive Bat algorithm was superior to the same approach optimized by particle swarm optimization. With the Bat algorithm, the proposed method achieved 94.6% classification accuracy with a Kappa value of 0.883 on the first lung X-ray dataset.
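The "falsely high accuracy" and "negative kappa" mentioned in the abstract can be illustrated with a minimal sketch. The label vectors below are hypothetical and are not taken from the paper's datasets; the function name `accuracy_and_kappa` is likewise an assumption made for illustration. The sketch computes plain accuracy and Cohen's kappa, which corrects the observed agreement for the agreement expected by chance.

```python
def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of samples where prediction matches truth.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        # Degenerate case: only one label on both sides; kappa is undefined, report 0.
        return observed, 0.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical imbalanced test set: 95 majority samples (0), 5 minority samples (1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
acc_major, kappa_major = accuracy_and_kappa(y_true, [0] * 100)

# A classifier that misses every minority sample and mislabels two majority samples.
acc_neg, kappa_neg = accuracy_and_kappa(y_true, [0] * 93 + [1] * 2 + [0] * 5)

print(f"majority-only: accuracy={acc_major:.3f}, kappa={kappa_major:.3f}")
print(f"worse-than-chance: accuracy={acc_neg:.3f}, kappa={kappa_neg:.3f}")
```

In both hypothetical cases the accuracy stays above 0.9 while kappa is zero or negative, which is why a rebalancing strategy is plausibly judged on kappa together with accuracy rather than on accuracy alone.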

Acknowledgements

The authors are thankful for the financial support from the research grants “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF), Grant no. MYRG2015-00128-FST”, “Improving the Protein-Ligand Scoring Function for Molecular Docking by Fuzzy Rule-based Machine Learning Approaches, Grant no. MYRG2016-00217-FST”, and “A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel, Grant no. FDCT/126/2014/A3”, offered by the University of Macau and Macau FDCT, respectively. We also thank Ms. Dantong Wang, our former MSc student, for her technical contribution to this paper, and we greatly appreciate the help of Dr. Li Tengyue (Department of Computer and Information Science, University of Macau, Taipa, Macau SAR).

Funding

The authors confirm that no funding was obtained.

Author information

Corresponding author

Correspondence to Amira S. Ashour.

Ethics declarations

Conflict of interest

The authors confirm that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, J., Fong, S., Liu, Ls. et al. Dual feature selection and rebalancing strategy using metaheuristic optimization algorithms in X-ray image datasets. Multimed Tools Appl 78, 20913–20933 (2019). https://doi.org/10.1007/s11042-019-7354-5

  • DOI: https://doi.org/10.1007/s11042-019-7354-5
