Bag-of-Visual-Words codebook generation using deep features for effective classification of imbalanced multi-class image datasets

Multimedia Tools and Applications

Abstract

Classification of imbalanced multi-class image datasets is a challenging problem in computer vision. Many real-world datasets are imbalanced because samples are unevenly distributed across classes. In such datasets, the minority class, which has fewer samples, often goes undetected. Most traditional machine learning algorithms detect the majority class well but perform poorly on the minority class, which degrades the overall performance of the classification model. In this paper, we propose a novel combination of visual codebook generation using deep features with the non-linear Chi2 SVM classifier to tackle the imbalance problem in multi-class image datasets. Low-level deep features are first extracted by transfer learning with the pre-trained ResNet-50 network and then clustered using k-means. The center of each cluster is a visual word in the codebook. Each image is then represented by a Bag-of-Visual-Words (BOVW) feature vector: the histogram of its visual words over the vocabulary. A detailed empirical analysis shows that the non-linear Chi2 SVM classifier is the best choice for classifying the resulting features. With the right combination of learning tools, we are thus able to classify multi-class imbalanced image datasets effectively. This is demonstrated by higher accuracy, F1-score and AUC than state-of-the-art methods in our experiments on two challenging multi-class datasets, Graz-02 and TF-Flowers.
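The codebook and histogram steps described in the abstract can be sketched with scikit-learn's `KMeans`. This is a minimal illustration under stated assumptions, not the authors' implementation: random arrays stand in for the ResNet-50 descriptors, each image is assumed to yield a set of local descriptors (here 49 vectors of dimension 64, a hypothetical size), and the codebook size of 8 and the L1 normalization of each histogram are illustrative choices rather than settings taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_codebook(descriptor_sets, n_words, seed=0):
    """Cluster all local descriptors; each cluster center is one visual word."""
    all_descriptors = np.vstack(descriptor_sets)
    kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=seed)
    kmeans.fit(all_descriptors)
    return kmeans


def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and count occurrences."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    # L1-normalize (an assumed choice) so images with different descriptor counts are comparable.
    return hist / hist.sum()


rng = np.random.default_rng(0)
# Hypothetical stand-in for deep features: 20 images, each with 49 local 64-d descriptors.
images = [rng.random((49, 64)) for _ in range(20)]

codebook = build_codebook(images, n_words=8)
X = np.array([bovw_histogram(d, codebook) for d in images])
print(X.shape)  # (20, 8): one 8-bin BOVW histogram per image
```

For large descriptor collections, scikit-learn's `MiniBatchKMeans` is a drop-in alternative that trades a little clustering quality for speed.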

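The classification stage can be sketched in the same hedged way: an SVM with a chi-squared kernel trained on BOVW histograms and scored with the three metrics the abstract reports. The synthetic three-class data (120/40/15 samples of 16-bin histograms), the stratified 70/30 split, macro-averaged F1 and one-vs-rest AUC are assumptions made for illustration; the paper's own protocol and averaging choices are not stated in this abstract.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical imbalanced 3-class dataset of non-negative, L1-normalized 16-bin histograms.
class_sizes = [120, 40, 15]
X_parts, y_parts = [], []
for label, n in enumerate(class_sizes):
    prototype = rng.dirichlet(np.ones(16))                     # typical histogram for this class
    X_parts.append(rng.dirichlet(prototype * 50 + 1, size=n))  # noisy histograms around it
    y_parts.append(np.full(n, label))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)

# A stratified split keeps the class ratios the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# SVC accepts a callable kernel; chi2_kernel computes exp(-gamma * sum((x - y)^2 / (x + y))).
clf = SVC(kernel=chi2_kernel, probability=True, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)

acc = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")     # every class weighted equally
auc = roc_auc_score(y_test, y_proba, multi_class="ovr")  # one-vs-rest AUC, macro-averaged
print(f"accuracy={acc:.3f}  macro-F1={macro_f1:.3f}  AUC={auc:.3f}")
```

The chi-squared kernel suits this setting because it compares histogram bins relative to their magnitude and requires non-negative inputs, which BOVW histograms always satisfy. Macro-averaging gives the minority class the same weight as the majority class, so a model that ignores the minority class is penalized.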
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Author information

Correspondence to Manisha Saini.

About this article

Cite this article

Saini, M., Susan, S. Bag-of-Visual-Words codebook generation using deep features for effective classification of imbalanced multi-class image datasets. Multimed Tools Appl 80, 20821–20847 (2021). https://doi.org/10.1007/s11042-021-10612-w

  • DOI: https://doi.org/10.1007/s11042-021-10612-w