Abstract
Classification of imbalanced multi-class image datasets is a challenging problem in computer vision. Most real-world datasets are imbalanced because samples are unevenly distributed across classes. The problem with an imbalanced dataset is that the minority class, which has fewer samples, often goes undetected. Most traditional machine learning algorithms detect the majority class efficiently but lag behind in detecting the minority class, which ultimately degrades the overall performance of the classification model. In this paper, we propose a novel combination of visual codebook generation using deep features with the non-linear Chi2 SVM classifier to tackle the imbalance problem that arises in multi-class image datasets. The low-level deep features are first extracted by transfer learning using the pre-trained ResNet-50 network and then clustered using k-means. The center of each cluster is a visual word in the codebook. Each image is then translated into a feature vector, the Bag-of-Visual-Words (BOVW), derived from the histogram of visual words in the vocabulary. The non-linear Chi2 SVM classifier is found to be the best suited for classifying the resulting features, as shown by a detailed empirical analysis. Hence, with the right combination of learning tools, we are able to classify multi-class imbalanced image datasets effectively. This is demonstrated by higher accuracy, F1-score, and AUC in our experiments on two challenging multi-class datasets, Graz-02 and TF-Flowers, compared with state-of-the-art methods.
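The pipeline outlined in the abstract (local features, k-means codebook, BOVW histogram, Chi2 SVM) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay self-contained it replaces the ResNet-50 features with synthetic descriptors, and the helper names (`fake_descriptors`, `bovw_histogram`), the codebook size of 8, the class counts, and `gamma=1.0` are all assumptions chosen for illustration. It relies on scikit-learn's `KMeans`, `chi2_kernel`, and `SVC` with a precomputed kernel.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def fake_descriptors(label, n_desc=30, dim=16):
    """Hypothetical stand-in for one image's local deep features.

    In the paper these come from a pre-trained ResNet-50; here they are
    Gaussian points around a class-dependent centre (an assumption).
    """
    centre = np.zeros(dim)
    centre[label] = 5.0
    return centre + rng.normal(size=(n_desc, dim))


def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and return
    the L1-normalised histogram of word counts."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()


# Imbalanced toy dataset: 40, 10 and 5 images for classes 0, 1 and 2.
counts = {0: 40, 1: 10, 2: 5}
images, labels = [], []
for label, n_images in counts.items():
    for _ in range(n_images):
        images.append(fake_descriptors(label))
        labels.append(label)
labels = np.array(labels)

# Codebook: each k-means cluster centre is one visual word.
codebook = KMeans(n_clusters=8, n_init=10, random_state=0)
codebook.fit(np.vstack(images))

# One BOVW histogram per image.
X = np.array([bovw_histogram(d, codebook) for d in images])

# scikit-learn's chi2_kernel computes
# exp(-gamma * sum((x - y)**2 / (x + y))) and needs non-negative inputs,
# which histograms satisfy.
K_train = chi2_kernel(X, gamma=1.0)
clf = SVC(kernel="precomputed").fit(K_train, labels)

# At prediction time the kernel is evaluated between new and training rows;
# the training set is reused here only to show the call shape.
pred = clf.predict(chi2_kernel(X, X, gamma=1.0))
```

In a real experiment the descriptors would come from the network's feature maps, the codebook would be fitted on training images only, and predictions would be scored on a held-out split with per-class metrics such as F1-score and AUC, since plain accuracy can hide poor minority-class performance.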
About this article
Cite this article
Saini, M., Susan, S. Bag-of-Visual-Words codebook generation using deep features for effective classification of imbalanced multi-class image datasets. Multimed Tools Appl 80, 20821–20847 (2021). https://doi.org/10.1007/s11042-021-10612-w
DOI: https://doi.org/10.1007/s11042-021-10612-w