
A Latent Variable Augmentation Method for Image Categorization with Insufficient Training Samples

Published: 20 July 2021

Abstract

Over the past few years, great progress has been made in image categorization based on convolutional neural networks (CNNs). These CNNs are typically trained on large-scale image datasets; in real-world applications, however, only a limited number of training samples may be available. One intuitive way to address this problem is to augment the training samples. In this article, we propose an algorithm called Lavagan (Latent Variable Augmentation method based on Generative Adversarial Nets) to improve the performance of CNNs with insufficient training samples. The proposed Lavagan method comprises two tasks. In the first task, we augment a number of latent variables (LVs) drawn from a set of adaptive and constrained LV distributions. In the second task, we incorporate the augmented LVs into the training procedure of the image classifier. To account for both tasks, we propose a unified objective function that incorporates them into the learning, and we then put forward an alternating two-player minimization game to minimize this loss function and obtain the predictive classifier. Moreover, based on Hoeffding’s inequality and the Chernoff bounding method, we analyze the feasibility and efficiency of the proposed method, showing that LV augmentation is able to improve the performance of Lavagan with insufficient training samples. Finally, experiments show that the proposed Lavagan method delivers more accurate performance than existing state-of-the-art methods.
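The two tasks described in the abstract can be sketched in code. The following is a minimal illustrative stand-in, not the paper's actual algorithm: it replaces the GAN generator with simple per-class Gaussian LV distributions and the CNN with a softmax classifier, and all function names are hypothetical. It only demonstrates the general shape of the alternation, i.e., re-adapting constrained LV distributions, sampling augmented LVs, and updating the classifier on real plus augmented samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_lv_distributions(X, y, n_classes):
    """Fit a constrained Gaussian per class over the latent variables
    (a stand-in for the paper's adaptive, constrained LV distributions)."""
    params = []
    for c in range(n_classes):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        sigma = Xc.std(axis=0) + 1e-3      # constrain variance away from zero
        params.append((mu, sigma))
    return params

def augment_lvs(params, n_per_class):
    """Task 1: draw augmented latent variables from each class distribution."""
    Xs, ys = [], []
    for c, (mu, sigma) in enumerate(params):
        Xs.append(rng.normal(mu, sigma, size=(n_per_class, mu.size)))
        ys.append(np.full(n_per_class, c))
    return np.vstack(Xs), np.concatenate(ys)

def train(X, y, n_classes, n_aug=50, epochs=200, lr=0.1):
    """Task 2: alternate between re-sampling LVs and updating a softmax
    classifier on real + augmented samples (a toy stand-in for the
    alternating two-player minimization of the unified objective)."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        params = fit_lv_distributions(X, y, n_classes)  # adapt LV distributions
        Xa, ya = augment_lvs(params, n_aug)             # augmentation step
        Xt = np.vstack([X, Xa])
        yt = np.concatenate([y, ya]).astype(int)
        logits = Xt @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        onehot = np.eye(n_classes)[yt]
        W -= lr * Xt.T @ (p - onehot) / len(yt)         # softmax CE gradient
        b -= lr * (p - onehot).mean(axis=0)
    return W, b
```

With only a handful of real samples per class, the augmented LVs effectively enlarge the training set that the classifier sees at every update, which is the intuition the abstract describes.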


Cited By

View all
  • (2023) MCoR-Miner: Maximal Co-Occurrence Nonoverlapping Sequential Rule Mining. IEEE Transactions on Knowledge and Data Engineering 35, 9 (Sep. 2023), 9531–9546. DOI: 10.1109/TKDE.2023.3241213


    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 1
    February 2022
    475 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3472794

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 July 2021
    Accepted: 01 February 2021
    Revised: 01 June 2020
    Received: 01 February 2020
    Published in TKDD Volume 16, Issue 1


    Author Tags

    1. CNN
    2. image recognition

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • Guangdong Basic and Applied Basic Research Foundation
    • Science and Technology Planning Project of Guangzhou

