
A Latent Variable Augmentation Method for Image Categorization with Insufficient Training Samples

Published: 20 July 2021

Abstract

Over the past few years, great progress has been made in image categorization based on convolutional neural networks (CNNs). These CNNs are typically trained on large-scale image datasets; in real-world applications, however, only a limited number of training samples may be available. One intuitive way to address this problem is to augment the training samples. In this article, we propose an algorithm called Lavagan (Latent Variable Augmentation method based on Generative Adversarial Nets) to improve the performance of CNNs with insufficient training samples. The proposed Lavagan method comprises two tasks. In the first task, we augment a number of latent variables (LVs) drawn from a set of adaptive and constrained LV distributions. In the second task, we incorporate the augmented LVs into the training procedure of the image classifier. To account for both tasks, we propose a unified objective function that incorporates them into the learning, and we then put forward an alternating two-player minimization game to minimize this loss function and obtain the predictive classifier. Moreover, based on Hoeffding’s inequality and the Chernoff bounding method, we analyze the feasibility and efficiency of the proposed method, showing that LV augmentation is able to improve the performance of Lavagan with insufficient training samples. Finally, experiments show that the proposed Lavagan method delivers more accurate performance than existing state-of-the-art methods.
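The two tasks described in the abstract can be sketched in code. The following is a minimal illustrative stand-in, not the paper's actual algorithm: it replaces the GAN generator with simple per-class Gaussian LV distributions and the CNN with a softmax classifier, and all function names are hypothetical. It only demonstrates the general shape of the alternation, i.e., re-adapting constrained LV distributions, sampling augmented LVs, and updating the classifier on real plus augmented samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_lv_distributions(X, y, n_classes):
    """Fit a constrained Gaussian per class over the latent variables
    (a stand-in for the paper's adaptive, constrained LV distributions)."""
    params = []
    for c in range(n_classes):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        sigma = Xc.std(axis=0) + 1e-3      # constrain variance away from zero
        params.append((mu, sigma))
    return params

def augment_lvs(params, n_per_class):
    """Task 1: draw augmented latent variables from each class distribution."""
    Xs, ys = [], []
    for c, (mu, sigma) in enumerate(params):
        Xs.append(rng.normal(mu, sigma, size=(n_per_class, mu.size)))
        ys.append(np.full(n_per_class, c))
    return np.vstack(Xs), np.concatenate(ys)

def train(X, y, n_classes, n_aug=50, epochs=200, lr=0.1):
    """Task 2: alternate between re-sampling LVs and updating a softmax
    classifier on real + augmented samples (a toy stand-in for the
    alternating two-player minimization of the unified objective)."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        params = fit_lv_distributions(X, y, n_classes)  # adapt LV distributions
        Xa, ya = augment_lvs(params, n_aug)             # augmentation step
        Xt = np.vstack([X, Xa])
        yt = np.concatenate([y, ya]).astype(int)
        logits = Xt @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        onehot = np.eye(n_classes)[yt]
        W -= lr * Xt.T @ (p - onehot) / len(yt)         # softmax CE gradient
        b -= lr * (p - onehot).mean(axis=0)
    return W, b
```

With only a handful of real samples per class, the augmented LVs effectively enlarge the training set that the classifier sees at every update, which is the intuition the abstract describes.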


Cited By

View all
  • (2023) MCoR-Miner: Maximal Co-Occurrence Nonoverlapping Sequential Rule Mining. IEEE Transactions on Knowledge and Data Engineering 35, 9 (Sep. 2023), 9531–9546. DOI: 10.1109/TKDE.2023.3241213


    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 1
    February 2022
    475 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3472794

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 July 2021
    Accepted: 01 February 2021
    Revised: 01 June 2020
    Received: 01 February 2020
    Published in TKDD Volume 16, Issue 1


    Author Tags

    1. CNN
    2. image recognition

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • Guangdong Basic and Applied Basic Research Foundation
    • Science and Technology Planning Project of Guangzhou

