research-article

Learning Sample-Specific Policies for Sequential Image Augmentation

Authors:

Xiaohui Xie

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 4491 - 4500

https://doi.org/10.1145/3474085.3475602

Published: 17 October 2021

Abstract

This paper presents a policy-driven sequential image augmentation approach for image-related tasks. Our approach applies a sequence of image transformations (e.g., translation, rotation) to a training image, one transformation at a time, with the augmented image from the previous time step used as the input to the next transformation. This sequential data augmentation substantially increases sample diversity, which improves test performance, especially for data-hungry models such as deep neural networks. However, searching for the optimal transformation of each image at each time step of the sequence has high complexity because of its combinatorial nature. To address this challenge, we formulate the search as a sequential decision process and introduce a deep policy network that learns to select transformations based on image content. We also develop an iterative algorithm that jointly trains a classifier and the policy network in a reinforcement learning setting. The immediate reward of a candidate transformation is defined to encourage transformations that produce hard samples for the current classifier. At each iteration, we use the policy network to augment the training dataset, train a classifier on the augmented data, and then train the policy network with the aid of that classifier. We apply this approach to public image classification benchmarks and to a newly collected image dataset for material recognition. Comparisons with alternative augmentation approaches show that our policy-driven approach achieves comparable or better classification performance while using significantly fewer augmented images. The code is available at https://github.com/Paul-LiPu/rl_autoaug.
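The sequential process described in the abstract can be illustrated with a small, self-contained sketch. It rests on several assumptions that do not come from the paper or its repository: a toy action space (cyclic one-pixel translation, horizontal flip, 90-degree rotation) on a tiny square image, an immediate reward taken to be the increase in the current classifier's loss, and a hypothetical greedy "oracle" policy standing in for the deep policy network. All function names here are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]          # toy grayscale image: a square grid of pixel intensities
Policy = Callable[[Image], int]    # maps an image to an action (transformation) index
LossFn = Callable[[Image], float]  # stand-in for the current classifier's loss on a sample


# Hypothetical action space; the paper's actual transformation set may differ.
def translate_right(img: Image) -> Image:
    """Shift each row one pixel to the right, wrapping the last column around."""
    return [row[-1:] + row[:-1] for row in img]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate a square image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


ACTIONS: Sequence[Callable[[Image], Image]] = (translate_right, flip_horizontal, rotate90)


def immediate_reward(loss_fn: LossFn, before: Image, after: Image) -> float:
    """Assumed reward: how much a transformation raises the classifier's loss."""
    return loss_fn(after) - loss_fn(before)


def sequential_augment(img: Image, policy: Policy, loss_fn: LossFn,
                       steps: int) -> Tuple[Image, List[Tuple[int, float]]]:
    """Apply `steps` transformations one at a time; each output feeds the next step."""
    trajectory: List[Tuple[int, float]] = []
    for _ in range(steps):
        action = policy(img)
        augmented = ACTIONS[action](img)
        trajectory.append((action, immediate_reward(loss_fn, img, augmented)))
        img = augmented  # the augmented image becomes the next state
    return img, trajectory


def greedy_oracle_policy(loss_fn: LossFn) -> Policy:
    """Hypothetical stand-in for the learned policy network.

    It scores every action by brute force; a trained network would instead predict
    an action from image content without querying the classifier for each candidate.
    """
    def policy(img: Image) -> int:
        rewards = [immediate_reward(loss_fn, img, t(img)) for t in ACTIONS]
        return max(range(len(ACTIONS)), key=lambda i: rewards[i])  # ties -> lowest index
    return policy


# Usage: a toy "classifier" whose loss is the squared distance to a memorized template.
template: Image = [[1.0, 2.0], [3.0, 4.0]]


def toy_loss(img: Image) -> float:
    return sum((a - b) ** 2 for r1, r2 in zip(img, template) for a, b in zip(r1, r2))


final, traj = sequential_augment(template, greedy_oracle_policy(toy_loss), toy_loss, steps=3)
print(traj)
```

Under this assumed reward, the per-step rewards telescope: their sum equals the loss on the final augmented image minus the loss on the original. The oracle also makes the search cost visible: it needs one classifier evaluation per action per step, and an exhaustive search over all length-T sequences would need |A|^T, which is the combinatorial growth a learned, content-conditioned policy is meant to avoid.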


Index Terms

Learning Sample-Specific Policies for Sequential Image Augmentation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
  2. Machine learning
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Reinforcement learning

Information

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen (University of Electronic Science & Technology of China, China)
Yueting Zhuang (Zhejiang University, China)
John R. Smith (IBM, USA)

Program Chairs:
Yang Yang (University of Electronic Science and Technology of China, China)
Pablo Cesar (CWI & TU Delft, The Netherlands)
Florian Metze (Facebook, Inc., USA)
Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021

Qualifiers

Research-article

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

Total Citations: 6
Total Downloads: 147
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 2

Reflects downloads up to 13 Feb 2025

Citations

Cited By

LIU Genghuan, ZENG Xiangjin, DOU Jiazhen, REN Zhenbo, ZHONG Liyun, DI Jianglei, QIN Yuwen (2024). Research Progress in Deep Learning-Based Small Object Detection (Invited). Infrared and Laser Engineering 53:9 (20240253). DOI: 10.3788/IRLA20240253. Online publication date: 2024.
https://doi.org/10.3788/IRLA20240253

Li P, Zhao Y, Liu X (2024). Policy-driven Auto-Augmentation with Distillment Rewards for Scene Text Recognition. Proceedings of the 6th ACM International Conference on Multimedia in Asia, 1-8. DOI: 10.1145/3696409.3700215. Online publication date: 3-Dec-2024.
https://dl.acm.org/doi/10.1145/3696409.3700215

Kang Y, Zare S, Lin A, Han Z, Osher S, Nguyen H (2024). Game Theory Meets Data Augmentation. IEEE Transactions on Artificial Intelligence 5:12, 6080-6094. DOI: 10.1109/TAI.2024.3384129. Online publication date: Dec-2024.
https://doi.org/10.1109/TAI.2024.3384129

Xu Z, Wang S, Xu G, Liu Y, Yu M, Zhang H, Lukasiewicz T, Gu J (2024). Automatic Data Augmentation for Medical Image Segmentation Using Adaptive Sequence-length Based Deep Reinforcement Learning. Computers in Biology and Medicine 169, 107877. DOI: 10.1016/j.compbiomed.2023.107877. Online publication date: Feb-2024.
https://doi.org/10.1016/j.compbiomed.2023.107877

Zhang H, Li P, Liu X, Yang X, An L (2023). An Iterative Semi-supervised Approach with Pixel-wise Contrastive Loss for Road Extraction in Aerial Images. ACM Transactions on Multimedia Computing, Communications, and Applications 20:3, 1-21. DOI: 10.1145/3606374. Online publication date: 10-Nov-2023.
https://dl.acm.org/doi/10.1145/3606374

Gao W, Zhang X, Guo S, Zhang T, Xiang T, Qiu H, Wen Y, Liu Y (2023). Automatic Transformation Search Against Deep Leakage From Gradients. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:9, 10650-10668. DOI: 10.1109/TPAMI.2023.3262813. Online publication date: 1-Sep-2023.
https://dl.acm.org/doi/10.1109/TPAMI.2023.3262813
