ADCM: attention dropout convolutional module
Introduction
In recent years, due to increases in computational capacity and the advent of large-scale datasets, deep convolutional neural network (CNN) models have demonstrated their capabilities in the ILSVRC challenge. The millions of parameters within a CNN are able to learn complicated relationships between the input and output. From the perspective of model architecture, depth and width are two important factors that can improve a network's performance. The success of VGGNet [1] and ResNet [2] demonstrated that stacking more non-linear convolutional layers can offer richer representations due to larger global receptive fields. The philosophy of DenseNet [3], which densely connects deep stacks of layers, follows this concept precisely. Various backbone architectures, including GoogLeNet [4] and WideResNet [5], have shown that width is another important factor in improving network performance; these networks achieve strong performance by using more channels and wider convolutions. Note that WideResNet is an extension of ResNet, and its results on benchmark datasets demonstrate that a 28-layer network with increased width achieves higher accuracy than ResNet-1001. It should also be noted that the computation required by deep learning models has been doubling every few months, increasing dramatically from 2012 to 2018 with a surprisingly large carbon footprint [6]. To maintain high model performance while saving computational cost, building a universal bionic mechanism into deep learning models holds more promise than stacking ever more nonlinear layers, and it deserves more attention.
Recently, attention has become enormously popular as an essential design component in neural network architectures. When humans observe a scene, we glimpse the whole scene but selectively focus on the objects of interest. The attention mechanism in deep learning is a bionic representation of the human perception system that informs the model where or what to emphasize. Hu et al. [7] proposed squeeze-and-excitation networks (SENet), which won first place in ILSVRC 2017. The success of SENet demonstrated that attention can significantly increase the representation power of a network by enhancing important features and suppressing unnecessary ones. To date, the attention mechanism has been successfully applied in natural language processing [8], [9], [10], recommender systems [11], [12], and computer vision, including fine-grained image recognition [13], [14], object detection [15], [16], [17], and image description [18], [19], [20], [21].
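To make the squeeze-and-excitation idea concrete, below is a minimal PyTorch sketch of an SE-style channel attention block. It follows the standard formulation from [7] (global average pooling, a bottleneck MLP, and sigmoid gating); the class name and the reduction ratio of 16 are illustrative choices, not specifics taken from this paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal squeeze-and-excitation channel attention (illustrative)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: one scalar per channel
        self.fc = nn.Sequential(              # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # emphasize useful channels, suppress others
```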
Apart from the aforementioned factors, we investigate another popular design component, the dropout technique [22]. Although millions of parameters can improve a model's learning ability, such a large number of parameters also causes overfitting to occur easily and degrades network performance, especially when the training dataset is small. Dropout is an intrinsically important regularization technique that further enhances model performance by avoiding overfitting. Dropout has been widely applied to the fully connected layers within CNNs; it randomly sets some neurons to zero during the training phase. Based on the original dropout technique, many variants (e.g., DropConnect [23] and DropPath [24]) have been proposed to inject noise into the model. However, such techniques are not suitable for convolutional layers, primarily because the basic information units of a fully connected layer and a convolutional layer are different. The unit of the convolutional layer is a channel containing multiple features that respond to specific patterns in the input images, while the unit of the fully connected layer is a single neuron. Another reason is that the features within a channel are spatially correlated. Thus, input information still flows through the convolutional layers even though some features are randomly filtered from the channel.
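The difference between neuron-level and channel-level dropout can be seen in a short PyTorch comparison; the tensor shape and drop rate below are arbitrary illustrative choices. Element-wise dropout leaves each channel's spatially correlated pattern largely intact, whereas channel-wise dropout (nn.Dropout2d) removes whole channels and genuinely blocks their information flow.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 8, 8)        # toy feature map: 4 channels of 8x8

elementwise = nn.Dropout(p=0.5)    # zeroes individual activations
channelwise = nn.Dropout2d(p=0.5)  # zeroes entire channels ("spatial dropout")
elementwise.train()
channelwise.train()

# Element-wise: surviving neighbours in each channel still carry the
# channel's spatially correlated pattern to the next layer.
print((elementwise(x) == 0).float().mean())           # roughly 0.5

# Channel-wise: a dropped channel is removed entirely.
dropped = channelwise(x).flatten(2).eq(0).all(dim=2)  # True for fully-dropped channels
print(dropped)
```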
Attention and dropout are intrinsically bionic mechanisms. When we learn from complex and disorderly information, forgetting some unimportant or uncertain information is essential for improving learning efficiency. Our brains work in the same way: they focus on important information and neglect what is unimportant or useless. Inspired by these bionic mechanisms, we propose the attention dropout convolutional module (ADCM) in this paper. First, the ADCM integrates the attention and dropout mechanisms into a general and lightweight module that can be seamlessly deployed between consecutive convolutional layers. It consists of two submodules, the channel attention dropout (CAD) and position attention dropout (PAD) submodules. Second, our attention mechanism is a further development of SENet; it focuses on meaningful features and learns "what" and "where" to emphasize or suppress. We design the dropout-channel and dropout-region based on the attention mechanisms in the two submodules. Both work at the high convolutional layers and filter channels and regions according to their activation status. Finally, we insert the ADCM into baseline networks and evaluate the results on a classification task using the ImageNet, CUB200-2011, and Stanford Cars datasets. To further validate the wide applicability of the ADCM, we evaluate it on object detection using the MS COCO dataset. Our contributions can be summarized as follows.
- The proposed dropout-channel and dropout-region are inspired by the learning mechanism of the human brain. Channels receiving low attention are filtered out of the feature maps by the dropout-channel with higher probability. Because randomly filtering individual features cannot prevent information from flowing to the next convolutional layer, the dropout-region directly filters out unnecessary regions containing contiguous features (see the illustrative sketch after this list).
- The proposed dropout-channel and dropout-region can be seamlessly integrated with the convolutional layer, unlike the original dropout method. Channels and regions are filtered according to their activation statuses, which are computed by the corresponding attention mechanisms. To the best of our knowledge, this is the first work to construct an attention-based dropout mechanism in a general block.
- By combining the attention and dropout mechanisms, the ADCM helps the baselines learn more robust features from the input images during training. Experimental results demonstrate that the ADCM not only leads to a significant improvement over the baseline models at a negligible computational cost but also effectively reduces overfitting on small benchmark datasets.
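As a rough illustration of the dropout-channel idea, the sketch below couples SE-style channel attention with channel-wise dropout so that low-attention channels are dropped more often. This is a hypothetical rendering under our own assumptions; the module name, the linear drop-probability rule, and the hyperparameters are not the authors' CAD definition, which is given in Section III.

```python
import torch
import torch.nn as nn

class AttentionChannelDropout(nn.Module):
    """Hypothetical sketch: low-attention channels are dropped more often.

    Not the paper's CAD submodule; the drop-probability rule is assumed."""
    def __init__(self, channels: int, reduction: int = 16, max_drop: float = 0.3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(               # SE-style channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.max_drop = max_drop

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        attn = self.fc(self.pool(x).view(b, c))    # per-channel attention in (0, 1)
        out = x * attn.view(b, c, 1, 1)            # emphasize important channels
        if self.training:
            # Assumed rule: drop probability shrinks as attention grows.
            drop_p = self.max_drop * (1.0 - attn)
            keep = torch.bernoulli(1.0 - drop_p)
            out = out * (keep / (1.0 - drop_p)).view(b, c, 1, 1)  # inverted-dropout rescale
        return out
```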
The remainder of this paper is organized as follows. Section II reviews related work on the attention and dropout mechanisms. Section III details the proposed module, including the CAD and PAD submodules. Section IV provides the experimental results and comparisons between our module and other related methods on several benchmark datasets. Finally, Section V draws conclusions.
Section snippets
Attention mechanism
Recently, some efforts [25], [26] have demonstrated that an attention mechanism is crucial to human biological perception. To rapidly capture interesting information, the human visual system first glimpses an entire image scene; then, it ignores irrelevant noise and localizes the salient areas [27]. In 2014, Bahdanau et al. [28] first leveraged an attention mechanism in machine translation. Wang et al. [29] introduced a novel convolutional neural network, named the residual attention network…
Our proposed approach
The ADCM consists of two submodules: channel attention dropout (CAD) and position attention dropout (PAD). These submodules are separately applied to the output of the previous convolutional layer and are utilized to generate the new output for the next layer. Their architecture is illustrated in Fig. 1. Formally, in two consecutive layers, we denote U = [U_1, U_2, …, U_C] as the output of the previous convolutional layer, where U_i ∈ ℝ^{H×W} specifies the ith channel, C denotes the number of channels…
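Under this notation, a position-wise analogue can be sketched as follows. This is a hedged illustration under our own assumptions (a crude mean-over-channels spatial attention map and a DropBlock-style expansion of dropped seeds into contiguous regions), not the paper's exact PAD formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionDropoutSketch(nn.Module):
    """Hypothetical attention-guided region dropout, not the paper's exact PAD."""
    def __init__(self, block_size: int = 3, max_drop: float = 0.1):
        super().__init__()
        self.block_size = block_size
        self.max_drop = max_drop

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u has shape (B, C, H, W), i.e. U = [U_1, ..., U_C] with U_i in R^{HxW}
        attn = torch.sigmoid(u.mean(dim=1, keepdim=True))  # (B, 1, H, W) spatial attention
        out = u * attn                                     # emphasize salient positions
        if self.training:
            seed_p = self.max_drop * (1.0 - attn)          # low attention -> more drop seeds
            seeds = torch.bernoulli(seed_p)
            # Dilate each seed into a block so contiguous regions are filtered out.
            block = F.max_pool2d(seeds, kernel_size=self.block_size,
                                 stride=1, padding=self.block_size // 2)
            out = out * (1.0 - block)
        return out
```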
Experiments
In this section, the ADCM is evaluated on image classification using the benchmark ImageNet, CUB200-2011, and Stanford Cars datasets, and on object detection using the MS COCO dataset. Our experiments were executed on a PC running a Linux operating system, equipped with an Intel Core i7-7700K CPU, 32 GB of memory, and two GeForce GTX 1080 GPUs.
Conclusions
In this paper, we propose a lightweight and general module named the ADCM. In contrast to the original dropout technique, the dropout-channel and dropout-region we designed are based on the attention mechanism. They work at the convolutional layer to filter both channels and regions, which helps the baseline models learn more robust features from the input images. Due to the attention mechanism behind each convolutional layer, the new feature map emphasizes the meaningful information and…
CRediT authorship contribution statement
Zhigang Liu: Conceptualization, Methodology, Software, Investigation, Writing - original draft, Writing - review & editing, Project administration. Juan Du: Software, Validation, Visualization, Investigation. Mei Wang: Formal analysis, Data curation. Shuzhi Sam Ge: Supervision, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61502094, Grant 51774090, and Grant 51104030, in part by the Heilongjiang Province Natural Science Foundation of China under Grant F2016002 and Grant LH2019F042, in part by the Youth Science Foundation of Northeast Petroleum University under Grant 2017PYZL-06, Grant 2018YDL-22, and Grant KYCXTD201903, and in part by the Daqing Science and Technology Project (ZD-2019-08).
References (47)
- Textual sentiment analysis via three different attention convolutional neural networks and cross-modality consistent regression, Neurocomputing, 2018.
- Multi-resolution attention convolutional neural network for crowd counting, Neurocomputing, 2019.
- Emotion-modulated attention improves expression recognition: A deep learning model, Neurocomputing, 2017.
- Very deep convolutional networks for large-scale image recognition, Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Densely connected convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- Wide residual networks, Proceedings of the British Machine Vision Conference (BMVC), 2016.
- R. Schwartz, J. Dodge, N.A. Smith, O. Etzioni, Green AI, arXiv:1907.10597.
- Squeeze-and-excitation networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- DiSAN: Directional self-attention network for RNN/CNN-free language understanding, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- A structured self-attentive sentence embedding, Proceedings of the International Conference on Learning Representations (ICLR), 2017.
- Sequential recommender system based on hierarchical attention networks, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2018.
- NAIS: Neural attentive item similarity model for recommendation, IEEE Trans. Knowl. Data Eng., 2018.
- Recurrent models of visual attention, Proceedings of the Conference on Neural Information Processing Systems (NIPS), 2014.
- Spatial transformer networks, Proceedings of the Conference on Neural Information Processing Systems (NIPS), 2015.
- Feature selective networks for object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- Multiscale visual attention networks for object detection in VHR remote sensing images, IEEE Geosci. Remote Sens. Lett., 2019.
- Show, attend and tell: Neural image caption generation with visual attention, Proceedings of the International Conference on Machine Learning (ICML), 2015.
- DRAW: A recurrent neural network for image generation, Proceedings of the International Conference on Machine Learning (ICML), 2015.
- SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- Dropout: A simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., 2014.
- Regularization of neural networks using DropConnect, Proceedings of the International Conference on Machine Learning (ICML), 2013.
Cited by (26)
- Feature learning network with transformer for multi-label image classification, Pattern Recognition, 2023. Citation excerpt: "SeeNet [49] designs a self-erasing network to prohibit attentions from spreading to unexpected background regions. ADCM [50] proposes an attention dropout convolutional module to emphasize the meaningful information and suppress unnecessary noise. Different from these methods, the purpose of our work is to suppress the most salient regions in images to excavate other potential valuable features."
- A residual network with attention module for hyperspectral information of recognition to trace the origin of rice, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2021.
- Environment sound classification using an attention-based residual neural network, Neurocomputing, 2021. Citation excerpt: "The performance of the proposed attention-based deep model is compared with other attention mechanisms from the literature. The model attains the accuracy of 77.61%, 78.27%, 77.12%, 81.34% and 84.71% for the Scaled Dot product attention [28], Triplet attention [30], Channel attention [31], Dropout attention [39] and Symbiotic attention [40] techniques respectively. It is clear from Table 6 that the proposed method yields competitive accuracy to existing state-of-the-art deep models."
- Person image generation with attention-based injection network, Neurocomputing, 2021.
- Attention induced multi-head convolutional neural network for human activity recognition, Applied Soft Computing, 2021. Citation excerpt: "Each head of the model is provided with input time steps obtained after the segmentation step. We added attention to the proposed multi-head model to strengthen the representation capability by focusing on salient features and ignoring potentially confusing and irrelevant details [52]. In the proposed scheme, the squeeze-and-excitation (SE) module [26] is employed as an attention."
- Improved deep CNNs based on Nonlinear Hybrid Attention Module for image classification, Neural Networks, 2021. Citation excerpt: "Besides, multi-attention modules are also explored in different regions. Liu et al. proposed attention dropout convolutional module (ADCM), which is a lightweight attention module makes most use of dropout strategy in channel-wise and position-wise attention mechanisms (Liu, Du, Wang, & Ge, 0000). Chen et al. introduced a multi attention module to solve problems in visual tracking (Chen et al., 2019)."
Zhigang Liu received the M.S. degree and Ph.D. degree in computer science from the Northeast Petroleum University (NEPU), Daqing, China, in 2009 and 2016, respectively. From 2018 to 2019, he was a visiting scholar with the Department of Electrical & Computer Engineering at National University of Singapore. He is currently an Associate Professor with the Department of Computer Science and Engineering, NEPU. His research interests include machine learning, computer vision, especially, data/label- and computation-efficient deep learning for visual recognition.
Juan Du received the M.S. degree in computer science and technology from Northeast Petroleum University, Daqing, China, in 2009. She is currently an Associate Professor at the School of Computer and Information Technology, Northeast Petroleum University. Her research interests mainly include computer vision and image processing.
Mei Wang received her Ph.D. from Tianjin University, China, in 2015. She is currently a Professor at the School of Computer and Information Technology, Northeast Petroleum University. Her research interests include machine learning and pattern recognition and their applications in intelligent oilfield exploration systems.
Shuzhi Sam Ge received the B.Sc. degree from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 1986 and the Ph.D. degree from Imperial College London, London, U.K., in 1993. He is the Director of the Social Robotics Laboratory of the Interactive Digital Media Institute, Singapore, and the Centre for Robotics, Chengdu, China, and a Professor with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore, on leave from the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu. He has co-authored four books and over 300 international journal and conference papers. His current research interests include social robotics, adaptive control, intelligent systems, and artificial intelligence. Dr. Ge is the Editor-in-Chief of the International Journal of Social Robotics (Springer). He has served/been serving as an Associate Editor for a number of flagship journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, the IEEE TRANSACTIONS ON NEURAL NETWORKS, and Automatica. He serves as a Book Editor for the Taylor & Francis Automation and Control Engineering Series. He served as the Vice President for Technical Activities from 2009 to 2010 and for Membership Activities from 2011 to 2012, and as a member of the Board of Governors from 2007 to 2009 at the IEEE Control Systems Society. He is a fellow of the International Federation of Automatic Control, the Institution of Engineering and Technology, and the Society of Automotive Engineers.