Convolution Tells Where to Look

Xu, Fan; Duan, Lijuan; Qiao, Yuanhua; Chen, Ji

doi:10.1007/978-3-030-88013-2_2

Conference paper
First Online: 22 October 2021

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13022)

Abstract

Many attention models have been introduced to boost the representational power of convolutional neural networks (CNNs). Most of them are self-attention models that generate an attention mask from the current features, such as spatial attention and channel attention models. However, these attention models may not achieve good results when the current features are the low-level features of a CNN. In this work, we propose a new lightweight attention unit, the feature difference (FD) model, which uses the difference between two feature maps to generate the attention mask. The FD model can be integrated into most state-of-the-art CNNs, such as ResNets and VGG, just by adding some shortcut connections, without introducing any additional parameters or layers. Extensive experiments show that the FD model helps improve the performance of the baseline on four benchmarks: CIFAR10, CIFAR100, ImageNet-1K, and PASCAL VOC. Notably, ResNet44 with the FD model (6.10% error) achieves better results than ResNet56 (6.24% error), while having 29% fewer parameters.
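The abstract does not state how the difference between two feature maps is turned into a mask, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes two single-channel feature maps of the same shape, a sigmoid applied to their element-wise difference as the mask, and element-wise multiplication of the later map by that mask. The names `fd_attention`, `earlier`, and `later` are hypothetical. The one property taken from the abstract is that the operation has no learnable parameters.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fd_attention(earlier, later):
    """Hypothetical feature-difference attention on two 2D feature maps.

    ASSUMPTION: the mask is sigmoid(later - earlier) and it rescales
    the later map element-wise. The source only says the mask comes
    from the difference between two feature maps.
    """
    same_rows = len(earlier) == len(later)
    same_cols = all(len(a) == len(b) for a, b in zip(earlier, later))
    if not (same_rows and same_cols):
        raise ValueError("feature maps must have the same shape")

    attended = []
    for row_e, row_l in zip(earlier, later):
        # No weights are involved: the mask depends only on the two inputs.
        attended.append([l * sigmoid(l - e) for e, l in zip(row_e, row_l)])
    return attended


# Toy 2x2 feature maps (made-up values, for illustration only).
earlier = [[0.0, 1.0], [2.0, 0.5]]
later = [[0.0, 3.0], [2.0, 0.1]]
attended = fd_attention(earlier, later)
```

In this sketch, positions where the later map grew relative to the earlier one receive a mask above 0.5, and positions where it shrank receive a mask below 0.5. In a real network, the two inputs would presumably be reached through the shortcut connections the abstract mentions.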

Acknowledgements

This research is partially sponsored by the Key Project of the Beijing Municipal Education Commission (No. KZ201910005008).

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, China
Fan Xu & Lijuan Duan
Peng Cheng Laboratory, Shenzhen, China
Fan Xu & Ji Chen
Beijing Key Laboratory of Trusted Computing, Beijing, China
Lijuan Duan
National Engineering Laboratory for Key Technologies of Information Security Level Protection, Beijing, China
Lijuan Duan
College of Mathematics and Physics, Beijing University of Technology, Beijing, China
Yuanhua Qiao

Corresponding author

Correspondence to Lijuan Duan.

Editor information

Editors and Affiliations

University of Science and Technology Beijing, Beijing, China
Huimin Ma
Chinese Academy of Sciences, Beijing, China
Liang Wang
Tsinghua University, Beijing, China
Changshui Zhang
Zhejiang University, Hangzhou, China
Fei Wu
Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hunan University, Changsha, China
Yaonan Wang
Sun Yat-Sen University, Guangzhou, Guangdong, China
Jianhuang Lai
Beijing Jiaotong University, Beijing, China
Yao Zhao

About this paper

Cite this paper

Xu, F., Duan, L., Qiao, Y., Chen, J. (2021). Convolution Tells Where to Look. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13022. Springer, Cham. https://doi.org/10.1007/978-3-030-88013-2_2

DOI: https://doi.org/10.1007/978-3-030-88013-2_2
Published: 22 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88012-5
Online ISBN: 978-3-030-88013-2
eBook Packages: Computer Science, Computer Science (R0)
