Convolution Tells Where to Look

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13022)

Abstract

Many attention models have been introduced to boost the representational power of convolutional neural networks (CNNs). Most of them are self-attention models that generate an attention mask from the current features, such as spatial attention and channel attention models. However, these attention models may not achieve good results when the current features are low-level features of the CNN. In this work, we propose a new lightweight attention unit, the feature difference (FD) module, which uses the difference between two feature maps to generate the attention mask. The FD module can be integrated into most state-of-the-art CNNs, such as ResNets and VGG, simply by adding shortcut connections, and it introduces no additional parameters or layers. Extensive experiments show that the FD module improves baseline performance on four benchmarks: CIFAR10, CIFAR100, ImageNet-1K, and PASCAL VOC. Notably, ResNet44 with the FD module (6.10% error) outperforms ResNet56 (6.24% error) while having 29% fewer parameters.
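
To make the mechanism concrete, below is a minimal PyTorch sketch of the idea as stated in the abstract: an attention mask computed from the difference between two feature maps, with the earlier map supplied by a shortcut connection, so no learnable parameters are added. The class name, the sigmoid gating, and the residual-style (1 + mask) application are illustrative assumptions; the paper body defines the exact formulation.

    import torch
    import torch.nn as nn


    class FeatureDifferenceAttention(nn.Module):
        """Parameter-free attention mask from the difference of two feature maps.

        A sketch only: the mask function (sigmoid here) and the residual-style
        (1 + mask) application are assumptions, not the paper's exact form.
        """

        def forward(self, current: torch.Tensor, previous: torch.Tensor) -> torch.Tensor:
            # Both tensors must share shape (N, C, H, W); inside a ResNet stage
            # without downsampling, a block's input and output satisfy this.
            mask = torch.sigmoid(current - previous)
            # Reweight the current features; (1 + mask) preserves the identity path.
            return current * (1.0 + mask)


    # Hypothetical usage around one residual block:
    # y = block(x)                            # x: block input, y: block output
    # y = FeatureDifferenceAttention()(y, x)  # shortcut supplies the earlier map

Because the module holds no weights, wrapping an existing backbone this way changes only the forward pass, which is consistent with the abstract's claim of adding no parameters or layers.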



Acknowledgements

This research was partially sponsored by the Key Project of the Beijing Municipal Education Commission (No. KZ201910005008).

Author information

Correspondence to Lijuan Duan.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, F., Duan, L., Qiao, Y., Chen, J. (2021). Convolution Tells Where to Look. In: Ma, H., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13022. Springer, Cham. https://doi.org/10.1007/978-3-030-88013-2_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-88013-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88012-5

  • Online ISBN: 978-3-030-88013-2

  • eBook Packages: Computer Science, Computer Science (R0)
