
Residual Attention Encoding Neural Network for Terrain Texture Classification

  • Conference paper
Pattern Recognition (ACPR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12047)


Abstract

Terrain texture classification plays an important role in computer vision applications such as robot navigation and autonomous driving. Traditional methods based on hand-crafted features often perform sub-optimally because they cannot efficiently model complex terrain variations. In this paper, we propose a residual attention encoding network (RAENet) for terrain texture classification. Specifically, RAENet incorporates a stack of residual attention blocks (RABs) and an encoding block (EB). By generating attention feature maps jointly with residual learning, RAB differs from commonly used blocks that only combine the features of the current layer with those of the immediately preceding layer. Instead, RAB connects all preceding layers to the current layer, which not only minimizes information loss in the convolution process but also enhances the weights of the features that help distinguish between classes. EB then applies an orderless encoder that keeps the representation invariant to spatial layout while extracting feature details before classification. The effectiveness of RAENet is evaluated on two terrain texture datasets. Experimental results show that RAENet achieves state-of-the-art performance.

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61872188, U1713208, 61602244, 61672287, 61702262, 61773215.
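To make the architecture described in the abstract more concrete, the following is a minimal sketch (in PyTorch, chosen here only for illustration) of the two components: a residual attention block that concatenates all preceding feature maps and reweights the newly computed features with a learned attention mask, and an orderless encoder that sums soft assignments to learned codewords over all spatial positions. The layer widths, the 1x1-convolution-plus-sigmoid attention, and the codeword-based encoder are assumptions for illustration, not the authors' exact design.

```python
# Hedged sketch of a residual attention block (RAB) with dense connectivity
# and an orderless encoding block (EB). Hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualAttentionBlock(nn.Module):
    def __init__(self, in_channels, growth):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth, kernel_size=3, padding=1),
        )
        # Attention mask over the new features (assumed 1x1 conv + sigmoid).
        self.attn = nn.Sequential(
            nn.Conv2d(growth, growth, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, features):
        # `features` is a list of ALL preceding feature maps (dense connectivity).
        x = torch.cat(features, dim=1)
        new = self.body(x)
        new = new * self.attn(new)          # attention-weighted features
        return features + [new]             # pass everything to the next block


class OrderlessEncoder(nn.Module):
    """Soft-assignment encoding over K learned codewords, summed over all
    spatial positions, so the descriptor is invariant to spatial layout."""
    def __init__(self, channels, num_codes=8):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, channels) * 0.1)

    def forward(self, x):
        b, c, h, w = x.shape
        feats = x.view(b, c, h * w).permute(0, 2, 1)        # B x N x C
        resid = feats.unsqueeze(2) - self.codes.view(1, 1, -1, c)  # B x N x K x C
        assign = F.softmax(-resid.pow(2).sum(-1), dim=2)    # soft assignment, B x N x K
        encoded = (assign.unsqueeze(-1) * resid).sum(1)     # orderless sum over positions
        return encoded.flatten(1)                           # B x (K * C)


if __name__ == "__main__":
    # Toy forward pass: two stacked RABs followed by the orderless encoder.
    x = torch.randn(2, 16, 32, 32)
    rab1 = ResidualAttentionBlock(in_channels=16, growth=16)
    rab2 = ResidualAttentionBlock(in_channels=32, growth=16)
    feats = rab2(rab1([x]))                  # 3 maps: 16 + 16 + 16 channels
    enc = OrderlessEncoder(channels=48, num_codes=8)
    out = enc(torch.cat(feats, dim=1))
    print(out.shape)                         # torch.Size([2, 384])
```

In this sketch the encoder output has a fixed length (num_codes x channels) regardless of the input's spatial resolution, which is what makes the pooled representation orderless and invariant to spatial layout before the final classifier.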



Author information

Corresponding author

Correspondence to Zhong Jin.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Song, X., Yang, J., Jin, Z. (2020). Residual Attention Encoding Neural Network for Terrain Texture Classification. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_5


  • DOI: https://doi.org/10.1007/978-3-030-41299-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41298-2

  • Online ISBN: 978-3-030-41299-9

  • eBook Packages: Computer Science, Computer Science (R0)
