Elsevier

Pattern Recognition

Volume 122, February 2022, 108354

Gradient-Aligned convolution neural network

https://doi.org/10.1016/j.patcog.2021.108354

Highlights

  • We propose a general convolution operation, called GAConv, which can replace the conventional convolutions in a CNN to help it achieve rotation invariance.

  • With GAConv, the Gradient-Aligned CNN (GACNN) can achieve rotation invariance without any data augmentation, feature-map augmentation, or filter enrichment.

  • In GACNN, rotation invariance is not learned from the training set but is built into the network model. Unlike a vanilla CNN, GACNN outputs invariant results for all rotated versions of an object, whether or not the network has been trained.

  • We conduct classification experiments on a purpose-designed dataset and on realistic datasets. The results show that, at the same computational cost, GACNN achieves better results than a conventional CNN and several rotation-invariant CNNs.

Abstract

Although Convolutional Neural Networks (CNNs) have achieved great success in many computer vision applications in recent years, rotation invariance remains a difficult problem for them. In particular, in some images the content can appear at any angle of rotation, as in medical, microscopic, remote sensing, and astronomical images. In this paper, we propose a novel convolution operation, called Gradient-Aligned Convolution (GAConv), which can help a CNN achieve rotation invariance by replacing its vanilla convolutions. GAConv is implemented as a pixel-level gradient alignment operation applied before the regular convolution. With GAConv, the Gradient-Aligned CNN (GACNN) can achieve rotation invariance without any data augmentation, feature-map augmentation, or filter enrichment. In GACNN, rotation invariance is not learned from the training set but is built into the network model. Unlike a vanilla CNN, GACNN outputs invariant results for all rotated versions of an object, whether or not the network has been trained. This means that we only need to train the network on one canonical version of the object, and all other rotated versions of this object should be recognized with the same accuracy. Classification experiments have been conducted to compare GACNN with several rotation-invariant approaches. GACNN achieved the best results on the 360° rotated test sets of MNIST-rotation, Plankton-sub-rotation, and Galaxy Zoo 2.

Introduction

Since the birth of computer vision, rotation invariance has been an important issue in the field. When we capture images of the natural world, an object may appear in the image at different rotation angles depending on the relative position between the object and the camera. In medical, microscopic, remote sensing, and astronomical images especially, the rotation angle can be arbitrary, since such images are not subject to the gravity constraints of natural images.

When dealing with images of the same object in different orientations, as shown in Fig. 1, we always want to get the same response, so that our descriptors are not affected by the orientation of the input. We say that such responses or descriptors are rotation invariants. The formal definition of rotation invariance is given by Equation (1):

f(I) = f(R(I)).    (1)

In the image coordinate system, R denotes a 2D rotation transformation applied to the input image I. A function f(·) that produces the same output for all rotated versions of an image I is rotation invariant. The GACNN proposed in this paper is a special case of such an f(·).
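
As a toy illustration of Equation (1) (not the paper's method), the snippet below checks the definition for a trivially rotation-invariant descriptor, the intensity histogram, under an exact 90° rotation R:

```python
# Toy check of the invariance definition f(I) = f(R(I)).
# f here is the intensity histogram, which is invariant to any
# rotation that merely permutes pixels (e.g., multiples of 90 degrees).
import numpy as np

def f(image):
    hist, _ = np.histogram(image, bins=16, range=(0.0, 1.0))
    return hist

rng = np.random.default_rng(0)
I = rng.random((28, 28))   # a random test "image"
R_I = np.rot90(I)          # R: exact rotation by 90 degrees

assert np.array_equal(f(I), f(R_I))   # f(I) == f(R(I)) holds
```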

Although CNNs have achieved great success in many computer vision applications in recent years, achieving rotation invariance with a CNN remains a challenge. A CNN encodes local translation invariance somewhat implicitly through weight-shared convolutions, pooling operations, and so on. Encoding rotation, however, raises additional difficulties: rotating an image by an angle that is not a multiple of 90° requires interpolation, and the resulting sample points may not lie on the pixel grid of the original image domain. These complications make encoding rotation invariance in a CNN quite challenging.
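
The grid problem can be seen directly in code. This small demonstration (ours, using scipy, not from the paper) shows that a 90° rotation merely permutes pixel values, while a 45° rotation must interpolate and therefore produces values that never occurred in the original image:

```python
# 90-degree rotation: an exact permutation of the pixel grid.
# 45-degree rotation: off-grid sample points, so interpolation is required.
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)
I = rng.random((28, 28))

I_90 = np.rot90(I)                                  # exact, no interpolation
I_45 = rotate(I, angle=45, reshape=False, order=1)  # bilinear interpolation

print(sorted(I.ravel()) == sorted(I_90.ravel()))    # True: same multiset of values
print(np.allclose(sorted(I.ravel()), sorted(I_45.ravel())))  # False: values changed
```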

For human vision, psychophysical experiments have shown that a prior rotation alignment takes place when we recognize rotated objects. Shepard and Metzler [1] introduced the concept of mental rotation in 1971 in what has become one of the best-known experiments in the field. Their conclusion is that the time it takes a human to decide whether two rotated objects are the same depends on the angle of rotation between them. As shown in Fig. 1, the time it takes to recognize '7' in a rotated version is positively related to the angle between the rotated image and the canonical version. In other words, to recognize the image, humans actually rotate the object mentally.

Based on this mechanism of human vision, we propose a special convolution, called Gradient-Aligned Convolution (GAConv). Before the regular convolution operation, the gradient at each pixel is calculated by the proposed Extended Circle Sobel operator and aligned to a canonical direction. That is, all the neighbors are rotated so that the gradient at the central pixel aligns with a chosen, fixed direction, as illustrated in Fig. 2. The left two figures show the gradients of the first two images in Fig. 1; in the right two figures, these gradients have been aligned to the same fixed canonical direction. From another perspective, the direction information of the gradient is discarded and only the magnitude of the gradient participates in the subsequent convolution.
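
The Extended Circle Sobel operator is not detailed in this excerpt, so as a stand-in assumption the sketch below estimates the per-pixel gradient magnitude and orientation with the ordinary 3×3 Sobel kernels; the orientation is what a GAConv-style layer would cancel out when rotating each neighborhood to the canonical direction:

```python
# Per-pixel gradient magnitude and orientation via standard 3x3 Sobel
# kernels (a stand-in for the paper's Extended Circle Sobel operator).
import numpy as np
from scipy.ndimage import convolve

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gradient_field(image):
    gx = convolve(image, SOBEL_X, mode='nearest')
    gy = convolve(image, SOBEL_Y, mode='nearest')
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # angle to undo during alignment
    return magnitude, orientation
```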

With GAConv, the Gradient-Aligned CNN (GACNN) can achieve global rotation invariance without any data augmentation, feature-map augmentation, or filter enrichment. Only one canonical direction is encoded into the filters. In GACNN, rotation invariance is not learned from the training set but is built into the network model. Unlike a vanilla CNN, GACNN outputs invariant results for all rotated versions of an object, whether or not the network has been trained. This means we only need to train the network on one canonical version of the object; all other rotated versions should be recognized with the same accuracy. Furthermore, the convolution operations in any CNN architecture can be replaced by GAConv to make the network invariant to rotations, as sketched below. Classification experiments have been conducted to compare GACNN with a regular CNN and several other rotation-invariant approaches.
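
As a hedged sketch of this drop-in idea in PyTorch (the helper and factory below are our hypothetical names, not the authors' code), one could walk a model and swap every regular convolution; a real GAConv layer would be returned by the factory in place of the plain Conv2d used here for illustration:

```python
# Recursively replace every nn.Conv2d in a model using a user-supplied
# factory. A GAConv implementation would be returned by the factory.
import torch.nn as nn

def replace_convs(module, make_layer):
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(module, name, make_layer(child))
        else:
            replace_convs(child, make_layer)

model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
# Identity factory shown for illustration; substitute a GAConv layer here.
replace_convs(model, lambda c: nn.Conv2d(c.in_channels, c.out_channels,
                                         c.kernel_size, padding=c.padding))
```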

Section snippets

Related works

Over the past decades, numerous rotation-invariant methods have been designed. Techniques for achieving rotation invariance in conventional methods can be roughly classified into three categories. The first is global alignment: a reference direction of the image is obtained first, and then the entire image is rotated to that reference direction to achieve global rotation invariance. The second is to ignore the orientation information in the image. Some statistical-based methods ignore …

Gradient-Aligned convolution

Gradient-Aligned Convolution (GAConv) is an operation that can replace the regular convolutions in a CNN. In conventional convolution, the kernel K moves pixel by pixel over the input feature map and performs an entry-wise product with the corresponding region I_i of the input feature map. For GAConv, intuitively, a prior pixel-level gradient alignment is applied to the input feature maps before the regular convolution, as presented in Fig. 2. In practice, we modified the process of …
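
Although the full procedure is truncated in this preview, a naive single-channel sketch of the operation as described is given below; it assumes an ordinary Sobel orientation estimate and bilinear interpolation, since the paper's Extended Circle Sobel operator and exact alignment details are not given in this excerpt:

```python
# Naive single-channel GAConv sketch: rotate each k x k neighborhood so the
# central pixel's gradient points along a canonical direction, then apply
# the regular entry-wise product with kernel K and sum.
import numpy as np
from scipy.ndimage import convolve, rotate

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gaconv2d(image, K, canonical_deg=0.0):
    k = K.shape[0]
    r = k // 2
    gx = convolve(image, SOBEL_X, mode='nearest')
    gy = convolve(image, SOBEL_Y, mode='nearest')
    theta = np.degrees(np.arctan2(gy, gx))   # per-pixel gradient direction
    padded = np.pad(image, r, mode='edge')
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = padded[i:i + k, j:j + k]
            # Align the central gradient to the canonical direction.
            aligned = rotate(patch, canonical_deg - theta[i, j],
                             reshape=False, order=1, mode='nearest')
            out[i, j] = np.sum(aligned * K)  # regular convolution step
    return out
```

This per-pixel loop is written for clarity rather than speed; a practical layer would batch the alignment on the GPU.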

Experiments

In this section, we evaluate the effectiveness of GACNN on three datasets. In Section 4.1, we introduce a new dataset, MNIST-rotation, based on MNIST [34]. In MNIST-rotation, only canonical versions of the digits are included in the training set, while many rotated versions of the digits are included in the test set. We conduct comparison experiments on MNIST-rotation against a regular CNN and several recent rotation-invariant approaches. We analyze the results from different perspectives to …
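
The exact construction of MNIST-rotation is truncated in this preview; a plausible sketch of such a protocol with torchvision (our assumption, not the authors' code) keeps the training digits canonical and rotates the test digits through a sweep of fixed angles:

```python
# Canonical digits for training; test copies rotated to fixed angles.
import torchvision.transforms as T
from torchvision.datasets import MNIST

train_set = MNIST(root='data', train=True, download=True,
                  transform=T.ToTensor())            # canonical versions only

test_angles = range(0, 360, 10)                      # e.g. every 10 degrees
test_sets = [
    MNIST(root='data', train=False, download=True,
          transform=T.Compose([T.RandomRotation((a, a)),  # fixed angle a
                               T.ToTensor()]))
    for a in test_angles
]
```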

Conclusion

In this paper, we proposed a novel rotation-equivariant operation, named Gradient-Aligned Convolution (GAConv), which can replace the regular convolutions in a CNN. GAConv is implemented with a prior pixel-level gradient alignment operation before the regular convolution. With GAConv, GACNN can achieve 360° rotation invariance without increasing the number of inputs or filters. Rotation-invariance experiments have been conducted to evaluate the characteristics of GACNN. GACNN performs much better …

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2017YFB1002703), the National Key Basic Research Program of China (No. 2015CB554507), and the National Natural Science Foundation of China (No. 61379082), and was partially supported by the China Scholarship Council (CSC).

References (43)

  • P.Y. Simard et al., Best practices for convolutional neural networks applied to visual document analysis, International Conference on Document Analysis and Recognition (ICDAR), 2003.

  • D.A. van Dyk et al., The art of data augmentation, Journal of Computational and Graphical Statistics, 2001.

  • S. Dieleman et al., Exploiting cyclic symmetry in convolutional neural networks, Proceedings of the 33rd International Conference on Machine Learning, 2016.

  • M. Jaderberg et al., Spatial transformer networks, Advances in Neural Information Processing Systems, 2015.

  • D. Laptev et al., TI-pooling: Transformation-invariant pooling for feature learning in convolutional neural networks, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

  • J. Li et al., Deep rotation equivariant network, Neurocomputing, 2018.

  • J.F. Henriques et al., Warped convolutions: Efficient invariance to spatial transformations, Proceedings of the 34th International Conference on Machine Learning, 2017.

  • L. Gao et al., Rotation-equivariant convolutional neural network ensembles in image processing, Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, 2019.

  • T. Cohen et al., Group equivariant convolutional networks, Proceedings of the 33rd International Conference on Machine Learning, 2016.

  • Y. Zhou et al., Oriented response networks, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

  • X. Zhang et al., Rotation invariant local binary convolution neural networks, The IEEE International Conference on Computer Vision (ICCV) Workshops, 2017.

You Hao is currently pursuing the Ph.D. degree with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. He is currently visiting the Medical Image Processing Group, Department of Radiology, University of Pennsylvania. His research interests include image invariants, medical image processing, and computer vision.

Ping Hu is a Cloud Solution Architect at Microsoft. She received her Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences. Her current research interests are deep learning, computer vision, and machine learning.

Shirui Li received the Ph.D. degree in computer science and application from the University of Chinese Academy of Sciences, Beijing, China, in 2018. He joined Baidu's intelligent driving group in 2018. His research interests include 3D reconstruction, perception for autonomous vehicles, and multimodal sensor calibration.

Jayaram Udupa's research focus has been developing theory and algorithms for image processing/analysis, 3D visualization, and machine learning, along with numerous medical applications of these tools toward quantitative radiology, with many seminal contributions made to these areas over 40 years. He is a professor of Radiology at the University of Pennsylvania.

Yubing Tong, Ph.D., is currently a senior research investigator and director of operations at the Medical Image Processing Group (MIPG) at the University of Pennsylvania, Philadelphia, United States. His research interests include medical image processing and analysis, disease quantification, and machine learning.

Hua Li is currently a Professor with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. His research interests include computer graphics and visualization, computer vision, shape analysis, and image invariants.
