Gradient-Aligned Convolutional Neural Network
Introduction
Since the birth of computer vision, rotation invariance has been an important issue in the field. When we capture images of nature, an object may appear in the image at different rotation angles depending on the relative position between the object and the camera. For medical images, microscopic images, remote-sensing images, and astronomical images in particular, the rotation angle can be arbitrary, since these images are not subject to the gravity-induced orientation constraints of natural images.
When dealing with such images of the same object in different orientations, as shown in Fig. 1, we always want to get the same response, so that our descriptors are not affected by the orientation of the input. We say that these responses or descriptors are rotation invariant. Rotation invariance is formally defined by Equation (1): in the image coordinate system, let $R_\theta$ denote the 2D rotation of the input image $I$ by an angle $\theta$. A function $f$ has rotation invariance if it produces the same output for all rotated versions of $I$:

$$f(R_\theta(I)) = f(I), \quad \forall \theta \in [0, 2\pi). \tag{1}$$

The GACNN proposed in this paper is a special case of such an $f$.
In recent years, although CNNs have achieved great success in many computer-vision applications, achieving rotation invariance with a CNN has remained a challenge. A CNN encodes local translation invariance somewhat implicitly through weight-shared convolution, pooling, and so on. Encoding rotation, however, is harder: rotating an image by an angle that is not a multiple of 90° requires interpolation, because the rotated sample positions do not fall on the pixel grid of the original image domain. These complications make encoding rotation invariance in a CNN quite challenging.
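The grid problem can be seen in a few lines of NumPy (our own minimal illustration, not code from the paper): a 90° rotation merely permutes pixel coordinates, while an arbitrary angle maps grid points to off-grid positions, forcing interpolation.

```python
import numpy as np

# A 90-degree rotation permutes pixel coordinates exactly: every source
# pixel lands on another grid point, so no interpolation is needed.
img = np.arange(9, dtype=float).reshape(3, 3)
rot90 = np.rot90(img)  # pure re-indexing; pixel values are preserved exactly
assert sorted(rot90.ravel().tolist()) == sorted(img.ravel().tolist())

# For an arbitrary angle, a rotated grid point generally has non-integer
# coordinates, so pixel values must be interpolated.
theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
p = R @ np.array([1.0, 0.0])
print(p)  # ~[0.707, 0.707]: off the integer pixel grid
```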
For human vision, psychophysical experiments have shown that a prior rotation alignment is performed when recognizing rotated objects. Shepard & Metzler [1] introduced the concept of mental rotation in 1971, in what has become one of the best-known experiments in the field. Their conclusion is that the time it takes humans to determine whether two rotated objects are the same depends on the angle of rotation between them. As shown in Fig. 1, the time it takes to recognize '7' in a rotated version is positively related to the angle between the rotated image and the first, canonical version. This suggests that, in order to recognize the image, humans actually rotate the object mentally.
Based on this basic mechanism of human vision, we propose a special convolution, called Gradient-Aligned Convolution (GAConv). Before the regular convolution operation, the gradient at each pixel is calculated by the proposed Extended Circle Sobel operator and aligned to a canonical direction. That is, all the neighbors are rotated so that the gradient at the central pixel points in a chosen, fixed direction, as illustrated in Fig. 2. The left two figures show the gradients of the first two images in Fig. 1; in the right two figures, those gradients have been aligned to the same fixed canonical direction. From another perspective, the direction information of the gradient is discarded and only its magnitude participates in the subsequent convolution.
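The per-pixel gradient computation described above can be sketched in NumPy. This is only an illustration of the idea: it uses the standard 3×3 Sobel kernels as a stand-in for the paper's Extended Circle Sobel operator, and the function name `gradient_field` is ours.

```python
import numpy as np

# Standard 3x3 Sobel kernels, used here as a stand-in for the paper's
# Extended Circle Sobel operator (illustration only).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gradient_field(img):
    """Per-pixel gradient magnitude and direction (radians), zero-padded."""
    H, W = img.shape
    padded = np.pad(img, 1)
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_Y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# A vertical edge: the gradient on the edge points along +x (angle 0),
# so no rotation is needed if +x is chosen as the canonical direction.
edge = np.zeros((5, 5))
edge[:, 3:] = 1.0
mag, ang = gradient_field(edge)
print(mag[2, 2], ang[2, 2])  # → 4.0 0.0
```

Aligning a neighborhood then amounts to rotating its sampling positions by the negative of this angle before convolving, which is where interpolation enters.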
With GAConv, a Gradient-Aligned CNN (GACNN) can achieve global rotation invariance without any data augmentation, feature-map augmentation, or filter enrichment. Only one canonical direction is encoded into the filters. In GACNN, rotation invariance is not learned from the training set but is built into the network model. Unlike a vanilla CNN, GACNN outputs invariant results for all rotated versions of an object, whether the network is trained or not. This means we only need to train the network on one canonical version of the object; all other rotated versions should be recognized with the same accuracy. Further, the convolution operations in any CNN architecture can be replaced by GAConv to make the network invariant to rotations. Classification experiments have been conducted to compare GACNN with regular CNNs and several other rotation-invariant approaches.
Related works
Over the past decades, numerous rotation-invariant methods have been designed. Techniques for achieving rotation invariance in conventional methods can be roughly classified into three categories. The first is global alignment: a reference direction of the image is obtained first, and the entire image is then rotated to that reference direction to achieve global rotation invariance. The second is to ignore the orientation information in the image. Some statistical-based methods ignore
Gradient-Aligned Convolution
Gradient-Aligned Convolution (GAConv) is an operation that can replace the regular convolutions in a CNN. In conventional convolution, the kernel moves pixel by pixel over the input feature map and performs an entry-wise product with the corresponding region of the feature map. For GAConv, intuitively, a prior pixel-level gradient alignment is applied to the input feature maps before the regular convolution, as presented in Fig. 2. In practice, we modified the process of
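Conceptually, each GAConv output value is a regular convolution taken over a neighborhood that has first been rotated by the negative of the central pixel's gradient angle. The sketch below is our own rough reading of that idea, with bilinear resampling standing in for whichever interpolation the paper uses; `gaconv_at`, `bilinear`, and the sign conventions are illustrative, not the paper's implementation.

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly sample img at the real-valued position (y, x); zero outside."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    val = 0.0
    for oy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
        for ox, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if 0 <= oy < H and 0 <= ox < W:
                val += wy * wx * img[oy, ox]
    return val

def gaconv_at(img, angle, kernel, i, j):
    """One gradient-aligned convolution output at pixel (i, j): the
    neighborhood is rotated by -angle[i, j] around the center before the
    usual entry-wise product, so the central gradient ends up pointing
    in a fixed canonical direction."""
    k = kernel.shape[0] // 2
    c, s = np.cos(-angle[i, j]), np.sin(-angle[i, j])
    out = 0.0
    for u in range(-k, k + 1):          # row offset
        for v in range(-k, k + 1):      # column offset
            # rotate the sampling offset (u, v) about the central pixel
            ry = i + c * u - s * v
            rx = j + s * u + c * v
            out += kernel[u + k, v + k] * bilinear(img, ry, rx)
    return out

# With all alignment angles zero, GAConv reduces to plain correlation.
img = np.arange(9, dtype=float).reshape(3, 3)
print(gaconv_at(img, np.zeros((3, 3)), np.ones((3, 3)), 1, 1))  # → 36.0
```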
Experiments
In this section, we evaluate the effectiveness of GACNN on three datasets. In Section 4.1, we develop a new dataset, MNIST-rotation, based on MNIST [34]. In MNIST-rotation, only canonical versions of the digits are included in the training set, while many rotated versions are included in the test set. We conduct comparison experiments on MNIST-rotation against a regular CNN and several recent rotation-invariant approaches. We analyze the results from different perspectives to
Conclusion
In this paper, we proposed a novel rotation-equivariant operation, named Gradient-Aligned Convolution (GAConv), which can replace regular convolutions in a CNN. GAConv is implemented with a prior pixel-level gradient-alignment operation before the regular convolution. With GAConv, GACNN can achieve rotation invariance without increasing the number of inputs or filters. Rotation-invariance experiments have been conducted to evaluate the characteristics of GACNN. GACNN performs much better
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by National Key R&D Program of China (No. 2017YFB1002703), National Key Basic Research Program of China (No. 2015CB554507), and National Natural Science Foundation of China (No. 61379082). This work is partially supported by China Scholarship Council (CSC).
References (43)

- Pattern recognition by affine moment invariants. Pattern Recognition (1993)
- Image analysis by log-polar exponent-Fourier moments. Pattern Recognition (2020)
- New fractional-order Legendre-Fourier moments for pattern recognition applications. Pattern Recognition (2020)
- RIFD-CNN: rotation-invariant and Fisher discriminative convolutional neural networks for object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
- Mental rotation of three-dimensional objects. Science (1971)
- Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory (1962)
- Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (2004)
- SURF: speeded up robust features. European Conference on Computer Vision (2006)
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. (2002)
- Rotation-invariant texture classification using modified Gabor filters. International Conference on Image Processing (1995)
- Best practices for convolutional neural networks applied to visual document analysis. International Conference on Document Analysis and Recognition (ICDAR)
- The art of data augmentation. Journal of Computational and Graphical Statistics
- Exploiting cyclic symmetry in convolutional neural networks. Proceedings of the 33rd International Conference on Machine Learning
- Spatial transformer networks. Advances in Neural Information Processing Systems
- TI-Pooling: transformation-invariant pooling for feature learning in convolutional neural networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deep rotation equivariant network. Neurocomputing
- Warped convolutions: efficient invariance to spatial transformations. Proceedings of the 34th International Conference on Machine Learning
- Rotation-equivariant convolutional neural network ensembles in image processing. Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2019 ACM International Symposium on Wearable Computers
- Group equivariant convolutional networks. Proceedings of the 33rd International Conference on Machine Learning
- Oriented response networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Rotation invariant local binary convolution neural networks. IEEE International Conference on Computer Vision (ICCV) Workshops
Cited by (6)

- RIC-CNN: Rotation-Invariant Coordinate Convolutional Neural Network. Pattern Recognition (2024)
- An intelligent and vision-based system for Baijiu brewing-sorghum discrimination. Measurement: Journal of the International Measurement Confederation (2022). Citation excerpt: "The Xception was implemented in the Python programming language using Pytorch library. Previous studies demonstrated that the required rotation invariance properties of pattern recognition could not be guaranteed in the process of CNN training and reasoning [12,24]. In this work, anti-aliasing algorithm was applied to modify the baseline Xception to reach a more robust model with rotation-invariance ability."
- Recurrent Affine Transform Encoder for Image Representation. IEEE Access (2022)
You Hao is currently pursuing the Ph.D. degree with Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. Recently, he is visiting at Medical Image Processing Group, Department of Radiology, University of Pennsylvania. His research interests include image invariants, medical image processing, and computer vision.
Ping Hu is a Cloud Solution Architect at Microsoft. She received her Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences. Her current research interests are deep learning, computer vision, and machine learning.
Shirui Li received the Ph.D. degree in computer science and application from the University of Chinese Academy of Sciences, Beijing, China, in 2018. He joined Baidu's intelligent driving group in 2018. His research interests include 3D reconstruction, perception for autonomous vehicles, and multimodal sensor calibration.
Jayaram Udupa's research focus has been developing theory and algorithms for image processing/analysis, 3D visualization, and machine learning, along with numerous medical applications of these tools toward quantitative radiology, with many seminal contributions to these areas over 40 years. He is a professor of Radiology at the University of Pennsylvania.
Yubing Tong, PhD, is currently a senior research investigator and director of operations at medical image processing group (MIPG) at the University of Pennsylvania, Philadelphia, United States. His research interests include medical image processing and analysis, disease quantification, and machine learning.
Hua Li is currently a Professor with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. His research interests include computer graphics and visualization, computer vision, shape analysis, and image invariants.