Gradient-Aligned Convolutional Neural Network
Introduction
Since the birth of computer vision, rotation invariance has been an important issue in the field. When we capture images of nature, an object may appear in the image at different rotation angles depending on the relative position between the object and the camera. For medical images, microscopic images, remote-sensing images, and astronomical images in particular, the rotation angle can be arbitrary, since these images are not subject to the gravity-induced orientation constraints of natural images.
When dealing with such images of the same object in different orientations, as shown in Fig. 1, we always want to get the same response, so that our descriptors are not affected by the orientation of the input. We say that these responses or descriptors are rotation invariant. Rotation invariance is formally defined by Equation (1): in the image coordinate system, let $R_\theta$ denote the 2D rotation of the input image $I$ by an angle $\theta$. A function $f$ has rotation invariance if it produces the same output for all rotated versions of $I$:

$$f(R_\theta(I)) = f(I), \quad \forall \theta \in [0, 2\pi). \tag{1}$$

The GACNN proposed in this paper is a special case of such an $f$.
In recent years, although CNNs have achieved great success in many computer-vision applications, achieving rotation invariance with a CNN has remained a challenge. A CNN encodes local translation invariance somewhat implicitly through weight-shared convolution, pooling, and so on. Encoding rotation, however, is harder: rotating an image by an angle that is not a multiple of 90° requires interpolation, because the rotated sample positions do not fall on the pixel grid of the original image domain. These complications make encoding rotation invariance in a CNN quite challenging.
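The grid problem can be seen in a few lines of NumPy (our own minimal illustration, not code from the paper): a 90° rotation merely permutes pixel coordinates, while an arbitrary angle maps grid points to off-grid positions, forcing interpolation.

```python
import numpy as np

# A 90-degree rotation permutes pixel coordinates exactly: every source
# pixel lands on another grid point, so no interpolation is needed.
img = np.arange(9, dtype=float).reshape(3, 3)
rot90 = np.rot90(img)  # pure re-indexing; pixel values are preserved exactly
assert sorted(rot90.ravel().tolist()) == sorted(img.ravel().tolist())

# For an arbitrary angle, a rotated grid point generally has non-integer
# coordinates, so pixel values must be interpolated.
theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
p = R @ np.array([1.0, 0.0])
print(p)  # ~[0.707, 0.707]: off the integer pixel grid
```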
For human vision, psychophysical experiments have shown that a prior rotation alignment is performed when recognizing rotated objects. Shepard & Metzler [1] introduced the concept of mental rotation in 1971, in what has become one of the best-known experiments in the field. Their conclusion is that the time it takes humans to determine whether two rotated objects are the same depends on the angle of rotation between them. As shown in Fig. 1, the time it takes to recognize '7' in a rotated version is positively related to the angle between the rotated image and the first, canonical version. This suggests that, in order to recognize the image, humans actually rotate the object mentally.
Based on this basic mechanism of human vision, we propose a special convolution, called Gradient-Aligned Convolution (GAConv). Before the regular convolution operation, the gradient at each pixel is calculated by the proposed Extended Circle Sobel operator and aligned to a canonical direction. That is, all the neighbors are rotated so that the gradient at the central pixel points in a chosen, fixed direction, as illustrated in Fig. 2. The left two figures show the gradients of the first two images in Fig. 1; in the right two figures, those gradients have been aligned to the same fixed canonical direction. From another perspective, the direction information of the gradient is discarded and only its magnitude participates in the subsequent convolution.
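The per-pixel gradient computation described above can be sketched in NumPy. This is only an illustration of the idea: it uses the standard 3×3 Sobel kernels as a stand-in for the paper's Extended Circle Sobel operator, and the function name `gradient_field` is ours.

```python
import numpy as np

# Standard 3x3 Sobel kernels, used here as a stand-in for the paper's
# Extended Circle Sobel operator (illustration only).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gradient_field(img):
    """Per-pixel gradient magnitude and direction (radians), zero-padded."""
    H, W = img.shape
    padded = np.pad(img, 1)
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_Y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# A vertical edge: the gradient on the edge points along +x (angle 0),
# so no rotation is needed if +x is chosen as the canonical direction.
edge = np.zeros((5, 5))
edge[:, 3:] = 1.0
mag, ang = gradient_field(edge)
print(mag[2, 2], ang[2, 2])  # → 4.0 0.0
```

Aligning a neighborhood then amounts to rotating its sampling positions by the negative of this angle before convolving, which is where interpolation enters.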
With GAConv, a Gradient-Aligned CNN (GACNN) can achieve global rotation invariance without any data augmentation, feature-map augmentation, or filter enrichment. Only one canonical direction is encoded into the filters. In GACNN, rotation invariance is not learned from the training set but is built into the network model. Unlike a vanilla CNN, GACNN outputs invariant results for all rotated versions of an object, whether the network is trained or not. This means we only need to train the network on one canonical version of the object; all other rotated versions should be recognized with the same accuracy. Further, the convolution operations in any CNN architecture can be replaced by GAConv to make the network invariant to rotations. Classification experiments have been conducted to compare GACNN with regular CNNs and several other rotation-invariant approaches.
Related works
Over the past decades, numerous rotation-invariant methods have been designed. Techniques for achieving rotation invariance in conventional methods can be roughly classified into three categories. The first is global alignment: a reference direction of the image is obtained first, and the entire image is then rotated to that reference direction to achieve global rotation invariance. The second is to ignore the orientation information in the image. Some statistical-based methods ignore
Gradient-Aligned Convolution
Gradient-Aligned Convolution (GAConv) is an operation that can replace the regular convolutions in a CNN. In conventional convolution, the kernel moves pixel by pixel over the input feature map and performs an entry-wise product with the corresponding region of the feature map. For GAConv, intuitively, a prior pixel-level gradient alignment is applied to the input feature maps before the regular convolution, as presented in Fig. 2. In practice, we modified the process of
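Conceptually, each GAConv output value is a regular convolution taken over a neighborhood that has first been rotated by the negative of the central pixel's gradient angle. The sketch below is our own rough reading of that idea, with bilinear resampling standing in for whichever interpolation the paper uses; `gaconv_at`, `bilinear`, and the sign conventions are illustrative, not the paper's implementation.

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinearly sample img at the real-valued position (y, x); zero outside."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    val = 0.0
    for oy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
        for ox, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if 0 <= oy < H and 0 <= ox < W:
                val += wy * wx * img[oy, ox]
    return val

def gaconv_at(img, angle, kernel, i, j):
    """One gradient-aligned convolution output at pixel (i, j): the
    neighborhood is rotated by -angle[i, j] around the center before the
    usual entry-wise product, so the central gradient ends up pointing
    in a fixed canonical direction."""
    k = kernel.shape[0] // 2
    c, s = np.cos(-angle[i, j]), np.sin(-angle[i, j])
    out = 0.0
    for u in range(-k, k + 1):          # row offset
        for v in range(-k, k + 1):      # column offset
            # rotate the sampling offset (u, v) about the central pixel
            ry = i + c * u - s * v
            rx = j + s * u + c * v
            out += kernel[u + k, v + k] * bilinear(img, ry, rx)
    return out

# With all alignment angles zero, GAConv reduces to plain correlation.
img = np.arange(9, dtype=float).reshape(3, 3)
print(gaconv_at(img, np.zeros((3, 3)), np.ones((3, 3)), 1, 1))  # → 36.0
```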
Experiments
In this section, we evaluate the effectiveness of GACNN on three datasets. In Section 4.1, we develop a new dataset, MNIST-rotation, based on MNIST [34]. In MNIST-rotation, only canonical versions of the digits are included in the training set, while many rotated versions are included in the test set. We conduct comparison experiments on MNIST-rotation against a regular CNN and several recent rotation-invariant approaches. We analyze the results from different perspectives to
Conclusion
In this paper, we proposed a novel rotation-equivariant operation, named Gradient-Aligned Convolution (GAConv), which can replace regular convolutions in a CNN. GAConv is implemented with a prior pixel-level gradient-alignment operation before the regular convolution. With GAConv, GACNN can achieve rotation invariance without increasing the number of inputs or filters. Rotation-invariance experiments have been conducted to evaluate the characteristics of GACNN. GACNN performs much better
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by National Key R&D Program of China (No. 2017YFB1002703), National Key Basic Research Program of China (No. 2015CB554507), and National Natural Science Foundation of China (No. 61379082). This work is partially supported by China Scholarship Council (CSC).
References (43)

- Pattern recognition by affine moment invariants. Pattern Recognition (1993)
- Image analysis by log-polar exponent-Fourier moments. Pattern Recognition (2020)
- New fractional-order Legendre-Fourier moments for pattern recognition applications. Pattern Recognition (2020)
- RIFD-CNN: rotation-invariant and Fisher discriminative convolutional neural networks for object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
- Mental rotation of three-dimensional objects. Science (1971)
- Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory (1962)
- Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (2004)
- SURF: speeded up robust features. European Conference on Computer Vision (2006)
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. (2002)
- Rotation-invariant texture classification using modified Gabor filters. International Conference on Image Processing (1995)
- Best practices for convolutional neural networks applied to visual document analysis. International Conference on Document Analysis and Recognition (ICDAR)
- The art of data augmentation. Journal of Computational and Graphical Statistics
- Exploiting cyclic symmetry in convolutional neural networks. Proceedings of the 33rd International Conference on Machine Learning
- Spatial transformer networks. Advances in Neural Information Processing Systems
- TI-Pooling: transformation-invariant pooling for feature learning in convolutional neural networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deep rotation equivariant network. Neurocomputing
- Warped convolutions: efficient invariance to spatial transformations. Proceedings of the 34th International Conference on Machine Learning
- Rotation-equivariant convolutional neural network ensembles in image processing. Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2019 ACM International Symposium on Wearable Computers
- Group equivariant convolutional networks. Proceedings of the 33rd International Conference on Machine Learning
- Oriented response networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Rotation invariant local binary convolution neural networks. IEEE International Conference on Computer Vision (ICCV) Workshops
Cited by (6)

- RIC-CNN: Rotation-Invariant Coordinate Convolutional Neural Network. Pattern Recognition (2024)
- An intelligent and vision-based system for Baijiu brewing-sorghum discrimination. Measurement: Journal of the International Measurement Confederation (2022). Citation excerpt: "The Xception was implemented in the Python programming language using Pytorch library. Previous studies demonstrated that the required rotation invariance properties of pattern recognition could not be guaranteed in the process of CNN training and reasoning [12,24]. In this work, anti-aliasing algorithm was applied to modify the baseline Xception to reach a more robust model with rotation-invariance ability."
- Recurrent Affine Transform Encoder for Image Representation. IEEE Access (2022)
You Hao is currently pursuing the Ph.D. degree with Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. Recently, he is visiting at Medical Image Processing Group, Department of Radiology, University of Pennsylvania. His research interests include image invariants, medical image processing, and computer vision.
Ping Hu is a Cloud Solution Architect at Microsoft. She received her Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences. Her current research interests are deep learning, computer vision, and machine learning.
Shirui Li received the Ph.D. degree in computer science and application from the University of Chinese Academy of Sciences, Beijing, China, in 2018. He joined Baidu's intelligent driving group in 2018. His research interests include 3D reconstruction, perception for autonomous vehicles, and multimodal sensor calibration.
Jayaram Udupa's research focus has been developing theory and algorithms for image processing/analysis, 3D visualization, and machine learning, along with numerous medical applications of these tools toward quantitative radiology, with many seminal contributions to these areas over 40 years. He is a professor of Radiology at the University of Pennsylvania.
Yubing Tong, PhD, is currently a senior research investigator and director of operations at medical image processing group (MIPG) at the University of Pennsylvania, Philadelphia, United States. His research interests include medical image processing and analysis, disease quantification, and machine learning.
Hua Li is currently a Professor with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. His research interests include computer graphics and visualization, computer vision, shape analysis, and image invariants.