Elsevier

Neurocomputing

Volume 378, 22 February 2020, Pages 387-398

Deep Gabor convolution network for person re-identification

https://doi.org/10.1016/j.neucom.2019.10.083

Abstract

Person re-identification is an important problem in computer vision, and a growing number of deep neural network models have been developed for representation learning in this task because of their good performance. However, compared with hand-crafted feature representations, deep learned features cannot be interpreted easily. To address this, motivated by the good interpretability of Gabor filters and the reliable learning ability of deep neural network models, we propose a new convolution module for deep neural networks based on the Gabor function (Gabor convolution). Unlike the parameters of a classical convolution kernel, every parameter of the proposed Gabor convolution kernel has a specific meaning. The Gabor convolution module has good texture representation ability and is effective when embedded in the low layers of a network. In addition, to keep the parameters of the Gabor module meaningful, a new loss function designed for this module is proposed as a regularizer of the total loss function. By embedding the Gabor convolution module in the ResNet-50 network, we show that it has good representation learning ability for person re-identification. Experiments on three widely used person re-identification datasets show favorable results compared with the state of the art.

Introduction

Person re-identification addresses the problem of matching persons across non-overlapping camera networks and has attracted many researchers in recent years. It can be regarded as a retrieval problem as well as a classification problem. Person re-identification is challenging because a person’s appearance varies widely, for example with illumination conditions, viewpoint and pose.

As in many computer vision tasks, the first step in person re-identification is to extract feature representations from person images. Traditionally, a hand-crafted descriptor is extracted, such as color histograms in different color spaces (e.g. RGB, HSV, YCrCb, Lab), texture histograms (e.g. LBP, SILTP, Gabor filters) and combinations of them (e.g. ELF [1], SDALF [2], LOMO [3], GOG [4], enhanced LOMO [5]). Recently, with the success of deep neural networks (DNN) in computer vision, more and more works focus on representation learning, where the feature representation is learned by DNN models, such as [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. A hand-crafted feature representation is direct and easy to interpret, but it is less discriminative than deep features. A deep feature representation is usually more discriminative, but it cannot be interpreted as easily as hand-crafted features. There is therefore a persistent demand for DNN modules that can be interpreted as readily as hand-crafted features.

In this paper, to meet the demand for explaining what DNN modules learn, and motivated by the good interpretability of Gabor filters and the reliable learning ability of DNN models, we propose a new convolution module for deep neural networks based on the Gabor function, which offers good interpretability and is compatible with deep neural network models. Gabor filters are generated from the Gabor function and have been used extensively in computer vision because they model the texture information of images well. Traditionally, as shown in Fig. 1(a), Gabor filters are used in three steps: a group of filters is first generated from the Gabor function with a set of predefined parameters, the filters are then convolved with an image to obtain a series of feature maps, and finally histograms are computed on these feature maps.
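The traditional pipeline described above (predefined filter bank, filtering, histograms) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the kernel uses the standard real-valued Gabor form, and the kernel size, the four orientations, the bin range and the toy striped image are all hypothetical choices that do not come from the paper.

```python
import math


def real_gabor_kernel(size, lam, theta, sigma, gamma=0.5, phi=0.0):
    """Real-valued Gabor kernel sampled on a size x size grid (standard form)."""
    half = size // 2
    kernel = []
    for row in range(size):
        line = []
        for col in range(size):
            x, y = col - half, row - half
            # Rotate the coordinates by the orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2.0 * sigma ** 2))
            line.append(envelope * math.cos(2.0 * math.pi * xr / lam + phi))
        kernel.append(line)
    return kernel


def filter_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(cols)] for r in range(rows)]


def histogram(feature_map, bins, low, high):
    """Count responses per bin; values outside [low, high] fall into the end bins."""
    counts = [0] * bins
    width = (high - low) / bins
    for line in feature_map:
        for value in line:
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
    return counts


# Step 1: hand-picked parameters (assumed values): four orientations, one scale.
bank = [real_gabor_kernel(size=5, lam=4.0, theta=k * math.pi / 4, sigma=2.0)
        for k in range(4)]
# Toy 8 x 8 image with vertical stripes, standing in for a person image.
image = [[1.0 if (c // 2) % 2 == 0 else 0.0 for c in range(8)] for _ in range(8)]
# Steps 2 and 3: filter with every kernel, then concatenate the histograms.
descriptor = []
for kernel in bank:
    feature_map = filter_valid(image, kernel)
    descriptor.extend(histogram(feature_map, bins=8, low=-4.0, high=4.0))
```

Every value in `bank` is fixed before any data is seen, which is the limitation the following paragraph discusses.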

As can be expected, applying Gabor filters requires manually selecting proper values for the parameters of the Gabor function, which is a cumbersome task. Moreover, hand-designed parameters cover only a very small part of the parameter space, which can be suboptimal for certain inputs. Motivated by the learning ability of deep neural networks, a natural way to overcome these drawbacks is to learn the parameters of the Gabor function through a deep neural network model instead of designing them manually. To make Gabor filters compatible with deep neural network models, we design the Gabor filter as a special convolution module in which all convolution kernels are generated from the Gabor function with learnable parameters. As shown in Fig. 1(b), the generated Gabor filter is embedded into a neural network as a convolution layer, and all of its parameters are learned when the network is trained. In this pipeline, feature representations are taken from the output of a certain layer. In the following, we refer to the proposed convolution module as Gabor convolution.

Unlike a general convolution module, where all elements of the convolution kernel are randomly initialized and no relationship is imposed between them, the Gabor convolution kernel is generated from the Gabor function, so its elements are related to one another. Because every parameter of Gabor convolution has a specific meaning, we can easily interpret what this module learns.

Note that every parameter of the Gabor convolution module has a specific valid range. To keep the parameters of Gabor convolution legal while it is trained in a DNN model, we have to constrain the range of each parameter. To this end, we propose a new regularizer loss function designed for the Gabor convolution module that takes advantage of the hinge function. Experiments show that with the proposed regularizer loss, the Gabor convolution module learns legal parameters and improves the performance of person re-identification.
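One plausible way to build a range constraint from the hinge function is sketched below. The excerpt gives neither the exact form of the regularizer nor the legal ranges, so the bounds in `ASSUMED_RANGES`, the penalty form and the `weight` hyperparameter are all assumptions made for illustration, not the authors’ definition.

```python
import math

# Hypothetical legal ranges (lower, upper); the excerpt does not list the actual bounds.
ASSUMED_RANGES = {
    "lam": (2.0, 10.0),
    "theta": (0.0, math.pi),
    "phi": (-math.pi, math.pi),
    "sigma": (0.5, 5.0),
    "gamma": (0.1, 1.0),
}


def hinge(value):
    """Hinge function: zero for non-positive inputs, linear otherwise."""
    return max(0.0, value)


def range_penalty(params, ranges=ASSUMED_RANGES):
    """Sum of hinge penalties for parameters that leave their assumed range."""
    total = 0.0
    for name, value in params.items():
        lower, upper = ranges[name]
        # Penalise only the amount by which the value undershoots or overshoots.
        total += hinge(lower - value) + hinge(value - upper)
    return total


def total_loss(task_loss, params, weight=0.1):
    """Task loss plus the weighted range regularizer (weight is an assumed hyperparameter)."""
    return task_loss + weight * range_penalty(params)


legal = {"lam": 4.0, "theta": 0.5, "phi": 0.0, "sigma": 2.0, "gamma": 0.5}
illegal = dict(legal, sigma=-1.0, gamma=1.5)
print(range_penalty(legal))    # 0.0
print(range_penalty(illegal))  # 2.0
```

A penalty of this kind is zero inside the range, so it leaves legal parameters untouched and only pushes out-of-range values back toward the nearest bound.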

The main contributions of this paper can be summarized as follows:

  • A new convolution module for DNNs is proposed based on the Gabor function. Compared with the traditional convolution module, the proposed Gabor convolution is better suited to low-level layers, where it is effective.

  • A new regularizer loss function designed for the Gabor convolution module is proposed by taking advantage of the hinge function.

  • The performance of person re-identification is improved by embedding the Gabor convolution module in ResNet-50, and extensive experiments validate the effectiveness of the proposed module.

In the next section, we review related work. We then present the proposed Gabor convolution module in Section 3. Section 4 presents an extensive comparison with state-of-the-art algorithms and analyzes each component of our method. Section 5 concludes the paper and discusses future work.

Section snippets

Related works

In this section, we review two types of work related to ours: (1) Gabor filter related works in person re-identification, and (2) deep neural network based models for person re-identification.

Revisit to Gabor filter and parameters standardization

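A two-dimensional Gabor function is defined as a sinusoidal wave multiplied by a Gaussian function, in complex form

$$g_\Theta(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{x'}{\lambda} + \phi\right)\right), \tag{1}$$

where

$$x' = x\cos(\theta) + y\sin(\theta), \qquad y' = -x\sin(\theta) + y\cos(\theta),$$

$i$ is the imaginary unit and $\Theta = \{\lambda, \theta, \phi, \sigma, \gamma\}$ is the parameter set. Note that Eq. (1) can be expressed in real form, where the real part is

$$g_\Theta^{\mathrm{real}}(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi\frac{x'}{\lambda} + \phi\right),$$

and the imaginary part is

$$g_\Theta^{\mathrm{img}}(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi\frac{x'}{\lambda} + \phi\right).$$

Both of the two parts can be used for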
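As a rough illustration of the definition above, the following sketch samples the Gabor function on a small grid and splits it into its real and imaginary parts. It assumes the standard convention (a negative Gaussian exponent and coordinates rotated by θ); the function names, grid size and parameter values are hypothetical and chosen only for the example.

```python
import cmath
import math


def gabor_complex(x, y, lam, theta, phi, sigma, gamma):
    """Complex Gabor value at (x, y) for parameters (lambda, theta, phi, sigma, gamma)."""
    # Rotate the coordinates by the orientation theta.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    # Gaussian envelope times complex sinusoidal carrier.
    envelope = math.exp(-(xr ** 2 + (gamma ** 2) * yr ** 2) / (2.0 * sigma ** 2))
    carrier = cmath.exp(1j * (2.0 * math.pi * xr / lam + phi))
    return envelope * carrier


def gabor_kernel(size, lam, theta, phi, sigma, gamma):
    """Sample the real and imaginary parts on a size x size grid centred at the origin."""
    half = size // 2
    real, imag = [], []
    for row in range(size):
        y = row - half
        values = [gabor_complex(col - half, y, lam, theta, phi, sigma, gamma)
                  for col in range(size)]
        real.append([v.real for v in values])
        imag.append([v.imag for v in values])
    return real, imag


# Five interpretable parameters determine all 25 entries of each 5 x 5 kernel.
real, imag = gabor_kernel(size=5, lam=4.0, theta=0.0, phi=0.0, sigma=2.0, gamma=0.5)
```

With φ = 0 the real part is a cosine (even) profile and the imaginary part a sine (odd) profile along the rotated x-axis, so the two kernels respond to symmetric and antisymmetric texture patterns respectively.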

Evaluation and datasets

We conduct experiments on three widely used person re-identification datasets, namely Market1501 [48], DukeMTMC-REID [49] and CUHK03 [50]. In addition, to validate the effectiveness of the proposed Gabor convolution module, we also perform a series of experiments on two image classification datasets, MNIST and CIFAR-10, since image classification is similar to person re-identification as far as feature representation learning is concerned. For the person re-identification task, each dataset is divided into training

Conclusion and future works

In this paper, we propose a new convolution module for deep neural network models for person re-identification, named Gabor convolution. The proposed Gabor convolution module shows good interpretability for deep neural network models as well as superior performance in the low-level layers of a network. Apart from the design of the new module, a new regularizer loss function based on the hinge function is proposed to constrain the range of each parameter of the Gabor convolution kernels.

Declaration of Competing Interest

None.


References (69)

  • S. Zhou et al. Point to set similarity based deep feature learning for person reidentification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • W. Chen et al. Beyond triplet loss: a deep quadruplet network for person re-identification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • T. Xiao et al. Joint detection and identification feature learning for person search. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • R. Yu et al. Hard-aware point-to-set deep metric for person re-identification. Proceedings of European Conference on Computer Vision (2018)
  • E. Ahmed et al. An improved deep learning architecture for person re-identification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2015)
  • L. Wu, C. Shen, A.v. d. Hengel, Personnet: Person Re-identification with Deep Convolutional Neural Networks, arXiv...
  • C. Zhao et al. Kernelized random kiss metric learning for person re-identification. Neurocomputing (2017)
  • R.R. Varior et al. Gated siamese convolutional neural network architecture for human re-identification. Proceedings of European Conference on Computer Vision (2016)
  • J. Lin et al. Consistent-aware deep learning for person re-identification in a camera network. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • D. Li et al. Learning deep context-aware features over body and latent parts for person re-identification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • L. Zhao et al. Deeply-learned part-aligned representations for person re-identification. Proceedings of IEEE International Conference on Computer Vision (2017)
  • X. Liu et al. Hydraplus-net: attentive deep features for pedestrian analysis. Proceedings of IEEE International Conference on Computer Vision (2017)
  • J. Si et al. Dual attention matching network for context-aware feature sequence based person re-identification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • H. Liu et al. End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. (2017)
  • H. Zhao et al. Spindle net: person re-identification with human body region guided feature decomposition and fusion. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • L. Zheng, Y. Huang, H. Lu, Y. Yang, Pose Invariant Embedding for Deep Person Re-identification, arXiv preprint...
  • C. Su et al. Pose-driven deep convolutional model for person re-identification. Proceedings of IEEE International Conference on Computer Vision (2017)
  • M.M. Kalayeh et al. Human semantic parsing for person re-identification. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • J.G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A (1985)
  • B. Ma et al. Bicov: a novel image representation for person re-identification and face verification. Proceedings of British Machine Vision Conference (2012)
  • C. Liu et al. Person re-identification: what features are important? Proceedings of European Conference on Computer Vision (2012)
  • W. Li et al. Locally aligned feature transforms across views. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2013)
  • Q. Wang et al. Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes. IEEE Trans. Image Process. (2019)
  • Q. Wang et al. Robust hierarchical deep learning for vehicular management. IEEE Trans. Circuits Syst. Video Technol. (2019)
Yuan Yuan (M’05-SM’09) is currently a Full Professor with the School of Computer Science and the Center for OPTical IMagery Analysis and Learning, Northwestern Polytechnical University, Xi’an, China. She has authored or coauthored over 150 papers, including about 100 in reputable journals, such as the IEEE TRANSACTIONS AND PATTERN RECOGNITION, and also conference papers in CVPR, BMVC, ICIP, and ICASSP. Her current research interests include visual information processing and image/video content analysis.

Jian’an Zhang received the B.E. degree in information and computing science from the Ocean University of China, Qingdao, China, in 2015. He is currently pursuing the Ph.D. degree with the Center for OPTical IMagery Analysis and Learning, Northwestern Polytechnical University, Xi’an, China. His research interests include computer vision and pattern recognition.

Qi Wang (M’15-SM’15) received the B.E. degree in automation and the Ph.D. degree in pattern recognition and intelligent systems from the University of Science and Technology of China, Hefei, China, in 2005 and 2010, respectively. He is currently a Professor with the School of Computer Science, with the Unmanned System Research Institute, and with the Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi’an, China. His research interests include computer vision and pattern recognition.
