Elsevier

Knowledge-Based Systems

Volume 235, 10 January 2022, 107624
Manifold-based aggregation clustering for unsupervised vehicle re-identification

https://doi.org/10.1016/j.knosys.2021.107624

Highlights

  • A Manifold-based Aggregation Clustering framework is proposed for unsupervised V-reID without any annotations.

  • An agglomeration-classification loss is proposed to learn aggregated features.

  • A manifold-based distance and cluster diversity are formulated to determine the seeded vehicle criterion.

Abstract

Most vehicle re-identification (V-reID) approaches rely on supervised learning, which requires a considerable amount of tedious and often impractical annotation. In this paper, we propose a novel unsupervised V-reID approach based on Manifold-based Aggregation Clustering (MAC) that works with an unknown number of clusters. MAC alternates between two modules: a deep feature learning module and an aggregation clustering module. Specifically, the deep feature learning module trains a convolutional neural network so that deep features move close to the centroids of their corresponding clusters, which are produced by an aggregation clustering mechanism based on manifold distance in the feature space. Moreover, a classification-agglomeration loss and a manifold-based seed searching criterion are proposed to improve the discriminative power of the learned features and to handle varied visual appearance, respectively. Note that neither annotations nor the exact number of vehicle identities is available to the proposed method, which is consistent with the real-world unsupervised V-reID setting. Extensive experiments on the VehicleID and Veri-776 benchmark datasets show that the proposed method outperforms state-of-the-art unsupervised V-reID approaches.

Introduction

Vehicle re-identification (V-reID) can be conducted by ranking the vehicles in the gallery according to their similarity to the query vehicle. V-reID plays a significant role in modern public transportation systems and in video surveillance for traffic control and security [1], [2]. It has therefore become popular in the re-identification community over the past few years. Thanks to the development of deep learning, an increasing number of well-performing V-reID methods have been proposed.
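The ranking step described above can be illustrated with a minimal sketch. It assumes that each image has already been mapped to a feature vector and that cosine similarity is the similarity measure; the snippet does not specify either choice, and the names `cosine_similarity` and `rank_gallery` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors (assumed similarity measure).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    # Return gallery indices ordered from most to least similar to the query.
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

# Toy 2-D features; real features would come from a trained CNN.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
ranking = rank_gallery(query, gallery)
```

In this toy example the second gallery vector points almost in the same direction as the query, so it is ranked first.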

Compared with person re-identification [3], [4], the distinguishing characteristic of V-reID is that near-duplicate vehicles may carry different IDs. Meanwhile, differences in camera setup cause images of the same vehicle ID to vary greatly in visual appearance. Thus, most V-reID methods aim to decrease intra-class differences while increasing inter-class differences with the help of supervised deep learning. To this end, most V-reID works use supervised learning methods that require a considerable amount of additional annotation for the training datasets [5], [6], [7], e.g., vehicle ID, camera ID, color, viewpoint, etc. For instance, vehicle IDs can be used for supervised learning, and camera IDs help to provide spatial-temporal prior knowledge. With a variety of annotations, researchers can train a flexible deep network from different perspectives. As supervised V-reID methods have developed [5], [8], [9], [10], [11], it has been observed empirically that performance improves as more attribute labels are given. However, the annotation work required by supervised deep learning is tedious and impractical for real large-scale vehicle datasets. Thus, this paper aims to extract deep vehicle features in an unsupervised manner to solve the V-reID problem.

Unsupervised deep learning for image representation has been a hot topic in the deep neural network community and has achieved remarkable success in image analysis, e.g., image classification [12] and object detection [13]. In this setting, the learning task is accomplished without class-specific information. Unsupervised person re-identification approaches have been proposed recently [14], [15], [16]. Generally, these re-identification methods fall into two categories: transfer learning from a supervised source domain to an unsupervised target domain, and unsupervised methods with a known number of clusters. Since the number of categories cannot be obtained exactly from a large-scale real vehicle dataset, this paper is devoted to unsupervised V-reID without any annotations or knowledge of the number of clusters.

As shown in Fig. 1, the motivation comes from the self-organized learning mechanism. It is believed that the initial population contains a few valuable seeded vehicles that can be used to reveal its intrinsic structure. To train a deep CNN in an unsupervised manner and make the vehicle features more discriminative, we search for a few seeds to generate clusters, which in turn facilitate the optimization of the CNN parameters. In this paper, the feature space is derived from a deep CNN, and the partition is realized by searching for seeded vehicle features and then merging them into clusters. Moreover, because images with the same vehicle ID vary in visual appearance, the seed searching criterion is based on the manifold distance, which measures the similarities between clusters. Unlike self-supervised deep learning methods that use the exact number of identities as prior knowledge to generate pseudo labels, the proposed approach minimizes the proposed classification-agglomeration loss, which is formulated with the deep features and the corresponding centroids of an unknown number of clusters. The details will be elaborated in the methodology section.
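To make the idea of pulling deep features toward their cluster centroids concrete, here is a minimal sketch of a centroid-based compactness term. It is an illustrative assumption, not the paper's classification-agglomeration loss, whose exact form is not given in this excerpt. The mean squared distance used here and the names `centroid` and `agglomeration_term` are hypothetical.

```python
def centroid(points):
    # Component-wise mean of a list of equal-length feature vectors.
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def agglomeration_term(features, labels):
    # Mean squared distance between each feature and the centroid of its cluster.
    # Assumed stand-in for a loss term that shrinks intra-cluster distances.
    clusters = {}
    for f, label in zip(features, labels):
        clusters.setdefault(label, []).append(f)
    centroids = {label: centroid(ps) for label, ps in clusters.items()}
    total = 0.0
    for f, label in zip(features, labels):
        c = centroids[label]
        total += sum((x - y) ** 2 for x, y in zip(f, c))
    return total / len(features)

# Two features in cluster 0 and one in cluster 1 (toy pseudo-labels).
features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
labels = [0, 0, 1]
value = agglomeration_term(features, labels)
```

The term is zero when every feature coincides with its cluster centroid and grows as clusters spread out, which is the behavior a compactness term of this kind is meant to penalize.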

In summary, the contributions of this paper are threefold:

  • We propose a novel unsupervised V-reID approach consisting of deep feature learning and aggregation clustering modules for the challenging case where neither annotations nor the exact number of identities is accessible.

  • The classification-agglomeration loss is designed to optimize the deep CNN parameters in the deep feature learning module. It not only enlarges the distance between different clusters but also shrinks the distance between vehicles within one cluster.

  • We propose a manifold-based seed searching criterion that encourages vehicles with a small manifold distance in the feature space to be aggregated into the same cluster. Moreover, a diversity penalty term is introduced to keep the number of samples per cluster balanced, which is consistent with the real-world distribution.
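The interplay between a cluster distance and a size-based diversity penalty can be sketched as a single bottom-up merge step. This sketch makes several assumptions that are not taken from the paper: the minimum pairwise Euclidean distance stands in for the manifold-based distance, the penalty is proportional to the combined cluster size, and the names `merge_cost`, `merge_once` and the weight `lam` are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(ca, cb):
    # Placeholder for the manifold-based distance (assumption):
    # minimum pairwise Euclidean distance between members.
    return min(euclidean(a, b) for a in ca for b in cb)

def merge_cost(ca, cb, lam):
    # Distance plus a penalty that grows with the merged cluster size,
    # which discourages a few clusters from absorbing everything.
    return cluster_distance(ca, cb) + lam * (len(ca) + len(cb))

def merge_once(clusters, lam=0.1):
    # Merge the pair of clusters with the lowest cost; the merged cluster is appended last.
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            cost = merge_cost(clusters[i], clusters[j], lam)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    _, i, j = best
    merged = clusters[i] + clusters[j]
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [merged]

clusters = [
    [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]],  # already large
    [[0.0, 3.0]],
    [[5.0, 0.0]],
    [[5.0, 1.2]],
]
without_penalty = merge_once(clusters, lam=0.0)
with_penalty = merge_once(clusters, lam=0.2)
```

Without the penalty, the large cluster absorbs its nearest neighbor. With `lam=0.2`, the two singletons on the right are merged first, even though they are slightly farther apart, which keeps cluster sizes more balanced.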

Section snippets

Supervised V-reID

To learn discriminative features from large-scale vehicle datasets that suffer from viewpoint variations across multiple cameras, illumination changes and occlusion, supervised V-reID approaches adopt deep neural networks to approximate the mapping function that projects original vehicle images into a feature space where similarity can be measured. Given the vehicle IDs, metric-based loss functions, e.g., triplet loss [8], [17], were proposed in the early stage of supervised V-reID approaches.
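For readers unfamiliar with the triplet loss mentioned above, the following is a minimal sketch of its standard hinge form on a single triplet. The Euclidean distance and the margin value are illustrative assumptions; the cited works may use different distances, margins or sampling strategies.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge on the gap between anchor-positive and anchor-negative distances:
    # the loss is zero once the negative is farther than the positive by at least `margin`.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
# Positive (same vehicle ID) is close and negative is far: the constraint is satisfied.
easy = triplet_loss(anchor, [0.0, 1.0], [0.0, 3.0], margin=0.5)
# Positive is farther than the negative: the constraint is violated.
hard = triplet_loss(anchor, [0.0, 2.0], [0.0, 1.0], margin=0.5)
```

Here `easy` evaluates to zero because the negative is already well separated, while `hard` is positive and would produce a gradient during training.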

Methodology

As shown in Fig. 2, the proposed Manifold-based Aggregation Clustering (MAC) framework consists of two modules, i.e., a deep feature learning module and an aggregation clustering module. The learning procedure uses neither annotations nor the number of clusters. By repeatedly searching for seeded vehicles to partition the feature space, the CNN is trained toward discriminative feature learning. In this section, we first outline the proposed framework and then elaborate the

Datasets

To evaluate the effectiveness of the proposed method, we conduct experiments on two popular benchmark datasets, VehicleID [8] and Veri-776 [5]. As a large-scale dataset designed specifically for vehicle Re-ID, VehicleID [8] contains 221763 images of 26267 vehicles. Images of 800, 1600 and 2400 vehicles are extracted to form three test sets (small, medium and large, respectively).

Veri-776 [5] consists of more than 50000 images of 776 vehicles captured by

Conclusion

In this paper, we propose a manifold-based aggregation clustering method that tackles vehicle Re-ID in a strictly unsupervised setting, where neither annotation information of any kind nor the exact number of identities is available, by jointly training a CNN model and aggregating clusters. Apart from a non-parametric classifier, we propose an agglomeration loss to facilitate network training and introduce a manifold-based distance as the criterion to search robust seeded

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ20F030015. Our deepest gratitude goes to the anonymous reviewers for their careful reviews and valuable suggestions which are very helpful for improving this paper.

References (40)

  • K. Yan, Y. Tian, Y. Wang, W. Zeng, T. Huang, Exploiting multi-grain ranking constraints for precisely searching...
  • Y. Lou, Y. Bai, J. Liu, S. Wang, L. Duan, Veri-wild: a large dataset and a new method for vehicle re-identification in...
  • P. Wang, B. Jiao, L. Yang, Y. Yang, S. Zhang, W. Wei, Y. Zhang, Vehicle re-identification in aerial imagery: dataset...
  • X. Ji, A. Vedaldi, J. Henriques, Invariant information clustering for unsupervised image classification and...
  • Y. Yang, A. Loquercio, D. Scaramuzza, S. Soatto, Unsupervised moving object detection via contextual information...
  • Y. Lin, X. Dong, L. Zheng, Y. Yan, Y. Yang, A bottom-up clustering approach to unsupervised person re-identification,...
  • H.-X. Yu et al., Unsupervised person re-identification by deep asymmetric metric embedding, IEEE Trans. Pattern Anal. Mach. Intell. (2020)
  • M. Li et al., Unsupervised tracklet person re-identification, IEEE Trans. Pattern Anal. Mach. Intell. (2020)
  • Y. Zhang, D. Liu, Z. Zha, Improving triplet-wise training of convolutional neural network for vehicle...
  • Y. Sun, M. Li, J. Lu, Part-based multi-stream model for vehicle searching, in: International Conference on Pattern...