Pattern Recognition

Volume 128, August 2022, 108670

Preserving similarity order for unsupervised clustering

https://doi.org/10.1016/j.patcog.2022.108670

Highlights

  • Our method takes the ordering of pairwise distances as the supervisory signal to learn the similarity score function.

  • Our similarity score function captures both local structure and global structure of the data sample distribution.

  • We propose a simple but effective strategy to identify the boundary samples from a given dataset.

Abstract

Unsupervised clustering categorizes a sample set into several groups, where the samples in the same group share high-level concepts. As clustering performance is heavily determined by the metric used to assess the similarity between sample pairs, we propose to learn a deep similarity score function and use it to capture the correlations between sample pairs for improved clustering. We formulate the learning procedure in a ranking framework and introduce two new supervisory signals to train our model. Specifically, we train the similarity score function to guarantee that 1) a sample has a higher similarity with its nearest neighbors than with other samples, in order to achieve correct clustering, and 2) the ordering of the similarities between neighboring sample pairs is preserved, in order to achieve robust clustering. To this end, we study not only the relevance between neighboring sample pairs for local structure learning, but also the relevance between each sample and the boundary samples for global structure learning. Extensive experiments on seven publicly available datasets, covering face image clustering, object image clustering, and real-world image clustering, validate the effectiveness of our proposed framework.

Introduction

Unsupervised machine learning techniques mine knowledge from datasets where no sample labels are available. These techniques are widely applied in a range of applications, such as visual similarity analysis [1], domain adaptation [2], surveillance [3], and image retrieval [4]. Clustering is one of the most popular unsupervised learning tasks in artificial intelligence and computer vision [5], [6]. It categorizes a collection of unlabeled data samples into several groups, where the samples in the same group are expected to share high-level concepts. Clustering techniques are not only widely used in data analytics, but also applicable in automatic data labeling and data visualization.

Due to the curse of dimensionality, some popular clustering techniques, including K-means and the Gaussian mixture model, are not directly applicable to high-dimensional data samples. To solve this problem, many unsupervised dimensionality reduction or low-dimensional representation learning techniques have been proposed to reveal the intrinsic structure of a given dataset, such as multi-dimensional scaling (MDS) [7], locally linear embedding (LLE) [8], and spectral embedding (SE) [9]. These subspace learning techniques can improve clustering performance by learning a more clustering-friendly representation space; in other words, they simultaneously enhance the intra-cluster similarity and the inter-cluster dissimilarity in the representation space.
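
A minimal sketch of this effect, assuming scikit-learn and its small digits dataset (an illustration of the general subspace-learning recipe, not the model proposed in this paper): K-means is run once on the raw features and once on a low-dimensional spectral embedding, and the two partitions are compared by normalized mutual information (NMI).

    # Illustrative only: subspace learning (spectral embedding) followed by K-means.
    from sklearn.datasets import load_digits
    from sklearn.manifold import SpectralEmbedding
    from sklearn.cluster import KMeans
    from sklearn.metrics import normalized_mutual_info_score

    X, y = load_digits(return_X_y=True)  # 1797 samples, 64 raw pixel features

    # K-means directly on the high-dimensional raw features
    raw_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)

    # K-means after mapping the samples into a 10-dimensional spectral embedding
    Z = SpectralEmbedding(n_components=10, n_neighbors=10,
                          random_state=0).fit_transform(X)
    emb_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)

    print("NMI on raw features:      ", normalized_mutual_info_score(y, raw_labels))
    print("NMI on spectral embedding:", normalized_mutual_info_score(y, emb_labels))

Comparing the two NMI values illustrates the clustering-friendly-representation effect described above.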

In order to learn proper low-dimensional sample representations, many unsupervised dataset analysis techniques define their objective functions based on the pairwise difference [7], [8], [9], i.e., $y_i - y_j$, where $y_i$ and $y_j$ are the embeddings of two data samples (Section 2.1 provides more details). Thus, we can consider the pairwise difference as the implicit supervisory signal in representation learning. The importance of the pairwise difference, or pairwise distance, in unsupervised data analysis lies in its capability to reveal the relationships among the data samples. In a clustering task, however, the ordering of the pairwise distances directly influences the final clustering results and is, in most cases, more important than the distance values themselves. Yet, to the best of our knowledge, existing research has not taken the ordering of pairwise distances into consideration for unsupervised clustering. To this end, we propose to learn a score function in a ranking framework based on the ordering of pairwise distances. The learned similarity score function is tailored to the dataset and able to reveal the intrinsic dataset structure.
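
As a concrete illustration of this supervisory signal, the NumPy sketch below converts the ordering of pairwise distances into ranking constraints: for an anchor $x_i$ and two other samples $x_j$ and $x_k$ with $d(x_i, x_j) < d(x_i, x_k)$, the score function should satisfy $s(x_i, x_j) > s(x_i, x_k)$. The random pair sampling is our own simplification, not necessarily the construction used in the paper.

    # Illustrative: derive ordering-based ranking triplets from pairwise distances.
    import numpy as np

    def ordering_triplets(X, num_pairs_per_anchor=5, seed=0):
        """Return triplets (i, j, k) with d(x_i, x_j) < d(x_i, x_k)."""
        n = X.shape[0]
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
        rng = np.random.default_rng(seed)
        triplets = []
        for i in range(n):
            others = np.delete(np.arange(n), i)
            for _ in range(num_pairs_per_anchor):
                j, k = rng.choice(others, size=2, replace=False)
                if D[i, j] == D[i, k]:
                    continue  # a tie carries no ordering information
                if D[i, j] > D[i, k]:
                    j, k = k, j  # ensure j is the closer sample
                triplets.append((int(i), int(j), int(k)))
        return triplets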

The existing methods [7], [8], [9] are also limited to low-level features and cannot discover the deep correlations between samples. Driven by the tremendous success of deep neural networks in various computer vision tasks [10], [11], researchers have proposed to boost clustering performance with deep representations [12], [13]. In comparison with supervised deep learning, the training procedures of these unsupervised deep learning methods [14], [15] are more difficult due to the unavailability of sample labels. To alleviate this difficulty, a number of supervisory signals have been proposed, such as soft cluster labels [14] and the K-means-friendliness of the representations [15]. Researchers have also guided the training of convolutional neural networks with K-means [16] and constrained dominant sets [17]. Different from the existing deep clustering methods, the proposed representation learning method is formulated in a ranking framework and adopts the ordering of pairwise distances as the supervisory signal. In summary, our method maps the data samples into an embedding space where 1) the ordering of the similarities between neighboring sample pairs is preserved and 2) the similarities between neighboring sample pairs are larger than those between distant sample pairs.
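
Continuing the sketch above, a pair-scoring network can then be trained on these ordering constraints with a margin ranking loss. This is a hedged PyTorch fragment: the two-layer scorer and the margin value are illustrative choices, not the architecture and loss of the paper.

    # Illustrative: learn a similarity score function from ordering constraints.
    import torch
    import torch.nn as nn

    class PairScorer(nn.Module):
        """Scores a sample pair from the concatenation of the two embeddings."""
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, a, b):
            return self.net(torch.cat([a, b], dim=-1)).squeeze(-1)

    def ranking_step(scorer, optimizer, X, triplets, margin=0.1):
        """One update enforcing s(i, j) > s(i, k) + margin for each (i, j, k)."""
        i, j, k = (torch.as_tensor(t) for t in zip(*triplets))
        s_close = scorer(X[i], X[j])  # similarity to the closer sample
        s_far = scorer(X[i], X[k])    # similarity to the farther sample
        target = torch.ones_like(s_close)  # +1: first argument should rank higher
        loss = nn.MarginRankingLoss(margin=margin)(s_close, s_far, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Minimizing this loss pushes each closer pair at least a margin above the farther pair, which realizes both properties at once: neighboring pairs outrank distant pairs, and the ordering among neighboring pairs is preserved.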

In our ranking framework for score function learning, we define the relevant set of each data sample as the union of a neighboring sample set and a boundary sample set. While we can easily find the neighboring samples of a given sample, few methods exist for boundary sample discovery. To this end, we propose a simple but effective method to identify the boundary samples and provide a theoretical analysis. While the pairwise relevances between a sample and its neighbors capture the local structure, those between a sample and the boundary samples capture the global structure. In this way, we are able to exploit both the local structure and the global structure in our proposed unsupervised clustering.
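
The snippet available here does not spell out the discovery rule, so the sketch below implements one plausible reading of "a boundary sample is more separable from its neighbors": a sample whose neighborhood radius is large relative to the radii of its own neighbors (a ratio in the spirit of the local outlier factor). This is an illustrative heuristic, not the strategy proposed in Section 4.

    # Illustrative boundary-sample heuristic, assuming scikit-learn.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def boundary_samples(X, k=10, num_boundary=20):
        """Return indices of the samples most separable from their neighbors."""
        dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        mean_knn = dist[:, 1:].mean(axis=1)  # column 0 is the sample itself
        # separability: own neighborhood radius relative to the neighbors' radii
        score = mean_knn / (mean_knn[idx[:, 1:]].mean(axis=1) + 1e-12)
        return np.argsort(score)[-num_boundary:]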

We highlight our contributions as follows:

  • Unlike the existing works that take the values of the pairwise distances as supervisory signals, our proposed method takes the ordering of these values, which directly influences the clustering results, as the supervisory signal.

  • In comparison with the existing state of the art, our proposed method not only captures the local structure through the relevance between a sample and its neighbors, but also captures the global structure through the relevance between a sample and the boundary samples.

  • Inspired by the observation that a boundary sample is more separable from its neighbors, we propose a simple but effective strategy to identify the boundary samples in a given dataset.

The rest of this paper is organized as follows. Section 2 presents the related work on unsupervised dataset analysis and learning to rank. Section 3 presents our proposed deep clustering method. Section 4 shows how we identify the relevant sample set. Section 5 conducts experiments to evaluate the proposed method, and Section 6 concludes this paper.

Section snippets

Related work

To pave the way for our proposed method, we briefly review the existing work on unsupervised dataset analysis and reformulate the objective functions of these methods to show the importance of pairwise differences. In addition, we also present the related work on learning to rank.

Approach

As detailed in Section 2.1, the existing unsupervised data analysis approaches guide the data sample embedding procedure with the pairwise difference between samples. However, our extensive literature survey reveals that none of the existing work has explicitly studied the ordering of pairwise similarities (or distances) in unsupervised clustering. We explicitly take into consideration the ordering information of the pairwise similarities and formulate them in a ranking framework for embedding

Relevant Sample Set

To reveal both the local and the global structure with the learned score function, we could simply consider a data sample $x_i$ to be relevant to the whole sample set, i.e., $R_{x_i} = X$. However, this is computationally expensive for large datasets, and it is difficult to disentangle the local structure from the global structure. For efficient and effective learning, we instead consider a sample $x_i$ to be relevant to the union of two sets, i.e., $R_{x_i} = N(x_i) \cup \Omega(X)$, with the neighboring sample set $N(x_i)$ to reveal the
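
A minimal sketch of this construction, assuming scikit-learn for the k-NN search; here omega stands for the boundary set Ω(X), produced by any boundary discovery method (for instance, the heuristic sketched in the introduction):

    # Illustrative: build R_{x_i} = N(x_i) ∪ Ω(X) for every sample.
    from sklearn.neighbors import NearestNeighbors

    def relevant_sets(X, omega, k=10):
        """relevant_sets(X, omega)[i] = x_i's k neighbors plus the boundary set."""
        _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        omega = set(int(o) for o in omega)  # global boundary set, shared by all i
        return [sorted((set(int(t) for t in idx[i, 1:]) | omega) - {i})
                for i in range(len(X))]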

Experiments

To evaluate our proposed methods, we carry out extensive experiments in a number of phases. In the first phase, we show the effectiveness of our boundary sample discovery method on synthetic datasets and object image datasets. In the second phase, we conduct image clustering on three different tasks, including face image clustering, object image clustering, and real-world image clustering. We also analyze the influence of the size of the relevant sample set and the initial representations on the
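
The metrics are not named in this snippet; deep clustering work commonly reports clustering accuracy (the best one-to-one matching between cluster ids and class ids, solved by the Hungarian method) together with normalized mutual information, so the sketch below reflects that common protocol rather than the paper's exact evaluation code.

    # Illustrative: the usual deep-clustering evaluation metrics.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.metrics import normalized_mutual_info_score

    def clustering_accuracy(y_true, y_pred):
        """Accuracy under the best one-to-one cluster-to-class mapping."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        n = int(max(y_true.max(), y_pred.max())) + 1
        cost = np.zeros((n, n), dtype=np.int64)
        for t, p in zip(y_true, y_pred):
            cost[p, t] += 1  # count co-occurrences of cluster p and class t
        rows, cols = linear_sum_assignment(cost.max() - cost)  # maximize matches
        return cost[rows, cols].sum() / len(y_true)

    # nmi = normalized_mutual_info_score(y_true, y_pred)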

Conclusion and future work

In this paper, we propose a new method to learn a similarity score function and achieve improved performance in unsupervised clustering. Compared with the existing state of the art, the novelty and the value of our proposal can be validated by three original contributions: (i) the ordering of pairwise differences is introduced as the supervisory signal; (ii) not only the relevance between neighboring sample pairs is considered to capture the local structure, but also the relevance between

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors wish to acknowledge the financial support from: (i) the National Natural Science Foundation of China (NSFC) under Grant no. 62032015; and (ii) the National Natural Science Foundation of China (NSFC) under Grant no. 62172285.

Jinghua Wang received the BEng degree from Shandong University, China, in 2005, the MS degree from the Harbin Institute of Technology, China, in 2009, and the PhD degree from The Hong Kong Polytechnic University, Hong Kong, in 2013. He is currently an Assistant Professor with College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His current research interests include computer vision and machine learning.

References (41)

  • A. Gupta et al., Parameterized principal component analysis, Pattern Recognit. (2018)

  • Y. Zhang et al., Semi-supervised local multi-manifold isomap by linear embedding for feature extraction, Pattern Recognit. (2018)

  • X. Shi et al., Pairwise based deep ranking hashing for histopathology image classification and retrieval, Pattern Recognit. (2018)

  • A.Y. Ng et al., On spectral clustering: analysis and an algorithm, NIPS (2001)

  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)

  • J. Wang et al., An unsupervised deep learning framework via integrated optimization of representation learning and GMM-based modeling, 14th Asian Conference on Computer Vision (2018)

  • P. Zhou et al., Deep adversarial subspace clustering, CVPR (2018)

  • J. Xie et al., Unsupervised deep embedding for clustering analysis, ICML (2016)

  • K.G. Dizaji et al., Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization, ICCV (2017)

  • H. Fan et al., Unsupervised person re-identification: clustering and fine-tuning, ACM Trans. Multimedia Comput. Commun. Appl. (2017)

Li Wang received the BE degree from Southeast University, China, in 2006, and the ME degree from Shanghai Jiao Tong University, China, in 2009. He received the PhD degree from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2016. Currently, he is a research scientist at the Institute for Infocomm Research, A*STAR, Singapore. His research interests include deep learning, computer vision, and image processing.

Jianmin Jiang received the PhD degree from the University of Nottingham, UK, in 1994. From 1997 to 2001, he worked as a full professor of Computing at the University of Glamorgan, Wales, UK. In 2002, he joined the University of Bradford, UK, as a Chair Professor of Digital Media and Director of the Digital Media & Systems Research Institute. He worked at the University of Surrey, UK, as a full professor during 2010-2014 and as a distinguished professor (1000-plan) at Tianjin University, China, during 2010-2013. He is currently a Distinguished Professor and director of the Research Institute for Future Media Computing at the College of Computer Science & Software Engineering, Shenzhen University, China. He was a chartered engineer, fellow of IEE, fellow of RSA, member of the EPSRC College in the UK, and an EU FP-6/7 evaluator. His research interests include image/video processing in the compressed domain, digital video coding, medical imaging, computer graphics, machine learning, and AI applications in digital media processing, retrieval, and analysis. He has published around 400 refereed research papers.
