
Pattern Recognition Letters

Volume 151, November 2021, Pages 88-94

Image deep clustering based on local-topology embedding

https://doi.org/10.1016/j.patrec.2021.08.004

Highlights

  • The feature representation consists of two parts: data and its local-topology information.

  • We propose a replacement strategy to find local-topology representation of data.

  • Data augmentation is applied to improve the generalization performance of the model.

  • A two-stage image deep clustering algorithm is presented based on local-topology embedding called ITEC.

Abstract

Reasonable feature representation plays an important role in improving the performance of clustering algorithms. However, recent deep clustering studies focus only on feature representation at the pixel level, which leads to representations with low discrimination. Our key insight is that exploiting local-topology information between images helps to obtain a highly discriminative representation. We therefore design a replacement strategy to find the local-topology representation of data and propose a two-stage image deep clustering algorithm based on local-topology embedding, called ITEC. Specifically, we take advantage of data augmentation to improve the generalization performance of the learned models; the local-topology representation of the data is then embedded into the representation of the data itself, so as to better complete image clustering tasks. Extensive experiments demonstrate that local-topology information significantly improves the performance of deep clustering.

Introduction

The popularization of smart terminal devices and the development of social networks produce massive amounts of valuable data, which are, however, unlabeled or unorganized. Unsupervised learning, which includes tasks such as cluster analysis, density estimation, and anomaly detection, can train a machine to learn valuable information from unlabeled data. Cluster analysis is an extensively studied field of unsupervised learning [1], [2], [15], [16]. Traditional clustering algorithms are mostly suited to low-dimensional, structured data but perform poorly on high-dimensional unstructured data, since they fail to obtain reasonable feature representations. Feature representation is a crucial step, since what is deemed a good clustering result is feature-dependent. Image deep clustering techniques offer a way to address these challenges.

In recent years, research on deep clustering has gained much attention and has made many ground-breaking improvements [3], [4]. Recent clustering methods [5], [32], [33], [35] use the expressive power of Deep Neural Networks (DNNs) or autoencoders to enhance results under traditional clustering objectives. The model maps the original input space into a new feature space and learns features autonomously from the data. However, these methods learn features only from individual samples in the dataset, while the topology between images is ignored [6]. The field of graph neural networks, by contrast, focuses on mining topology information [37], [38]; connected points in a topology may share commonalities. To extract a more comprehensive feature representation from the original data, we combine the two kinds of features in deep clustering tasks.

The main contributions of this work are summarized as follows:

1. We establish the link between the feature of the data itself and the representation of local-topology information, which has received little attention in deep clustering.

2. A data augmentation technique and a replacement strategy make deep clustering more efficient.

3. We adopt a two-stage deep clustering algorithm, ITEC, and experimental results verify its feasibility.
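As a minimal illustration of the data augmentation mentioned in contribution 2, the sketch below applies a small random translation to a grayscale image. This is only an example of the kind of transform commonly used for image clustering; the paper's exact augmentation operations are not listed in this excerpt.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Randomly translate a (H, W) image by up to max_shift pixels.

    A hypothetical example of an augmentation transform; the paper's
    actual augmentation pipeline is not specified in this excerpt.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    # Zero out the pixels that wrapped around the border.
    if dy > 0:
        shifted[:dy, :] = 0
    elif dy < 0:
        shifted[dy:, :] = 0
    if dx > 0:
        shifted[:, :dx] = 0
    elif dx < 0:
        shifted[:, dx:] = 0
    return shifted

rng = np.random.default_rng(0)
img = rng.random((28, 28))   # a dummy 28x28 grayscale image
aug = augment(img, rng)
print(aug.shape)             # (28, 28): augmentation keeps the image size
```

Label-free transforms like this enlarge the effective training set, which is how augmentation improves the generalization of the representation-learning model.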

Section snippets

Related work

The success of DNNs provides a new approach to clustering research, called deep clustering, which combines general clustering objectives with deep learning techniques. Classical deep clustering algorithms used an autoencoder to perform a nonlinear mapping of the data and then completed the clustering task [4], [7], [8]. In the early days, domain experts proposed the stacked autoencoder [3], [9] to learn data features, which however requires pretraining at each layer that is

Local-topology embedding based image deep clustering model

In the following sub-sections, we mainly introduce the details of the ITEC algorithm. The framework of the model is shown in Fig. 1 and consists of two blocks: the feature learning of the data itself and the feature learning of local-topology information. Assume that X = {x1, x2, ..., xn} is an image dataset, in which the number of samples is n and each sample can be expressed as x ∈ R^d. The image clustering task divides the image dataset X into k clusters, in which images in the same cluster are as similar as
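One plausible reading of the two-block design above can be sketched in a few lines: compute a local-topology feature for each embedding from its nearest neighbours, then concatenate it with the embedding of the data itself. The neighbour-averaging rule below is a hypothetical stand-in for the paper's replacement strategy, whose exact definition is not given in this excerpt.

```python
import numpy as np

def local_topology_features(z, k=1):
    """For each embedding z_i, return the mean embedding of its k nearest
    neighbours - a hypothetical reading of the local-topology
    representation (the paper's exact replacement rule is not shown here).
    """
    # Pairwise squared Euclidean distances between embeddings.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)             # exclude each point itself
    nbr_idx = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbours per point
    return z[nbr_idx].mean(axis=1)

rng = np.random.default_rng(0)
z = rng.random((100, 10))                  # embeddings of 100 images, dim 10
topo = local_topology_features(z, k=3)     # local-topology representation
fused = np.concatenate([z, topo], axis=1)  # data feature + local topology
print(fused.shape)                         # (100, 20)
```

Any standard clustering objective can then be applied to the fused representation, which is the sense in which the local-topology information is "embedded" into the representation of the data itself.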

Dataset and implementation details

We compare ITEC with several baselines on four widely used benchmark datasets: mnist-full [22], mnist-test, usps [23], fashion [24]. The dataset statistics are summarized in Table 3. The visual presentation of datasets is shown in Fig. 5.

The network structure of the feature-learning model of the data itself is

conv_5^64 → conv_5^128 → conv_3^256 → flatten → dense_10 → dense → reshape → deconv_3^128 → deconv_5^64 → deconv_5^1

The superscript represents the number of convolution kernels, and the subscript represents the size of
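To make the notation concrete, the sketch below traces the feature-map shapes through the listed encoder, assuming 28×28 grayscale inputs (as in mnist) and stride-2, 'same'-padded convolutions; the strides and padding are assumptions, since they are not stated in this excerpt.

```python
import math

def conv_out(size, stride=2):
    # With 'same' padding, the output spatial size is ceil(input / stride).
    return math.ceil(size / stride)

h = 28  # assumed input resolution (mnist-style images)
encoder = [("conv", 5, 64), ("conv", 5, 128), ("conv", 3, 256)]
for name, ksize, channels in encoder:
    h = conv_out(h)
    print(f"{name}_{ksize}^{channels}: {h}x{h}x{channels}")
flat = h * h * 256
print("flatten:", flat)  # 4 * 4 * 256 = 4096
print("dense:", 10)      # 10-dimensional embedding (dense_10)
```

Under these assumptions the spatial size shrinks 28 → 14 → 7 → 4, so the flatten layer sees 4096 values before the 10-dimensional bottleneck; the deconv stack then mirrors the encoder back to a single-channel image.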

Conclusion

In this study, we introduce a local-topology information representation learning model and a local-topology embedding based image deep clustering algorithm (ITEC) to deal with image clustering tasks. Experimental results show that ITEC achieves a significant improvement in performance compared with state-of-the-art image clustering algorithms, and the feasibility of the two-stage image deep clustering algorithm is verified. Our work suggests possible future directions. For instance, local-topology

Declaration of Competing Interest

None.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61672332), the Key R&D Program (International Science and Technology Cooperation Project) of Shanxi Province, China (No. 201903D421003), and the Program for the San Jin Young Scholars of Shanxi.

References (38)

  • E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel and D. Cremers, Clustering with deep learning: taxonomy and new...
  • F. Li et al.

    Discriminatively boosted image clustering with fully convolutional auto-encoders

Pattern Recognit.

    (2017)
  • X. Guo et al.

    Adaptive self-paced deep clustering with data augmentation

    IEEE Trans. Knowl. Data Eng.

    (2020)
  • Y. Yan et al.

    Image clustering via deep embedded dimensionality reduction and probability-based triplet loss

    IEEE Trans. Image Process.

    (2020)
  • D. Xia et al.

    Late fusion multi-view clustering based on local multi-kernel learning

    J. Comput. Res. Dev.

    (2020)
  • F. Li et al.

    Clustering method based on samples stability

    Sci. Sin. Inf.

    (2020)
  • F. Li et al.

    Clustering ensemble based on sample’s stability

    Artif. Intell.

    (2019)
  • Y. Qian et al.

    Space structure and clustering of categorical data

    IEEE Trans. Neural Netw. Learn. Syst.

    (2017)
  • J. Guérin, B. Boots, Improving image clustering with multiple pretrained CNN feature extractors, arXiv:1807.07760...

Editor: Jiwen Lu.
