Image deep clustering based on local-topology embedding☆
Introduction
The popularization of smart terminal devices and the development of social networks produce massive amounts of valuable data, most of which are unlabeled or unorganized. Unsupervised learning, which includes tasks such as cluster analysis, density estimation, and anomaly detection, can train a machine to extract valuable information from unlabeled data. Cluster analysis is an extensively studied branch of unsupervised learning [1], [2], [15], [16]. Traditional clustering algorithms are mostly suited to low-dimensional, structured data but perform poorly on high-dimensional unstructured data, since they fail to obtain a reasonable feature representation. Feature representation is a crucial step, since what counts as a good clustering result is feature-dependent. Image deep clustering techniques address these challenges by learning feature representations directly from the data.
In recent years, research on deep clustering has gained much attention and made many ground-breaking improvements [3], [4]. Recent clustering methods [5], [32], [33], [35] use the expressive power of Deep Neural Networks (DNNs) or autoencoders to enhance the results obtained under traditional clustering objectives. Such a model maps the original input space into a new feature space and learns features from the data autonomously. However, these methods learn features only from the individual samples in the dataset, while the topology between images is ignored [6]. Graph neural networks, by contrast, focus on mining topology information [37], [38]: connected points in a topology may share commonalities. To extract a more comprehensive feature representation from the original data, we combine these two kinds of features in deep clustering tasks.
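The idea of pairing per-sample features with local-topology features can be sketched as follows. This is a minimal illustration, not the paper's implementation: the k-nearest-neighbour graph, the single averaging propagation step, and the concatenation are all hypothetical simplifications.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric k-nearest-neighbour adjacency from Euclidean distances."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-loops
    A = np.zeros((n, n))
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k closest samples
    for i in range(n):
        A[i, nn[i]] = 1.0
    return np.maximum(A, A.T)              # symmetrize the graph

def topology_embedding(F, A):
    """One propagation step: average the features of graph neighbours."""
    deg = A.sum(axis=1, keepdims=True)
    return (A @ F) / np.maximum(deg, 1.0)

# Toy per-sample features: two tight groups in 2-D
F = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
A = knn_adjacency(F, k=1)
# Joint representation: the sample's own feature plus its neighbourhood feature
Z = np.concatenate([F, topology_embedding(F, A)], axis=1)
```

Points within the same group become each other's neighbours, so the topology half of `Z` encodes shared local structure that the per-sample half alone misses.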
The main contributions of this work are summarized as follows:
1. We establish the link between the features of the data itself and the representation of local-topology information, which has received little attention in deep clustering.
2. A data augmentation technique and a replacement strategy make deep clustering more efficient.
3. We propose a two-stage deep clustering algorithm, ITEC, and experimental results verify its feasibility.
Related work
The success of DNNs provides a new approach to clustering research, called deep clustering, which combines general clustering objectives with deep learning techniques. Classical deep clustering algorithms used an autoencoder to apply a nonlinear mapping to the data and then completed the clustering task [4], [7], [8]. In the early days, domain experts proposed the stacked autoencoder [3], [9] to learn data features, which however requires pretraining at each layer that is
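A representative objective in this line of work, introduced by DEC [5], is to minimize the KL divergence between soft cluster assignments and a sharpened target distribution. The sketch below reproduces that objective with NumPy; the toy embeddings and centroids are made up for illustration.

```python
import numpy as np

def soft_assign(Z, mu, alpha=1.0):
    """Student's t soft assignment q_ij between embeddings Z and centroids mu."""
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij = q_ij^2 / f_j, renormalized per sample (DEC)."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

Z = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0]])   # toy embeddings
mu = np.array([[0.1, 0.0], [4.0, 4.0]])              # toy centroids
q = soft_assign(Z, mu)
p = target_distribution(q)
kl = np.sum(p * np.log(p / q))   # clustering loss to be minimized
```

In DEC-style training, gradients of this KL loss are backpropagated through the encoder so that the embedding and the cluster centroids are refined jointly.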
Local-topology embedding based image deep clustering model
In the following subsections, we introduce the details of the ITEC algorithm. The framework of the model, shown in Fig. 1, consists of two blocks: feature learning of the data itself and feature learning of local-topology information. Assume that X = {x_1, x_2, …, x_n} is an image dataset, in which the number of samples is n and the i-th sample can be expressed as x_i. The image clustering task divides the image dataset into K clusters, in which images in the same cluster are as similar as
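The final partition of the dataset into K clusters can be sketched with a minimal k-means on the learned embeddings. This is a generic illustration of the clustering step, not ITEC's actual procedure; the toy data and initialization are hypothetical.

```python
import numpy as np

def kmeans(Z, K, iters=50, seed=0):
    """Minimal k-means: partition embeddings Z into K clusters."""
    rng = np.random.default_rng(seed)
    mu = Z[rng.choice(len(Z), K, replace=False)]       # initial centroids
    for _ in range(iters):
        d = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                      # assignment step
        for k in range(K):
            if np.any(labels == k):
                mu[k] = Z[labels == k].mean(axis=0)    # centroid update step
    return labels, mu

# Toy embeddings forming two obvious groups
Z = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels, mu = kmeans(Z, K=2)
```

Samples within the same group receive the same label, realizing the "similar within a cluster" requirement stated above.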
Dataset and implementation details
We compare ITEC with several baselines on four widely used benchmark datasets: MNIST-full [22], MNIST-test, USPS [23], and Fashion-MNIST [24]. The dataset statistics are summarized in Table 3. A visual presentation of the datasets is shown in Fig. 5.
The network structure of the feature-learning model of the data itself is
The superscript represents the number of convolution kernels, and the subscript represents the size of
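The layer notation (superscript for the number of kernels, subscript for the kernel size) can be made concrete with a small helper that computes the shape of a feature map after one convolution layer. The stride and padding values below are hypothetical choices, not taken from the paper.

```python
def conv_out_shape(h, w, kernel, n_kernels, stride=1, padding=0):
    """Spatial size and channel count after one convolution layer."""
    oh = (h + 2 * padding - kernel) // stride + 1
    ow = (w + 2 * padding - kernel) // stride + 1
    return oh, ow, n_kernels

# e.g. a 28x28 grayscale image through a layer written with superscript 32
# (number of kernels) and subscript 5 (kernel size), assuming stride 2 and
# padding 2 (illustrative values only)
shape = conv_out_shape(28, 28, kernel=5, n_kernels=32, stride=2, padding=2)
```

With these assumed hyperparameters the 28x28 input becomes a 14x14 map with 32 channels, which is the kind of progressive down-sampling typical of convolutional encoders.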
Conclusion
In this study, we introduce a local-topology information representation learning model and a local-topology embedding based image deep clustering algorithm (ITEC) to deal with image clustering tasks. Experimental results show that ITEC achieves a significant performance improvement over state-of-the-art image clustering algorithms, verifying the feasibility of the two-stage image deep clustering algorithm. Our work may motivate future work; for instance, local-topology
Declaration of Competing Interest
None.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61672332), the Key R&D Program (International Science and Technology Cooperation Project) of Shanxi Province, China (No. 201903D421003), and the Program for the San Jin Young Scholars of Shanxi.
References (38)
- et al., Infogan: interpretable representation learning by information maximizing generative adversarial nets, in: NIPS, 2016.
- et al., Gradient-based learning applied to document recognition, Proc. IEEE, 1998.
- et al., Algorithm AS 136: a k-means clustering algorithm, J. R. Stat. Soc., 1979.
- Machine Learning, 2016.
- et al., Unsupervised deep embedding for clustering analysis, in: ICML, 2016.
- et al., Auto-encoder based data clustering, in: CIARP, 2013.
- et al., Joint unsupervised learning of deep representations and image clusters, in: CVPR, 2016, pp. 5147–5156.
- et al., Structural deep network embedding, in: KDD, 2016.
- et al., Deep clustering with convolutional autoencoders, in: ICONIP, 2017, pp. 373–382.
- et al., A deep convolutional auto-encoder with embedded clustering, in: ICIP, 2018.
- Discriminatively boosted image clustering with fully convolutional auto-encoders, Pattern Recognit.
- Adaptive self-paced deep clustering with data augmentation, IEEE Trans. Knowl. Data Eng.
- Image clustering via deep embedded dimensionality reduction and probability-based triplet loss, IEEE Trans. Image Process.
- Late fusion multi-view clustering based on local multi-kernel learning, J. Comput. Res. Dev.
- Clustering method based on samples stability, Sci. Sin. Inf.
- Clustering ensemble based on sample's stability, Artif. Intell.
- Space structure and clustering of categorical data, IEEE Trans. Neural Netw. Learn. Syst.
☆ Editor: Jiwen Lu.