
Deep soft clustering: simultaneous deep embedding and soft-partition clustering

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Traditional clustering methods are not very effective when dealing with high-dimensional and very large datasets. Although traditional dimensionality reduction methods such as principal component analysis (PCA), linear discriminant analysis (LDA) and t-distributed stochastic neighbor embedding (t-SNE) exist, they still cannot significantly improve clustering performance in this scenario. Recent studies have combined the non-linear dimensionality reduction achieved by deep neural networks with hard-partition clustering and have achieved reliable results, but these methods cannot update the dimensionality-reduction and clustering parameters at the same time. We found that soft-partition clustering can be combined well with deep embedding: the membership degrees of fuzzy c-means (FCM) resolve the problem that gradient descent cannot be applied because the assignment step of hard-partition clustering is discrete, so the algorithm can update the parameters of the deep neural network (DNN) and the cluster centroids simultaneously. We build a continuous objective function that combines soft-partition clustering with deep embedding, so that the learned representations are cluster-friendly. The experimental results show that our proposed method, which simultaneously optimizes the parameters of the deep dimensionality reduction and the clustering, outperforms the method with separate optimization.
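To make the idea of a continuous, jointly optimizable objective concrete, the following is a minimal sketch of ours (not the authors' implementation), assuming PyTorch: an autoencoder reconstruction loss is combined with an FCM-style soft clustering loss on the embeddings, and because the memberships are continuous functions of the embeddings and centroids, a single gradient step updates both the network weights and the centroids. The network sizes, the fuzzifier m, and the 0.1 trade-off weight are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class DeepSoftClustering(nn.Module):
    """Toy autoencoder with an FCM-style soft clustering loss (illustration only)."""

    def __init__(self, in_dim=784, emb_dim=10, n_clusters=10, m=2.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.centroids = nn.Parameter(torch.randn(n_clusters, emb_dim))
        self.m = m

    def forward(self, x):
        z = self.encoder(x)                               # deep embedding
        x_rec = self.decoder(z)                           # reconstruction
        d = torch.cdist(z, self.centroids) ** 2 + 1e-8    # squared distances to centroids
        u = (1.0 / d) ** (1.0 / (self.m - 1.0))           # continuous FCM-style memberships
        u = u / u.sum(dim=1, keepdim=True)
        recon_loss = ((x - x_rec) ** 2).mean()
        cluster_loss = ((u ** self.m) * d).sum(dim=1).mean()
        return recon_loss + 0.1 * cluster_loss            # joint, fully differentiable objective

model = DeepSoftClustering()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)     # a random mini-batch as stand-in data
opt.zero_grad()
loss = model(x)
loss.backward()             # gradients reach both the DNN weights and the centroids
opt.step()
```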


References

  • Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp 1027

  • Baldi P, Hornik K (1989) Neural networks and principal component analysis: learning from examples without local minima. Neural Networks 2(1):53–58

  • Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203


  • Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–333


  • Chen X, Zhou Q, Lan R, et al (2020) Sensorineural hearing loss classification via deep-HLNet and few-shot learning. Multim Tools Appl: 1–14

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res: 1–30

  • Dubois D, Prade H (1988) Fuzzy sets and systems. Academic Press, New York


  • Fard MM, Thonet T, Gaussier E (2020) Deep k-means: jointly clustering with k-means and learning representations. Pattern Recogn Lett 138:185–192


  • Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188


  • Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res: 2677–2694.

  • Guo X, Gao L, Liu X, Yin J (2017) Improved deep embedded clustering with local structure preservation. In: Proceedings of IJCAI, IJCAI ’17, pp 1753–1759

  • Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educat Psychol 24:417–441


  • Ji P, Zhang T, Li H, Salzmann M, Reid I (2017) Deep subspace clustering networks. In: Proceedings of NIPS, NIPS ’17, pp 23–32

  • Jiang YZ, Chung FL, Wang ST, Deng ZH, Wang J, Qian PJ (2015) Collaborative fuzzy clustering from multiple weighted views. IEEE Trans Cybern 45(4):688–701


  • Jiang YZ, Deng ZH, Chung FL, Wang GJ, Qian PJ, Choi KS, Wang ST (2017a) Recognition of epileptic EEG signals using a novel multi-view TSK fuzzy system. IEEE Trans Fuzzy Syst 25(1):3–20


  • Jiang YZ, Wu DR, Deng ZH, Qian PJ, Wang J, Wang GJ, Chung FL, Choi KS, Wang ST (2017b) Seizure classification from EEG signals using transfer learning, semi-supervised learning and TSK fuzzy system. IEEE Trans Neural Syst Rehabil Eng 25(12):2270–2284

  • Jiang YZ, Gu XQ, Wu DR, Hang WL, Xue J, Qiu S, Lin CT (2019a) A novel negative-transfer-resistant fuzzy clustering model with a shared cross-domain transfer latent space and its application to brain CT image segmentation. IEEE/ACM Trans Comput Biol Bioinform. https://doi.org/10.1109/TCBB.2019.2963873


  • Jiang YZ, Zhao KF, Xia KJ, Xue J, Zhou LY, Ding Y, Qian PJ (2019b) A novel distributed multitask fuzzy clustering algorithm for automatic MR brain image segmentation. J Med Syst 43(5):118:1–118:9

  • Jiang YZ, Zhang YP, Lin C, Wu DR, Lin CT (2020) EEG-based driver drowsiness estimation using an online multi-view and transfer TSK fuzzy system. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2020.2973673


  • Jolliffe IT (2002) Principal Component Analysis, 2nd edn. Springer Verlag


  • Karayiannis NB (1994) MECA: maximum entropy clustering algorithm. In: Proceedings of the IEEE International Conference on Fuzzy System, Orlando, pp. 630–635

  • Kavukcuoglu K, Fergus R, LeCun Y, et al (2009) Learning invariant features through topographic filter maps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp 1605–1612

  • Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp 1097–1105

  • Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2(1–2):83–97


  • Lee H, Pham P, Largman Y, Ng AY (2009) Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Advances in Neural Information Processing Systems, pp 1096–1104

  • Li R, Mukaidono M (1995) A maximum-entropy approach to fuzzy clustering. In: Proceedings on IEEE International Conference on Fuzzy System, pp 2227–2232

  • Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inform Theor 28(2):129–137


  • Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605


  • MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp 281–297

  • Miyamoto S, Umayahara K (1998) Fuzzy clustering by quadratic regularization. In: Proceedings of the 1998 IEEE International Conference on Fuzzy Systems and IEEE World Congress on Computational Intelligence, pp. 1394–1399

  • Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for Fuzzy Clustering. Springer, Berlin


  • Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst 13(4):517–530


  • Peng X, Xiao S, Feng J, Yau WY, Yi Z (2016) Deep subspace clustering with sparsity prior. In: Proceedings of IJCAI, IJCAI ’16, pp 1925–1931

  • Peng X, Feng J, Lu J, Yau WY, Yi Z (2017) Cascade subspace clustering. In: Proceedings of AAAI, AAAI ’17, pp 2478–2484

  • Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 833–840

  • Rumelhart DE, Hinton GE, Williams RJ (2012) Learning representations by back-propagating errors. Cogn Model 5(3):1

  • Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint 1409.1556

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–9

  • Valdivia A, Martínez-Cámara E, Chaturvedi I, Luzón MV, Cambria E, Ong YS, Herrera F (2020) What do people think about this monument? Understanding negative reviews via deep learning, clustering and descriptive rules. J Ambient Intell Hum Comput 11(1):39–52


  • Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ACM, pp 1096–1103

  • Wang SH, Govindaraj VV, Górriz JM, Zhang X, Zhang YD (2020) Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Inform Fus 67:208–229


  • Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: Proceedings of ICML, ICML ’16, pp 478–487

  • Yang B, Fu X, Sidiropoulos ND, Hong M (2017) Towards K-means friendly spaces: simultaneous deep learning and clustering. In: Proceedings of ICML, ICML ’17, pp 3861–3870

  • Yang JC, Shi R, Ni BB (2020) MedMNIST classification decathlon: a lightweight AutoML benchmark for medical image analysis. arXiv preprint 2010.14925

  • Yao X, Wang X, Wang S et al (2020) A comprehensive survey on convolutional neural network in medical image analysis. Multim Tools Appl. https://doi.org/10.1007/s11042-020-09634-7


  • Yolcu G, Oztel I, Kazan S, Oz C, Bunyak F (2020) Deep learning-based face analysis system for monitoring customer interest. J Ambient Intell Hum Comp 11(1):237–248


  • Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–353


  • Zhang YD, Dong Z, Wang SH, Yu X, Yao X, Zhou Q, Martinez FJ (2020) Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inform Fus 64:149–187


  • Zhang Y, Guttery DS, Wang SH (2020) Abnormal breast detection by an improved AlexNet model. Ann Oncol 31:S277


  • Zhang YD, Satapathy SC, Zhu LY, et al (2020) A seven-layer convolutional neural network for chest CT based COVID-19 diagnosis using stochastic pooling. IEEE Sens J

  • Zhang YP, Wang SH, Xia KJ, Jiang YZ, Qian PJ (2021) Alzheimer’s disease multiclass diagnosis via multimodal neuroimaging embedding feature selection and fusion. Inform Fus 66:170–183



Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grants 61702225 and 61806026, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20180956, in part by the Six Talent Peaks Project of Jiangsu Province under Grant XYDXX-127, and in part by the Jiangsu Committee of Health under Grant H2018071.

Author information


Corresponding author

Correspondence to Yizhang Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Evaluation measures

We used standard unsupervised measures to evaluate the performance of the clustering algorithms. The first is clustering accuracy (ACC): for all algorithms we set the number of clusters to the number of ground-truth categories and then evaluate performance with ACC:

$$ ACC(C,S) = \mathop {\max }\limits_{\psi } \frac{1}{N}\sum\limits_{i = 1}^{N} {\chi \left\{ {s_{i} = \psi (c_{i} )} \right\}} $$

where \(s_{i}\) is the ground-truth class of sample \(i\), \(c_{i}\) is its assigned cluster, and \(\chi\) is the indicator function: \(\chi \{ true\} = 1\) and \(\chi \{ false\} = 0\). \(\psi\) ranges over all possible one-to-one mappings between the obtained clusters and the ground-truth classes; the best mapping is found with the Hungarian algorithm (Kuhn 1955).
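For concreteness, here is a minimal sketch of ours (not code from the paper) of this evaluation, assuming NumPy and SciPy are available; `linear_sum_assignment` solves the Hungarian matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between clusters and classes (Hungarian algorithm)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    # Contingency table: count[i, j] = samples assigned to cluster i with true class j.
    count = np.zeros((k, k), dtype=np.int64)
    for c, s in zip(y_pred, y_true):
        count[c, s] += 1
    row, col = linear_sum_assignment(count, maximize=True)
    return count[row, col].sum() / y_true.size

# Cluster labels are an arbitrary permutation of the true classes -> ACC = 1.0
print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0]))
```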

In addition, we used normalized mutual information (NMI), an information-theoretic measure based on the mutual information between the obtained clustering and the ground-truth partition:

$$ NMI(\Omega ,C) = \frac{I(\Omega ;C)}{{(H(\Omega ) + H(C))/2}} $$

where \(I\) denotes mutual information and \(H\) denotes entropy:

$$ \begin{gathered} I(\Omega ,C) = \sum\limits_{k} {\sum\limits_{j} {P(w_{k} \cap c_{j} )\log \frac{{P(w_{k} \cap c_{j} )}}{{P(w_{k} )P(c_{j} )}}} } \hfill \\ =\sum\limits_{k} {\sum\limits_{j} {\frac{{\left| {w_{k} \cap c_{j} } \right|}}{N}\log \frac{{N\left| {w_{k} \cap c_{j} } \right|}}{{\left| {w_{k} } \right|\left| {c_{j} } \right|}}} } \hfill \\ \end{gathered} $$

where \(P(w_{k})\), \(P(c_{j})\), and \(P(w_{k} \cap c_{j})\) are the probabilities that a data instance belongs to cluster \(w_{k}\), to ground-truth class \(c_{j}\), and to both simultaneously. The second form follows from the maximum-likelihood estimates of these probabilities.

$$ \begin{gathered} H(\Omega ) = - \sum\limits_{k} {P(w_{k} )\log P(w_{k} )} \hfill \\ = - \sum\limits_{k} {\frac{{\left| {w_{k} } \right|}}{N}\log \frac{{\left| {w_{k} } \right|}}{N}} \hfill \\ \end{gathered} $$
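The sketch below (ours, not from the paper) computes NMI directly from these counts; scikit-learn's `normalized_mutual_info_score`, which uses the same arithmetic-mean normalization by default, should give matching values:

```python
import numpy as np

def nmi(y_true, y_pred, eps=1e-12):
    """NMI with arithmetic-mean normalization, from the cluster/class contingency counts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = y_true.size
    clusters = np.unique(y_pred)
    classes = np.unique(y_true)
    # |w_k ∩ c_j| for every cluster/class pair.
    counts = np.array([[np.sum((y_pred == w) & (y_true == c)) for c in classes]
                       for w in clusters], dtype=float)
    pw = counts.sum(axis=1) / n          # P(w_k)
    pc = counts.sum(axis=0) / n          # P(c_j)
    pwc = counts / n                     # P(w_k ∩ c_j)
    mi = np.sum(pwc * np.log((pwc + eps) / (np.outer(pw, pc) + eps)))
    h_w = -np.sum(pw * np.log(pw + eps))
    h_c = -np.sum(pc * np.log(pc + eps))
    return mi / ((h_w + h_c) / 2)

# A permuted but otherwise perfect clustering gives NMI ≈ 1.0
print(nmi([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0]))
```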

Appendix B: Boundary behavior of the weighting exponent \(m\)

For the FCM algorithm with weighting exponent \(m \in [1,\infty )\), the following cases arise (a numerical check is sketched after the proof):

When \(m = 1\), the FCM algorithm degenerates into the KM (k-means) algorithm.

When \(m \to 1^{ + }\), FCM degenerates to the KM algorithm with probability 1.

When \(m \to + \infty\), FCM loses its partitioning ability, and \(U = [\mu_{ik} ] = [\frac{1}{c}]\).

Proof

  1. Case \(m \to 1^{ + }\). Since \(d_{ik} = \left\| {x_{k} - v_{i} } \right\|_{A}^{2}\), we have \(d_{ik} \ge 0\).

When some \(d_{ik} = 0\), we get

$$ \mu_{ik} = \left\{ {\begin{array}{*{20}c} {1,{\text{ d}}_{ik} = 0 = \min_{j} \left\{ {d_{jk} } \right\}} \\ {0,{\text{ other}}s} \\ \end{array} } \right.;{ 1} \le i \le c, \, 1 \le k \le n $$

When \(d_{ik} \ne 0\) for all \(1 \le i \le c\), let \(d^{(k)}_{\min } = \mathop {\min }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\} \ne 0\); then

$$ \mu_{ik} = \frac{{\left( {\frac{1}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{1}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \frac{{\left( {\frac{{d^{(k)}_{\min } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\min } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} $$
$$ \because d_{\min }^{(k)} = \mathop {\min }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\}\therefore 0 < \frac{{d^{(k)}_{\min } }}{{d_{ik} }} \le 1 $$

When \(d_{ik} = d_{\min }^{(k)}\), \(\frac{{d_{\min }^{(k)} }}{{d_{ik} }} = 1\), and we get

$$ \mathop {\lim }\limits_{m \to 1^{ + } } \left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} = \mathop {\lim }\limits_{m \to 1^{ + } } 1^{{\frac{1}{m - 1}}} = 1 $$

When \(d_{ik} \ne d_{\min }^{(k)}\), \(\frac{{d_{\min }^{(k)} }}{{d_{ik} }} < 1\), and we get

$$ \mathop {\lim }\limits_{m \to 1^{ + } } \left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} = \left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{ + \infty } = 0 $$

Thus, in the limit \(m \to 1^{ + }\), the soft partition hardens: each sample is fully assigned to its nearest centroid, and the FCM algorithm becomes equivalent to k-means.

  2. Case \(m \to + \infty\). Since \(d_{ik} = \left\| {x_{k} - v_{i} } \right\|_{A}^{2}\), we have \(d_{ik} \ge 0\).

We do not consider the special case in which some \(d_{ik} = 0\); thus \(d_{ik} \ne 0\) for all \(1 \le i \le c\). Let \(d_{\max }^{(k)} = \mathop {\max }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\} \ne 0\); then

$$ \mu_{ik} = \frac{{\left( {\frac{1}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{1}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \frac{{\left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} $$

\(\because d_{\max }^{(k)} = \mathop {\max }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\},1 \le \frac{{d^{(k)}_{\max } }}{{d_{ik} }} < + \infty\),

$$ \therefore \mathop {\lim }\limits_{m \to + \infty } \left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} = \left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{ + 0} = 1 $$
$$ \therefore \mathop {\lim }\limits_{m \to + \infty } \sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } = \sum\limits_{p = 1}^{c} {\mathop {\lim }\limits_{m \to + \infty } \left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } = \sum\limits_{p = 1}^{c} 1 = c $$

So, we can get

$$ \begin{gathered} \mathop {\lim }\limits_{m \to + \infty } \mu_{ik} = \mathop {\lim }\limits_{m \to + \infty } \frac{{\left( {\frac{1}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{1}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \mathop {\lim }\limits_{m \to + \infty } \frac{{\left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} \hfill \\ \, = \frac{{\mathop {\lim }\limits_{m \to + \infty } \left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\mathop {\lim }\limits_{m \to + \infty } \sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \frac{1}{c};\quad 1 \le i \le c,\;1 \le k \le n \hfill \\ \end{gathered} $$

Hence, when \(m \to + \infty\), every membership value in FCM tends to \(\frac{1}{c}\): each data instance belongs to every cluster with equal degree, and the algorithm no longer produces a useful partition.
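The numerical check mentioned before the proof is sketched below (ours, not from the paper): it evaluates the FCM membership formula on arbitrary squared distances for several values of \(m\) and shows the two limiting behaviours.

```python
import numpy as np

def fcm_memberships(d, m):
    """FCM memberships u_{ik} from squared distances d (shape: clusters x samples)."""
    inv = (1.0 / d) ** (1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

# Squared distances of 4 samples to 3 centroids (arbitrary illustrative values).
d = np.array([[0.2, 1.5, 3.0, 0.9],
              [1.1, 0.3, 2.5, 0.8],
              [2.4, 2.0, 0.4, 0.7]])

for m in (1.01, 2.0, 100.0):
    print(f"m = {m}:")
    print(np.round(fcm_memberships(d, m), 3))
# m close to 1  -> memberships approach 0/1 (hard, k-means-like assignment)
# m very large  -> every membership approaches 1/c = 1/3
```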


About this article


Cite this article

Li, K., Ni, T., Xue, J. et al. Deep soft clustering: simultaneous deep embedding and soft-partition clustering. J Ambient Intell Human Comput 14, 5581–5593 (2023). https://doi.org/10.1007/s12652-021-02997-1

