
Deep soft clustering: simultaneous deep embedding and soft-partition clustering

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Traditional clustering methods are not very effective when dealing with high-dimensional and very large datasets. Although traditional dimensionality reduction methods such as principal component analysis (PCA), linear discriminant analysis (LDA) and t-distributed stochastic neighbor embedding (t-SNE) exist, they still cannot significantly improve clustering performance in this scenario. Recent studies have combined the non-linear dimensionality reduction achieved by deep neural networks with hard-partition clustering and have achieved reliable results, but these methods cannot update the dimensionality-reduction and clustering parameters at the same time. We found that soft-partition clustering can be combined well with deep embedding: the membership degrees of fuzzy c-means (FCM) resolve the problem that gradient descent cannot be applied because the assignment step of hard-partition clustering is discrete, so the algorithm can update the parameters of the deep neural network (DNN) and the cluster centroids simultaneously. We build a continuous objective function that combines soft-partition clustering with deep embedding, so that the learned representations are cluster-friendly. The experimental results show that our proposed method, which simultaneously optimizes the parameters of the deep dimensionality reduction and the clustering, outperforms the method with separate optimization.
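To make the idea of a continuous, jointly optimizable objective concrete, the following is a minimal sketch of ours (not the authors' implementation), assuming PyTorch: an autoencoder reconstruction loss is combined with an FCM-style soft clustering loss on the embeddings, and because the memberships are continuous functions of the embeddings and centroids, a single gradient step updates both the network weights and the centroids. The network sizes, the fuzzifier m, and the 0.1 trade-off weight are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class DeepSoftClustering(nn.Module):
    """Toy autoencoder with an FCM-style soft clustering loss (illustration only)."""

    def __init__(self, in_dim=784, emb_dim=10, n_clusters=10, m=2.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.centroids = nn.Parameter(torch.randn(n_clusters, emb_dim))
        self.m = m

    def forward(self, x):
        z = self.encoder(x)                               # deep embedding
        x_rec = self.decoder(z)                           # reconstruction
        d = torch.cdist(z, self.centroids) ** 2 + 1e-8    # squared distances to centroids
        u = (1.0 / d) ** (1.0 / (self.m - 1.0))           # continuous FCM-style memberships
        u = u / u.sum(dim=1, keepdim=True)
        recon_loss = ((x - x_rec) ** 2).mean()
        cluster_loss = ((u ** self.m) * d).sum(dim=1).mean()
        return recon_loss + 0.1 * cluster_loss            # joint, fully differentiable objective

model = DeepSoftClustering()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)     # a random mini-batch as stand-in data
opt.zero_grad()
loss = model(x)
loss.backward()             # gradients reach both the DNN weights and the centroids
opt.step()
```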


References

  • Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp 1027

  • Baldi P, Hornik K (1989) Neural networks and principal component analysis: learning from examples without local minima. Neural Networks 2(1):53–58

  • Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203


  • Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–333


  • Chen X, Zhou Q, Lan R, et al (2020) Sensorineural hearing loss classification via deep-HLNet and few-shot learning. Multim Tools Appl: 1–14

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res: 1–30

  • Dubois D, Prade H (1988) Fuzzy sets and systems. Academic Press, New York


  • Fard MM, Thonet T, Gaussier E (2020) Deep k-means: jointly clustering with k-means and learning representations. Pattern Recogn Lett 138:185–192


  • Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188


  • Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res: 2677–2694.

  • Guo X, Gao L, Liu X, Yin J (2017) Improved deep embedded clustering with local structure preservation. In: Proceedings of IJCAI, IJCAI ’17, pp 1753–1759

  • Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educat Psychol 24:417–441


  • Ji P, Zhang T, Li H, Salzmann M, Reid I (2017) Deep subspace clustering networks. In: Proceedings of NIPS, NIPS ’17, pp 23–32

  • Jiang YZ, Chung FL, Wang ST, Deng ZH, Wang J, Qian PJ (2015) Collaborative fuzzy clustering from multiple weighted views. IEEE Trans Cybern 45(4):688–701


  • Jiang YZ, Deng ZH, Chung FL, Wang GJ, Qian PJ, Choi KS, Wang ST (2017a) Recognition of epileptic EEG signals using a novel multi-view TSK fuzzy system. IEEE Trans Fuzzy Syst 25(1):3–20


  • Jiang YZ, Wu DR, Deng ZH, Qian PJ, Wang J, Wang GJ, Chung FL, Choi KS, Wang ST (2017b) Seizure classification from EEG signals using transfer learning, semi-supervised learning and TSK fuzzy system. IEEE Trans Neural Syst Rehabil Eng 25(12):2270–2284

  • Jiang YZ, Gu XQ, Wu DR, Hang WL, Xue J, Qiu S, Lin CT (2019a) A novel negative-transfer-resistant fuzzy clustering model with a shared cross-domain transfer latent space and its application to brain CT image segmentation. IEEE/ACM Trans Comput Biol Bioinform. https://doi.org/10.1109/TCBB.2019.2963873


  • Jiang YZ, Zhao KF, Xia KJ, Xue J, Zhou LY, Ding Y, Qian PJ (2019b) A novel distributed multitask fuzzy clustering algorithm for automatic MR brain image segmentation. J Med Syst 43(5):118:1–118:9

  • Jiang YZ, Zhang YP, Lin C, Wu DR, Lin CT (2020) EEG-based driver drowsiness estimation using an online multi-view and transfer TSK fuzzy system. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2020.2973673


  • Jolliffe IT (2002) Principal Component Analysis, 2nd edn. Springer Verlag


  • Karayiannis NB (1994) MECA: maximum entropy clustering algorithm. In: Proceedings of the IEEE International Conference on Fuzzy System, Orlando, pp. 630–635

  • Kavukcuoglu K, Fergus R, LeCun Y, et al (2009) Learning invariant features through topographic filter maps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp 1605–1612

  • Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp 1097–1105

  • Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2(1–2):83–97


  • Lee H, Pham P, Largman Y, Ng AY (2009) Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Advances in Neural Information Processing Systems, pp 1096–1104

  • Li R, Mukaidono M (1995) A maximum-entropy approach to fuzzy clustering. In: Proceedings on IEEE International Conference on Fuzzy System, pp 2227–2232

  • Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inform Theor 28(2):129–137


  • Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605


  • MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp 281–297

  • Miyamoto S, Umayahara K (1998) Fuzzy clustering by quadratic regularization. In: Proceedings of the 1998 IEEE International Conference on Fuzzy Systems and IEEE World Congress on Computational Intelligence, pp. 1394–1399

  • Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for Fuzzy Clustering. Springer, Berlin


  • Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst 13(4):517–530


  • Peng X, Xiao S, Feng J, Yau WY, Yi Z (2016) Deep subspace clustering with sparsity prior. In: Proceedings of IJCAI, IJCAI ’16, pp 1925–1931

  • Peng X, Feng J, Lu J, Yau WY, Yi Z (2017) Cascade subspace clustering. In: Proceedings of AAAI, AAAI ’17, pp 2478–2484

  • Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 833–840

  • Rumelhart DE, Hinton GE, Williams RJ (2012) Learning representations by back-propagating errors. Cogn Model 5(3):1

  • Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint 1409.1556

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–9

  • Valdivia A, Martínez-Cámara E, Chaturvedi I, Luzón MV, Cambria E, Ong YS, Herrera F (2020) What do people think about this monument? Understanding negative reviews via deep learning, clustering and descriptive rules. J Ambient Intell Hum Comput 11(1):39–52


  • Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ACM, pp 1096–1103

  • Wang SH, Govindaraj VV, Górriz JM, Zhang X, Zhang YD (2020) Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Inform Fus 67:208–229


  • Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: Proceedings of ICML, ICML ’16, pp 478–487

  • Yang B, Fu X, Sidiropoulos ND, Hong M (2017) Towards K-means friendly spaces: simultaneous deep learning and clustering. In: Proceedings of ICML, ICML ’17, pp 3861–3870

  • Yang JC, Shi R, Ni BB (2020) MedMNIST classification decathlon: a lightweight AutoML benchmark for medical image analysis. arXiv preprint 2010.14925

  • Yao X, Wang X, Wang S et al (2020) A comprehensive survey on convolutional neural network in medical image analysis. Multim Tools Appl. https://doi.org/10.1007/s11042-020-09634-7


  • Yolcu G, Oztel I, Kazan S, Oz C, Bunyak F (2020) Deep learning-based face analysis system for monitoring customer interest. J Ambient Intell Hum Comp 11(1):237–248


  • Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–353


  • Zhang YD, Dong Z, Wang SH, Yu X, Yao X, Zhou Q, Martinez FJ (2020) Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inform Fus 64:149–187


  • Zhang Y, Guttery DS, Wang SH (2020) Abnormal breast detection by an improved AlexNet model. Ann Oncol 31:S277


  • Zhang YD, Satapathy SC, Zhu LY, et al (2020) A seven-layer convolutional neural network for chest CT based COVID-19 diagnosis using stochastic pooling. IEEE Sens J

  • Zhang YP, Wang SH, Xia KJ, Jiang YZ, Qian PJ (2021) Alzheimer’s disease multiclass diagnosis via multimodal neuroimaging embedding feature selection and fusion. Inform Fus 66:170–183



Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grants 61702225 and 61806026, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20180956, in part by the Six Talent Peaks Project of Jiangsu Province under Grant XYDXX-127, and in part by the Jiangsu Committee of Health under Grant H2018071.

Author information


Corresponding author

Correspondence to Yizhang Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Evaluation measures

We used standard unsupervised measures to evaluate the performance of the clustering algorithms. The first is clustering accuracy (ACC): for all algorithms we set the number of clusters to the number of ground-truth categories and then evaluate performance with ACC:

$$ ACC(C,S) = \mathop {\max }\limits_{\psi } \frac{1}{N}\sum\limits_{i = 1}^{N} {\chi \left\{ {s_{i} = \psi (c_{i} )} \right\}} $$

where \(s_{i}\) is the ground-truth class of sample \(i\), \(c_{i}\) is its assigned cluster, and \(\chi\) is the indicator function: \(\chi \{ true\} = 1\) and \(\chi \{ false\} = 0\). \(\psi\) ranges over all possible one-to-one mappings between the obtained clusters and the ground-truth classes; the best mapping is found with the Hungarian algorithm (Kuhn 1955).
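For concreteness, here is a minimal sketch of ours (not code from the paper) of this evaluation, assuming NumPy and SciPy are available; `linear_sum_assignment` solves the Hungarian matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between clusters and classes (Hungarian algorithm)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    # Contingency table: count[i, j] = samples assigned to cluster i with true class j.
    count = np.zeros((k, k), dtype=np.int64)
    for c, s in zip(y_pred, y_true):
        count[c, s] += 1
    row, col = linear_sum_assignment(count, maximize=True)
    return count[row, col].sum() / y_true.size

# Cluster labels are an arbitrary permutation of the true classes -> ACC = 1.0
print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0]))
```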

In addition, we used normalized mutual information (NMI), an information-theoretic measure based on the mutual information between the obtained clustering and the ground-truth partition:

$$ NMI(\Omega ,C) = \frac{I(\Omega ;C)}{{(H(\Omega ) + H(C))/2}} $$

where \(I\) denotes mutual information and \(H\) denotes entropy:

$$ \begin{gathered} I(\Omega ,C) = \sum\limits_{k} {\sum\limits_{j} {P(w_{k} \cap c_{j} )\log \frac{{P(w_{k} \cap c_{j} )}}{{P(w_{k} )P(c_{j} )}}} } \hfill \\ =\sum\limits_{k} {\sum\limits_{j} {\frac{{\left| {w_{k} \cap c_{j} } \right|}}{N}\log \frac{{N\left| {w_{k} \cap c_{j} } \right|}}{{\left| {w_{k} } \right|\left| {c_{j} } \right|}}} } \hfill \\ \end{gathered} $$

where \(P(w_{k})\), \(P(c_{j})\), and \(P(w_{k} \cap c_{j})\) are the probabilities that a data instance belongs to cluster \(w_{k}\), to ground-truth class \(c_{j}\), and to both simultaneously. The second form follows from the maximum-likelihood estimates of these probabilities.

$$ \begin{gathered} H(\Omega ) = - \sum\limits_{k} {P(w_{k} )\log P(w_{k} )} \hfill \\ = - \sum\limits_{k} {\frac{{\left| {w_{k} } \right|}}{N}\log \frac{{\left| {w_{k} } \right|}}{N}} \hfill \\ \end{gathered} $$
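The sketch below (ours, not from the paper) computes NMI directly from these counts; scikit-learn's `normalized_mutual_info_score`, which uses the same arithmetic-mean normalization by default, should give matching values:

```python
import numpy as np

def nmi(y_true, y_pred, eps=1e-12):
    """NMI with arithmetic-mean normalization, from the cluster/class contingency counts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = y_true.size
    clusters = np.unique(y_pred)
    classes = np.unique(y_true)
    # |w_k ∩ c_j| for every cluster/class pair.
    counts = np.array([[np.sum((y_pred == w) & (y_true == c)) for c in classes]
                       for w in clusters], dtype=float)
    pw = counts.sum(axis=1) / n          # P(w_k)
    pc = counts.sum(axis=0) / n          # P(c_j)
    pwc = counts / n                     # P(w_k ∩ c_j)
    mi = np.sum(pwc * np.log((pwc + eps) / (np.outer(pw, pc) + eps)))
    h_w = -np.sum(pw * np.log(pw + eps))
    h_c = -np.sum(pc * np.log(pc + eps))
    return mi / ((h_w + h_c) / 2)

# A permuted but otherwise perfect clustering gives NMI ≈ 1.0
print(nmi([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0]))
```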

Appendix B: Boundary behavior of the weighting exponent \(m\)

For the FCM algorithm with weighting exponent \(m \in [1,\infty )\), the following cases arise (a numerical check is sketched after the proof):

When \(m = 1\), the FCM algorithm degenerates into the KM (k-means) algorithm.

When \(m \to 1^{ + }\), FCM degenerates to the KM algorithm with probability 1.

When \(m \to + \infty\), FCM loses its partitioning ability, and \(U = [\mu_{ik} ] = [\frac{1}{c}]\).

Proof

  1. Case \(m \to 1^{ + }\). Since \(d_{ik} = \left\| {x_{k} - v_{i} } \right\|_{A}^{2}\), we have \(d_{ik} \ge 0\).

When some \(d_{ik} = 0\), we get

$$ \mu_{ik} = \left\{ {\begin{array}{*{20}c} {1,{\text{ d}}_{ik} = 0 = \min_{j} \left\{ {d_{jk} } \right\}} \\ {0,{\text{ other}}s} \\ \end{array} } \right.;{ 1} \le i \le c, \, 1 \le k \le n $$

When \(d_{ik} \ne 0\) for all \(1 \le i \le c\), let \(d^{(k)}_{\min } = \mathop {\min }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\} \ne 0\); then

$$ \mu_{ik} = \frac{{\left( {\frac{1}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{1}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \frac{{\left( {\frac{{d^{(k)}_{\min } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\min } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} $$
$$ \because d_{\min }^{(k)} = \mathop {\min }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\}\therefore 0 < \frac{{d^{(k)}_{\min } }}{{d_{ik} }} \le 1 $$

When \(d_{ik} = d_{\min }^{(k)}\), \(\frac{{d_{\min }^{(k)} }}{{d_{ik} }} = 1\), and we get

$$ \mathop {\lim }\limits_{m \to 1^{ + } } \left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} = \mathop {\lim }\limits_{m \to 1^{ + } } 1^{{\frac{1}{m - 1}}} = 1 $$

When \(d_{ik} \ne d_{\min }^{(k)}\), \(\frac{{d_{\min }^{(k)} }}{{d_{ik} }} < 1\), and we get

$$ \mathop {\lim }\limits_{m \to 1^{ + } } \left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} = \left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{ + \infty } = 0 $$

Thus, in the limit \(m \to 1^{ + }\), the soft partition hardens: each sample is fully assigned to its nearest centroid, and the FCM algorithm becomes equivalent to k-means.

  2. Case \(m \to + \infty\). Since \(d_{ik} = \left\| {x_{k} - v_{i} } \right\|_{A}^{2}\), we have \(d_{ik} \ge 0\).

We do not consider the special case in which some \(d_{ik} = 0\); thus \(d_{ik} \ne 0\) for all \(1 \le i \le c\). Let \(d_{\max }^{(k)} = \mathop {\max }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\} \ne 0\); then

$$ \mu_{ik} = \frac{{\left( {\frac{1}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{1}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \frac{{\left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} $$

\(\because d_{\max }^{(k)} = \mathop {\max }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\},1 \le \frac{{d^{(k)}_{\max } }}{{d_{ik} }} < + \infty\),

$$ \therefore \mathop {\lim }\limits_{m \to + \infty } \left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} = \left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{ + 0} = 1 $$
$$ \therefore \mathop {\lim }\limits_{m \to + \infty } \sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } = \sum\limits_{p = 1}^{c} {\mathop {\lim }\limits_{m \to + \infty } \left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } = \sum\limits_{p = 1}^{c} 1 = c $$

So, we can get

$$ \begin{gathered} \mathop {\lim }\limits_{m \to + \infty } \mu_{ik} = \mathop {\lim }\limits_{m \to + \infty } \frac{{\left( {\frac{1}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{1}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \mathop {\lim }\limits_{m \to + \infty } \frac{{\left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} \hfill \\ \, = \frac{{\mathop {\lim }\limits_{m \to + \infty } \left( {\frac{{d^{(k)}_{\max } }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\mathop {\lim }\limits_{m \to + \infty } \sum\limits_{p = 1}^{c} {\left( {\frac{{d^{(k)}_{\max } }}{{d_{pk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \frac{1}{c};\quad 1 \le i \le c,\;1 \le k \le n \hfill \\ \end{gathered} $$

Hence, when \(m \to + \infty\), every membership value in FCM tends to \(\frac{1}{c}\): each data instance belongs to every cluster with equal degree, and the algorithm no longer produces a useful partition.
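The numerical check mentioned before the proof is sketched below (ours, not from the paper): it evaluates the FCM membership formula on arbitrary squared distances for several values of \(m\) and shows the two limiting behaviours.

```python
import numpy as np

def fcm_memberships(d, m):
    """FCM memberships u_{ik} from squared distances d (shape: clusters x samples)."""
    inv = (1.0 / d) ** (1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

# Squared distances of 4 samples to 3 centroids (arbitrary illustrative values).
d = np.array([[0.2, 1.5, 3.0, 0.9],
              [1.1, 0.3, 2.5, 0.8],
              [2.4, 2.0, 0.4, 0.7]])

for m in (1.01, 2.0, 100.0):
    print(f"m = {m}:")
    print(np.round(fcm_memberships(d, m), 3))
# m close to 1  -> memberships approach 0/1 (hard, k-means-like assignment)
# m very large  -> every membership approaches 1/c = 1/3
```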


About this article


Cite this article

Li, K., Ni, T., Xue, J. et al. Deep soft clustering: simultaneous deep embedding and soft-partition clustering. J Ambient Intell Human Comput 14, 5581–5593 (2023). https://doi.org/10.1007/s12652-021-02997-1

