Abstract
Traditional clustering methods are not very effective on high-dimensional, large-scale datasets. Although classical dimensionality reduction methods exist, such as principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE), they still cannot significantly improve clustering performance in this setting. Recent studies have combined the non-linear dimensionality reduction achieved by deep neural networks with hard-partition clustering and obtained reliable results, but these methods cannot update the parameters of the dimensionality reduction and of the clustering at the same time. We found that soft-partition clustering combines well with deep embedding: the membership of fuzzy c-means (FCM) solves the problem that gradient descent cannot be applied because the assignment step of hard-partition clustering is discrete, so that the algorithm can update the parameters of the deep neural network (DNN) and the cluster centroids simultaneously. We build a continuous objective function that combines soft-partition clustering with deep embedding, so that the learned representations are cluster-friendly. The experimental results show that our proposed method of simultaneously optimizing the parameters of deep dimensionality reduction and clustering outperforms the method with separate optimization.
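The joint objective described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' architecture or loss weighting: the toy dimensions, the linear encoder/decoder standing in for the DNN, and the learning rate are all hypothetical. The key point it demonstrates is that the FCM memberships have a closed form in terms of the embedded distances, so one gradient step can update the encoder, decoder, and centroids together.

```python
import torch

torch.manual_seed(0)
X = torch.randn(64, 10)                      # toy data: 64 points, 10 features
enc = torch.nn.Linear(10, 2)                 # stand-in encoder (a real DNN in the paper)
dec = torch.nn.Linear(2, 10)                 # stand-in decoder
V = torch.nn.Parameter(torch.randn(3, 2))    # c = 3 cluster centroids in the embedding
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()) + [V], lr=1e-2)
m = 2.0                                      # FCM weighting exponent

for _ in range(50):
    Z = enc(X)                                # embeddings
    D = torch.cdist(Z, V).pow(2) + 1e-9       # squared distances to centroids, (64, 3)
    # Closed-form FCM memberships: u_ki = 1 / sum_j (d_ki / d_kj)^(1/(m-1))
    U = 1.0 / (D.unsqueeze(2) / D.unsqueeze(1)).pow(1.0 / (m - 1)).sum(dim=2)
    # Reconstruction term keeps the embedding informative; clustering term pulls
    # embeddings toward centroids. U is detached here, FCM-style alternation.
    loss = ((X - dec(Z)) ** 2).mean() + (U.detach() ** m * D).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the memberships are continuous in both the embeddings and the centroids, the whole objective is differentiable, unlike a hard argmin assignment.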
References
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp 1027
Baldi P, Hornik K (1989) Neural networks and principal component analysis: learning from examples without local minima. Neural Networks 2(1):53–58
Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203
Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–333
Chen X, Zhou Q, Lan R, et al (2020) Sensorineural hearing loss classification via deep-HLNet and few-shot learning. Multim Tools Appl: 1–14
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res: 1–30
Dubois D, Prade H (1988) Fuzzy sets and systems. Academic Press, New York
Fard MM, Thonet T, Gaussier E (2020) Deep k-means: jointly clustering with k-means and learning representations. Pattern Recogn Lett 138:185–192
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res: 2677–2694.
Guo X, Gao L, Liu X, Yin J (2017) Improved deep embedded clustering with local structure preservation. In: Proceedings of IJCAI, IJCAI ’17, pp 1753–1759
Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educat Psychol 24:417–441
Ji P, Zhang T, Li H, Salzmann M, Reid I (2017) Deep subspace clustering networks. In: Proceedings of NIPS, NIPS '17, pp 23–32
Jiang YZ, Chung FL, Wang ST, Deng ZH, Wang J, Qian PJ (2015) Collaborative fuzzy clustering from multiple weighted views. IEEE Trans Cybern 45(4):688–701
Jiang YZ, Deng ZH, Chung FL, Wang GJ, Qian PJ, Choi KS, Wang ST (2017a) Recognition of epileptic EEG signals using a novel multi-view TSK fuzzy system. IEEE Trans Fuzzy Syst 25(1):3–20
Jiang YZ, Wu DR, Deng ZH, Qian PJ, Wang J, Wang GJ, Chung FL, Choi KS, Wang ST (2017b) Seizure classification from EEG signals using transfer learning, semi-supervised learning and TSK fuzzy system. IEEE Trans Neural Syst Rehabil Eng 25(12):2270–2284
Jiang YZ, Gu XQ, Wu DR, Hang WL, Xue J, Qiu S, Lin CT (2019a) A novel negative-transfer-resistant fuzzy clustering model with a shared cross-domain transfer latent space and its application to brain CT image segmentation. IEEE/ACM Trans Comput Biol Bioinform. https://doi.org/10.1109/TCBB.2019.2963873
Jiang YZ, Zhao KF, Xia KJ, Xue J, Zhou LY, Ding Y, Qian PJ (2019b) A novel distributed multitask fuzzy clustering algorithm for automatic MR brain image segmentation. J Med Syst 43(5):118:1–118:9
Jiang YZ, Zhang YP, Lin C, Wu DR, Lin CT (2020) EEG-based driver drowsiness estimation using an online multi-view and transfer TSK fuzzy system. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2020.2973673
Jolliffe IT (2002) Principal Component Analysis, 2nd edn. Springer Verlag
Karayiannis NB (1994) MECA: maximum entropy clustering algorithm. In: Proceedings of the IEEE International Conference on Fuzzy System, Orlando, pp. 630–635
Kavukcuoglu K, Fergus R, LeCun Y, et al (2009) Learning invariant features through topographic filter maps. In: Computer Vision and Pattern Recognition (CVPR 2009), IEEE Conference on, pp 1605–1612
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Log Quart 2(1–2):83–97
Lee H, Pham P, Largman Y, Ng AY (2009) Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Advances in neural information processing systems, pp 1096–1104
Li R, Mukaidono M (1995) A maximum-entropy approach to fuzzy clustering. In: Proceedings on IEEE International Conference on Fuzzy System, pp 2227–2232
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inform Theor 28(2):129–137
Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp 281–297
Miyamoto S, Umayahara K (1998) Fuzzy clustering by quadratic regularization. In: Proceedings of the 1998 IEEE International Conference on Fuzzy Systems and IEEE World Congress on Computational Intelligence, pp. 1394–1399
Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for Fuzzy Clustering. Springer, Berlin
Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst 13(4):517–530
Peng X, Xiao S, Feng J, Yau WY, Yi Z (2016) Deep subspace clustering with sparsity prior. In: Proceedings of IJCAI, IJCAI ’16, pp 1925–1931
Peng X, Feng J, Lu J, Yau Wy, Yi Z (2017) Cascade subspace clustering. In: Proceedings of AAAI,AAAI ’17, pp 2478–2484
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 833–840
Rumelhart DE, Hinton GE, Williams RJ (2012) Learning representations by back-propagating errors. Cogn Model 5(3):1
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint 1409.1556
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–9
Valdivia A, Martínez-Cámara E, Chaturvedi I, Luzón MV, Cambria E, Ong YS, Herrera F (2020) What do people think about this monument? Understanding negative reviews via deep learning, clustering and descriptive rules. J Ambient Intell Hum Comput 11(1):39–52
Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp 1096–1103. ACM
Wang SH, Govindaraj VV, Górriz JM, Zhang X, Zhang YD (2020) Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Inform Fus 67:208–229
Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: Proceedings of ICML, ICML ’16, pp 478–487
Yang B, Fu X, Sidiropoulos ND, Hong M (2017) Towards k-means friendly spaces: simultaneous deep learning and clustering. In: Proceedings of ICML, ICML '17, pp 3861–3870
Yang JC, Shi R, Ni BB (2020) MedMNIST classification decathlon: a lightweight AutoML benchmark for medical image analysis. arXiv preprint 2010.14925
Yao X, Wang X, Wang S et al (2020) A comprehensive survey on convolutional neural network in medical image analysis. Multim Tools Appl. https://doi.org/10.1007/s11042-020-09634-7
Yolcu G, Oztel I, Kazan S, Oz C, Bunyak F (2020) Deep learning-based face analysis system for monitoring customer interest. J Ambient Intell Hum Comp 11(1):237–248
Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–353
Zhang YD, Dong Z, Wang SH, Yu X, Yao X, Zhou Q, Martinez FJ (2020) Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inform Fus 64:149–187
Zhang Y, Guttery DS, Wang SH (2020) Abnormal breast detection by an improved AlexNet model. Ann Oncol 31:S277
Zhang YD, Satapathy SC, Zhu LY, et al (2020) A seven-layer convolutional neural network for chest CT based COVID-19 diagnosis using stochastic pooling. IEEE Sens J
Zhang YP, Wang SH, Xia KJ, Jiang YZ, Qian PJ (2021) Alzheimer's disease multiclass diagnosis via multimodal neuroimaging embedding feature selection and fusion. Inform Fus 66:170–183
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grants 61702225 and 61806026, by the Natural Science Foundation of Jiangsu Province under Grant BK 20180956, and in part by Six Talent Peaks Project of Jiangsu Province under Grant XYDXX-127, and by the Jiangsu Committee of Health under Grant H2018071.
Appendices
Appendix A: Evaluation measures
We used standard unsupervised measures to evaluate the performance of the clustering algorithms. The first is clustering accuracy (ACC). For all algorithms we set the number of clusters to the number of ground-truth categories and then evaluate performance with

\(ACC = \mathop {\max }\limits_{\psi } \frac{{\sum\nolimits_{i = 1}^{n} {\chi \{ s_{i} = \psi (c_{i} )\} } }}{n}\)

where \(s_{i}\) is the ground-truth class, \(c_{i}\) is the obtained cluster, \(\chi\) is the indicator function (\(\chi \{ true\} = 1\) and \(\chi \{ false\} = 0\)), and \(\psi\) ranges over all possible one-to-one mappings between the obtained clusters and the ground-truth classes; the best match is found with the Hungarian algorithm (Kuhn 1955).
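Finding the optimal mapping \(\psi\) is an assignment problem. A minimal sketch of ACC, assuming integer labels indexed from 0 and using SciPy's `linear_sum_assignment` as the Hungarian solver:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between clusters and classes (Kuhn 1955)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n_labels = max(y_true.max(), y_pred.max()) + 1
    # Contingency table: w[i, j] = number of points in cluster i with true class j.
    w = np.zeros((n_labels, n_labels), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    # Hungarian algorithm minimizes cost, so negate to maximize matched counts.
    row, col = linear_sum_assignment(-w)
    return w[row, col].sum() / y_true.size
```

For example, `clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])` gives 1.0, since relabeling the clusters matches the classes perfectly.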
In addition, we used Normalized Mutual Information (NMI), an information-theoretic measure based on mutual information:

\(NMI(\Omega ,C) = \frac{{I(\Omega ;C)}}{{[H(\Omega ) + H(C)]/2}}\)

where \(I\) denotes mutual information and \(H\) entropy:

\(I(\Omega ;C) = \sum\limits_{k} {\sum\limits_{j} {P(w_{k} \cap c_{j} )\log \frac{{P(w_{k} \cap c_{j} )}}{{P(w_{k} )P(c_{j} )}}} } = \sum\limits_{k} {\sum\limits_{j} {\frac{{|w_{k} \cap c_{j} |}}{N}\log \frac{{N|w_{k} \cap c_{j} |}}{{|w_{k} ||c_{j} |}}} }\)

where \(P(w_{k} )\), \(P(c_{j} )\), and \(P(w_{k} \cap c_{j} )\) are the probabilities that a data instance belongs to cluster \(w_{k}\), to ground-truth class \(c_{j}\), and to both at the same time. The second equivalent formula is derived from the maximum-likelihood estimates of these probabilities.
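A sketch of NMI computed from those maximum-likelihood probability estimates (this assumes the arithmetic-mean normalization of the two entropies; other normalizations, e.g. by the maximum entropy, also appear in the literature):

```python
import numpy as np

def entropy(labels):
    """H from maximum-likelihood probability estimates."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmi(y_true, y_pred):
    """NMI = I(Omega; C) / ((H(Omega) + H(C)) / 2)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = y_true.size
    mi = 0.0
    for w in np.unique(y_pred):          # clusters w_k
        for c in np.unique(y_true):      # classes c_j
            p_wc = np.sum((y_pred == w) & (y_true == c)) / n
            if p_wc > 0:                 # empty intersections contribute 0
                mi += p_wc * np.log(p_wc / (np.mean(y_pred == w) * np.mean(y_true == c)))
    return mi / ((entropy(y_true) + entropy(y_pred)) / 2)
```

A perfect clustering (up to relabeling) yields NMI = 1, while a clustering independent of the classes yields NMI = 0.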
Appendix B: The boundary significance of the weighted index \(m\)
For the FCM algorithm, where \(m\) belongs to the interval \([1,\infty )\), the following situations arise:
When \(m = 1\), the FCM algorithm degenerates into the KM (k-means) algorithm.
When \(m \to 1^{ + }\), FCM degenerates to the KM algorithm with probability 1.
When \(m \to + \infty\), FCM loses its partitioning ability, and \(U = [\mu_{ik} ] = [\frac{1}{c}]\).
Proof
1. \(m \to 1^{ + }\): Since \(d_{ik} = \left\| {x_{k} - v_{i} } \right\|_{A}^{2}\), we have \(d_{ik} \ge 0\).

When \(\exists d_{ik} = 0\), we can get \(\mu_{ik} = 1\) and \(\mu_{jk} = 0\) for all \(j \ne i\).

When \(\forall d_{ik} \ne 0;1 \le i \le c,\) let \(d_{\min }^{(k)} = \mathop {\min }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\} \ne 0,\) we can get

\(\mu_{ik} = \frac{1}{{\sum\nolimits_{j = 1}^{c} {\left( {\frac{{d_{ik} }}{{d_{jk} }}} \right)^{{\frac{1}{m - 1}}} } }} = \frac{{\left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\nolimits_{j = 1}^{c} {\left( {\frac{{d_{\min }^{(k)} }}{{d_{jk} }}} \right)^{{\frac{1}{m - 1}}} } }}\)

When \(d_{ik} = d_{\min }^{(k)}\), \(\frac{{d_{\min }^{(k)} }}{{d_{ik} }} = 1\), we can get \(\mathop {\lim }\limits_{{m \to 1^{ + } }} \mu_{ik} = 1\).

When \(d_{ik} \ne d_{\min }^{(k)}\), \(\frac{{d_{\min }^{(k)} }}{{d_{ik} }} < 1\), so \(\left( {\frac{{d_{\min }^{(k)} }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} \to 0\) and we can get \(\mathop {\lim }\limits_{{m \to 1^{ + } }} \mu_{ik} = 0\).

Obviously, when \(m \to 1^{ + }\), the FCM algorithm degenerates from a soft to a hard partition, which is equivalent to k-means.
2. \(m \to + \infty\): Again \(d_{ik} = \left\| {x_{k} - v_{i} } \right\|_{A}^{2} \ge 0\).

We do not consider the special case \(\exists d_{ik} = 0\), so \(\forall d_{ik} \ne 0;1 \le i \le c\). Let \(d_{\max }^{(k)} = \mathop {\max }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\} \ne 0\); we can get

\(\mu_{ik} = \frac{{\left( {\frac{{d_{\max }^{(k)} }}{{d_{ik} }}} \right)^{{\frac{1}{m - 1}}} }}{{\sum\nolimits_{j = 1}^{c} {\left( {\frac{{d_{\max }^{(k)} }}{{d_{jk} }}} \right)^{{\frac{1}{m - 1}}} } }}\)

\(\because d_{\max }^{(k)} = \mathop {\max }\limits_{1 \le j \le c} \left\{ {d_{jk} } \right\},\;1 \le \frac{{d_{\max }^{(k)} }}{{d_{ik} }} < + \infty\), and \(\frac{1}{m - 1} \to 0\) as \(m \to + \infty\), so every ratio raised to this power tends to 1 and we can get

\(\mathop {\lim }\limits_{{m \to + \infty }} \mu_{ik} = \frac{1}{c}\)

Obviously, when \(m \to + \infty\), every membership value in the FCM algorithm equals \(\frac{1}{c}\): the probability that a data instance belongs to each cluster is equal, and the algorithm no longer partitions the data.
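Both limits can be checked numerically from the closed-form membership formula. In this sketch the distances in `d` are hypothetical: they stand for the squared distances of a single point to \(c = 3\) centroids.

```python
import numpy as np

def fcm_memberships(d, m):
    """u_i = 1 / sum_j (d_i/d_j)^(1/(m-1)) for squared distances d of one point."""
    d = np.asarray(d, dtype=float)
    ratios = d[:, None] / d[None, :]              # ratios[i, j] = d_i / d_j
    return 1.0 / np.power(ratios, 1.0 / (m - 1)).sum(axis=1)

d = np.array([0.5, 2.0, 4.0])            # hypothetical squared distances, c = 3
near_hard = fcm_memberships(d, 1.001)    # m -> 1+: nearly all mass on nearest centroid
uniform   = fcm_memberships(d, 1000.0)   # m -> inf: all memberships approach 1/c
```

With `m = 1.001` the membership of the nearest centroid exceeds 0.99 (a hard assignment in the limit), while with `m = 1000` every membership is within 0.01 of 1/3, matching the two boundary cases above.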
Cite this article
Li, K., Ni, T., Xue, J. et al. Deep soft clustering: simultaneous deep embedding and soft-partition clustering. J Ambient Intell Human Comput 14, 5581–5593 (2023). https://doi.org/10.1007/s12652-021-02997-1