Abstract
Clustering techniques aim to find meaningful groups of data samples which exhibit similarity with regard to a set of characteristics, typically measured in terms of pairwise distances. Due to the so-called curse of dimensionality, i.e., the observation that pairwise distances lose their discriminative power in high-dimensional spaces, distance-based clustering techniques such as the classic k-means algorithm fail to uncover meaningful clusters in high-dimensional data. Dimensionality reduction techniques can thus greatly improve the performance of such clustering methods. In this work, we study autoencoders as deep learning tools for dimensionality reduction and combine them with k-means clustering to learn, in a self-supervised manner, low-dimensional representations which improve clustering performance by enhancing intra-cluster relationships and suppressing inter-cluster ones. In the supervised paradigm, distance-based classifiers may also benefit greatly from robust dimensionality reduction techniques. The proposed method is evaluated in multiple experiments on datasets of handwritten digits, various objects and faces, and is shown to improve external cluster quality criteria. A fully supervised counterpart is also evaluated on two face recognition datasets and is shown to improve the performance of various lightweight classifiers, enabling their use in real-time applications on devices with limited computational resources, such as Unmanned Aerial Vehicles (UAVs).
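The baseline pipeline the abstract describes can be illustrated with a minimal sketch: train an autoencoder on a reconstruction objective, then run k-means on the learned latent codes instead of the raw high-dimensional inputs. The architecture sizes, optimizer settings, and full-batch training below are illustrative assumptions, not the paper's exact configuration, and the sketch omits the paper's self-supervised refinement, in which cluster assignments are fed back to reshape the latent space.

```python
# Minimal sketch: autoencoder for dimensionality reduction + k-means on
# the latent codes. Layer widths and hyperparameters are assumptions.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)          # low-dimensional representation
        return self.decoder(z), z    # reconstruction and latent code

def cluster_with_autoencoder(X, n_clusters=10, epochs=50):
    """X: (n_samples, in_dim) float tensor scaled to [0, 1]."""
    model = Autoencoder(in_dim=X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):          # reconstruction training (full batch)
        opt.zero_grad()
        recon, _ = model(X)
        loss_fn(recon, X).backward()
        opt.step()
    with torch.no_grad():            # k-means in the latent space
        _, z = model(X)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(z.numpy())
```

Clustering in the latent space rather than the input space is what sidesteps the curse of dimensionality: the pairwise distances that k-means relies on are computed in a far lower-dimensional space where they remain informative.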
Acknowledgements
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 731667 (MULTIDRONE). This publication reflects the authors' views only. The European Commission is not responsible for any use that may be made of the information it contains.
Cite this article
Nousi, P., Tefas, A. Self-supervised autoencoders for clustering and classification. Evolving Systems 11, 453–466 (2020). https://doi.org/10.1007/s12530-018-9235-y