
Self-supervised autoencoders for clustering and classification

  • Original Paper
  • Published:
Evolving Systems

Abstract

Clustering techniques aim to find meaningful groups of data samples which exhibit similarity with regard to a set of characteristics, typically measured in terms of pairwise distances. Due to the so-called curse of dimensionality, i.e., the observation that high-dimensional spaces are ill-suited for measuring distances, distance-based clustering techniques such as the classic k-means algorithm fail to uncover meaningful clusters in high-dimensional spaces. Thus, dimensionality reduction techniques can greatly improve the performance of such clustering methods. In this work, we study Autoencoders as Deep Learning tools for dimensionality reduction and combine them with k-means clustering to learn, in a self-supervised manner, low-dimensional representations that improve clustering performance by enhancing intra-cluster relationships and suppressing inter-cluster ones. In the supervised paradigm, distance-based classifiers may also greatly benefit from robust dimensionality reduction techniques. The proposed method is evaluated via multiple experiments on datasets of handwritten digits, various objects and faces, and is shown to improve external cluster quality criteria. A fully supervised counterpart is also evaluated on two face recognition datasets and is shown to improve the performance of various lightweight classifiers, allowing their use in real-time applications on devices with limited computational resources, such as Unmanned Aerial Vehicles (UAVs).
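As a concrete illustration of the recipe the abstract outlines (learn a low-dimensional autoencoder embedding, cluster it with k-means, and feed the resulting assignments back as a self-supervisory signal), the following sketch shows one plausible implementation in PyTorch with scikit-learn. The layer sizes, the weighting factor lambda_c, and the centroid-attraction loss are illustrative assumptions rather than the authors' exact formulation.

```python
# Minimal sketch: autoencoder + k-means with self-supervised refinement.
# Assumptions: PyTorch and scikit-learn; network sizes, lambda_c and the
# centroid-attraction term are illustrative, not the paper's exact method.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def train_self_supervised(X, n_clusters=10, epochs=50, lambda_c=0.1):
    """Alternate between k-means on the latent codes and autoencoder updates
    that pull each code toward its assigned centroid (self-supervision)."""
    model = Autoencoder(in_dim=X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    X_t = torch.as_tensor(X, dtype=torch.float32)
    for epoch in range(epochs):
        # "E-step": cluster the current latent representations.
        with torch.no_grad():
            Z = model.encoder(X_t).numpy()
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(Z)
        centroids = torch.as_tensor(km.cluster_centers_, dtype=torch.float32)
        targets = centroids[torch.as_tensor(km.labels_, dtype=torch.long)]
        # "M-step": reconstruction loss plus centroid-attraction loss.
        z, x_rec = model(X_t)
        loss = nn.functional.mse_loss(x_rec, X_t) \
             + lambda_c * nn.functional.mse_loss(z, targets)
        opt.zero_grad(); loss.backward(); opt.step()
    return model, km.labels_
```

In practice the clustering step would typically be re-run only every few epochs and the autoencoder trained on mini-batches; those details are omitted here for brevity.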



Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 731667 (MULTIDRONE). This publication reflects the authors' views only. The European Commission is not responsible for any use that may be made of the information it contains.

Author information


Corresponding author

Correspondence to Paraskevi Nousi.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Nousi, P., Tefas, A. Self-supervised autoencoders for clustering and classification. Evolving Systems 11, 453–466 (2020). https://doi.org/10.1007/s12530-018-9235-y

