Elsevier

Neurocomputing

Volume 266, 29 November 2017, Pages 325-335

Deep learning algorithms for discriminant autoencoding

https://doi.org/10.1016/j.neucom.2017.05.042

Abstract

In this paper, a new family of Autoencoders (AE) is proposed for dimensionality reduction as well as class discrimination. Various class-separating methods translate the reconstructed data so that the classes become better separated. The result of this combination is a new type of Discriminant Autoencoder, in which the reconstruction targets are shifted in space in a discriminative fashion. The proposed Discriminant AE is experimentally compared to the standard Denoising AE on the challenging classification tasks of handwritten digit recognition and facial expression recognition, as well as on the CIFAR10 dataset.

Introduction

Classification is one of the most important problems in the field of Machine Learning, and many kinds of algorithms have been developed for this purpose, from Linear Classifiers [1] and shallow or deep Neural Networks [2], [3] to Support Vector Machines [4], [5], among others.

Many data preprocessing techniques have been proposed to improve the classification process. A common such technique is Dimensionality Reduction, i.e., the process of reducing the number of random variables, or features, used to represent the data. Perhaps the best-known unsupervised method used for this purpose is Principal Component Analysis (PCA), a linear dimensionality reduction method. There are, however, non-linear methods used for the same purpose, such as kernel-based versions of the well-known linear dimensionality reduction methods (e.g., kPCA [6]), or neural network based methods such as Autoencoders [7], [8]. In addition to the reduction in dimension, Autoencoders, as Deep Learning algorithms, attempt to capture and unveil their input’s most robust features.
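To make the PCA baseline mentioned above concrete, the following is a minimal NumPy sketch of linear dimensionality reduction via the SVD of the centered data matrix (a textbook illustration, not code from the paper; the toy data and dimensions are our own):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project data onto its top principal components."""
    Xc = X - X.mean(axis=0)                      # PCA assumes centered data
    # Rows of Vt are the principal axes, ordered by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # low-dimensional representation

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # toy data: 100 samples, 5 features
Z = pca_reduce(X, 2)                             # reduced to 2 dimensions
print(Z.shape)                                   # (100, 2)
```

Because the singular values are returned in descending order, the first retained component carries at least as much variance as the second.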

Another class of dimensionality reduction methods proposed to improve classification is the supervised one, e.g., Linear Discriminant Analysis (LDA) [9], which is related to data separation. Such methods attempt to move samples of the same class closer to each other while simultaneously moving them away from samples of rival classes. Usually, LDA is performed along with PCA, to achieve lower dimensions and better-separated data at the same time. Intuitively, if we manage to move samples away from rival neighboring samples and towards samples of the same class, then we have achieved data separation.

Many other techniques have been proposed for the same purpose [10]. In this paper, we propose a new family of Autoencoders as Deep Learning Algorithms and as tools for dimensionality reduction, combined with various methods which achieve data separation. Ultimately, we attempt to combine these techniques under a common framework, in order to achieve dimensionality reduction and data separation using Neural Networks. The proposed learning machines are called Discriminant Autoencoders.

The manuscript is organized as follows. The deep autoencoders and the proposed family of Discriminant Autoencoders are described in Sections 2 and 3, respectively. In Section 4 we describe the experiments conducted to measure the effectiveness of the Discriminant Autoencoder and present the obtained results. Finally, our conclusions are summarized in Section 5.

Section snippets

Autoencoders

Autoencoders are unsupervised Artificial Neural Networks trained to reconstruct their input through an intermediate representation, typically of lower dimension. A very simple Autoencoder has a structure similar to a Multilayer Perceptron (MLP) [11], [12], where the output targets are the input itself. The network can then be trained effectively using a variation of the Backpropagation algorithm [12], so that it reconstructs its input with great precision, or in other words, its…
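The training loop described above can be sketched in a few lines of NumPy: a single bottleneck layer trained by gradient descent on the mean squared reconstruction error. This is an illustrative toy (our own data, sizes, and learning rate; the paper's architectures and hyperparameters differ):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                  # toy input data
W1 = rng.normal(scale=0.1, size=(8, 3))        # encoder: 8 -> 3 (bottleneck)
W2 = rng.normal(scale=0.1, size=(3, 8))        # decoder: 3 -> 8

def recon_loss(W1, W2):
    H = np.tanh(X @ W1)                        # hidden representation
    return ((H @ W2 - X) ** 2).mean()          # mean squared reconstruction error

init_loss = recon_loss(W1, W2)
lr = 0.05
for _ in range(500):
    H = np.tanh(X @ W1)
    E = H @ W2 - X                             # reconstruction error
    # Backpropagation: error through the decoder, then the tanh encoder
    gW2 = H.T @ E / len(X)
    gW1 = X.T @ ((E @ W2.T) * (1 - H ** 2)) / len(X)
    W2 -= lr * gW2
    W1 -= lr * gW1

final_loss = recon_loss(W1, W2)
print(final_loss < init_loss)                  # training reduces the error
```

Since the targets are the inputs themselves, no labels are needed, which is what makes the basic Autoencoder unsupervised.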

Discriminant autoencoders

The proposed Discriminant Autoencoders try to increase the intra-class compactness as well as the inter-class separability. To this end, we set new targets for the reconstructed samples that improve specific compactness and separability criteria usually used in supervised dimensionality reduction. That is, we employ discriminant analysis criteria, described in Section 3.2, under the Graph Embedding framework [14], in the iterative optimization of the designed autoencoders.
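The paper derives its reconstruction targets from graph-embedding criteria (Section 3.2). As a purely illustrative simplification (our own stand-in, not the authors' formula), target shifting can be sketched as pulling each sample toward its own class mean and pushing it away from the nearest rival class mean; the `alpha` and `beta` weights below are hypothetical:

```python
import numpy as np

def shifted_targets(X, y, alpha=0.3, beta=0.3):
    """Illustrative discriminant targets: pull toward the own-class mean,
    push away from the nearest rival-class mean."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    T = np.empty_like(X)
    for i, (x, c) in enumerate(zip(X, y)):
        # Nearest rival class mean for this sample
        rival = min((r for r in classes if r != c),
                    key=lambda r: np.linalg.norm(x - means[r]))
        T[i] = x + alpha * (means[c] - x) - beta * (means[rival] - x)
    return T

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
T = shifted_targets(X, y)
# The shifted targets are better separated than the original samples
d_before = np.linalg.norm(X[y == 0].mean(0) - X[y == 1].mean(0))
d_after = np.linalg.norm(T[y == 0].mean(0) - T[y == 1].mean(0))
print(d_after > d_before)                      # True
```

Training an autoencoder to reconstruct such shifted targets, rather than the inputs themselves, is what turns the unsupervised AE into a supervised, discriminant one.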

Experimental results

In this section, we evaluate the proposed discriminant autoencoding framework on various datasets. The first set of experiments uses the well-known MNIST dataset [34]. The second set of experiments is applied to three different facial expression datasets: BU [35], JAFFE [36] and KANADE [37], [38], while the last set is performed on the CIFAR10 dataset [39]. Some samples from the MNIST dataset are shown in Fig. 11, from CIFAR10 in Fig. 12, from the JAFFE dataset in Fig. 13 and from the…

Conclusion

In this paper, a novel family of Autoencoders that take into account class information as well as the local geometry of the data has been proposed. The class information is used to define new discriminant reconstruction targets, and the resulting learning machines are called Discriminant Autoencoders. The proposed dAEs are able to encode information for reconstruction as well as for class separability, leading to improved representations in the hidden layer. By combining the…

Paraskevi Nousi obtained her B.Sc. in Informatics in 2014 from Aristotle University of Thessaloniki, Greece. She is currently pursuing her Ph.D. studies in the Artificial Intelligence and Information Analysis Laboratory in the Department of Informatics at the University of Thessaloniki. Her research interests include Machine Learning and Computational Intelligence.

References (40)

  • M.-F.F. Balcan et al.

    Distributed k-means and k-median clustering on general topologies

    Proceedings of the Advances in Neural Information Processing Systems

    (2013)
  • G.-X. Yuan et al.

    Recent advances of large-scale linear classification

    Proc. IEEE

    (2012)
  • F. Rosenblatt

    Principles of Neurodynamics

    (1962)
  • M.L. Minsky et al.

    Perceptrons, Expanded Edition: An Introduction to Computational Geometry

    (1987)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • B. Schölkopf et al.

    Advances in Kernel Methods: Support Vector Learning

    (1999)
  • B. Schölkopf et al.

    Nonlinear component analysis as a kernel eigenvalue problem

    Neural Comput.

    (1998)
  • Y. Bengio

    Learning deep architectures for AI

    Found. Trends Mach. Learn.

    (2009)
  • G.E. Hinton et al.

    Reducing the dimensionality of data with neural networks

    Science

    (2006)
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Ann. Eugen.

    (1936)
  • Y. Bengio et al.

    Representation learning: a review and new perspectives

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • R.P. Lippmann

    An introduction to computing with neural nets

    ASSP Mag. IEEE

    (1987)
  • D.E. Rumelhart et al.

    Learning internal representations by error propagation

    Technical Report

    (1985)
  • P. Vincent et al.

    Extracting and composing robust features with denoising autoencoders

    Proceedings of the 25th International Conference on Machine Learning

    (2008)
  • S. Yan et al.

    Graph embedding and extensions: a general framework for dimensionality reduction

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • F. Schroff et al.

    Facenet: a unified embedding for face recognition and clustering

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2015)
  • M. Kan et al.

    Stacked progressive auto-encoders (SPAE) for face recognition across poses

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2014)
  • J. Snoek et al.

    On nonparametric guidance for learning autoencoder representations

    Proceedings of the International Conference on Artificial Intelligence and Statistics AISTATS

    (2012)
  • S. Rifai et al.

    Disentangling factors of variation for facial expression recognition

    Computer Vision–ECCV 2012

    (2012)
  • J.T. Rolfe et al.

    Discriminative recurrent sparse auto-encoders

    International Conference on Learning Representations (ICLR)

    (2013)
Cited by (24)

    • Autoencoder-driven spiral representation learning for gravitational wave surrogate modelling

      2022, Neurocomputing
      Citation Excerpt:

      And secondly, the addition of the learnable spiral module to the neural networks, which, as demonstrated in Section 4, leads to surrogate waveforms with better mismatches to the ground truth waveforms, while being faster than equivalent architectures, which do not use the module. Due to their ability to extract semantically meaningful representations without the use of labels, AEs have been widely studied for a variety of tasks, including clustering [51,44], classification [42,52], and image retrieval [53,54]. Recently, artificial neural networks have been gaining momentum in the field of gravitational wave astronomy, and specifically in surrogate modelling of fiducial waveform models.

    • Representation learning and retrieval

      2022, Deep Learning for Robot Perception and Cognition
    • Deep autoencoders for attribute preserving face de-identification

      2020, Signal Processing: Image Communication
      Citation Excerpt:

      In contrast to the above methods, in this work Autoencoders [22] are used, and it is shown experimentally that they are tools capable of achieving good de-identification performance, while producing high quality reconstructions. Deep autoencoders have recently been proposed to solve tasks such as one-shot face recognition [23], unconstrained face recognition [24,25], face alignment [26] and face hallucination [27] among others. The features extracted by deep AEs have been shown time and time again to be significantly useful to such tasks.

    • Discriminative clustering using regularized subspace learning

      2019, Pattern Recognition
      Citation Excerpt:

      The ability of the proposed method to provide useful representations for other unsupervised tasks, e.g., information retrieval [4,72], or data visualization [55], can be examined. Also, the optimization objective used in this paper can be also combined with other deep clustering architectures, e.g., [73,74], to allow for learning deep regularized models for clustering tasks. Finally, the regularization parameters αintra and αinter can be adaptively chosen based on the confidence/probability for each sample to belong to each cluster, possibly further improving the clustering performance.

    • Manifold regularized stacked denoising autoencoders with feature selection

      2019, Neurocomputing
      Citation Excerpt:

      Feng et al. [36] proposed a new method for graph and autoencoder-based feature selection (GAFS). Nousi and Tefas [37] developed a new type of discriminant autoencoder, in which the targets are shifted in space in a discriminative fashion. Zhang et al. [38] proposed a new feature learning method called local deep-feature alignment (LDFA) for dimension reduction and feature learning.

    • Fast Deep Convolutional Face Detection in the Wild Exploiting Hard Sample Mining

      2018, Big Data Research
      Citation Excerpt:

      The model consists of three convolutional layers each followed by a max pooling layer and the last pooling layer is followed by a fully connected layer, whose output comprises the input of the three aforementioned branches. Amongst other deep learning strategies, Denoising Autoencoders [49] are comprised of fully connected layers and they are typically used for unsupervised pretraining as well as dimensionality reduction, leading to more compact and robust representations of data. Face detection is a supervised visual learning task where convolutional deep approaches dominate in terms of performance and speed.


    Anastasios Tefas received the B.Sc. in informatics in 1997 and the Ph.D. degree in informatics in 2002, both from the Aristotle University of Thessaloniki, Greece. Since 2017 he has been an Associate Professor at the Department of Informatics, Aristotle University of Thessaloniki. From 2008 to 2017, he was a Lecturer, Assistant Professor at the same University. From 2006 to 2008, he was an Assistant Professor at the Department of Information Management, Technological Institute of Kavala. From 2003 to 2004, he was a temporary lecturer in the Department of Informatics, University of Thessaloniki. From 1997 to 2002, he was a researcher and teaching assistant in the Department of Informatics, University of Thessaloniki. Dr. Tefas participated in 15 research projects financed by national and European funds. He has co-authored 69 journal papers, 177 papers in international conferences and contributed 8 chapters to edited books in his area of expertise. Over 3730 citations have been recorded to his publications and his H-index is 32 according to Google scholar. His current research interests include computational intelligence, pattern recognition, statistical machine learning, digital signal and image analysis and retrieval and computer vision.
