Abstract:
Clustering is one of the most fundamental unsupervised tasks in machine learning and is essential for exploring high-volume data. Recent works propose using deep neural networks for clustering, owing to their ability to learn powerful representations of the data. In this work, we present a novel clustering approach using deep neural networks that simultaneously learns feature representations and embeddings suitable for clustering by encouraging separation of natural clusters in the embedding space. More specifically, an autoencoder is employed to learn representations of the data. A mapping from the autoencoder's representation space to an embedding space is then learned using a deep neural network that we call the Representation Network. This network promotes separation between natural clusters by minimizing the cross-entropy between two probability distributions that encode pairwise similarities, one defined in the autoencoder's latent space and the other in the Representation Network's embedding space. The resulting optimization problem can be solved effectively by jointly training the autoencoder and the Representation Network with minibatch stochastic gradient descent and backpropagation. Ultimately, we obtain a K-Means-friendly embedding space. Experimental results show that, despite being a simple model, the proposed approach outperforms a broad range of recent approaches on the Reuters dataset, outperforms other autoencoder-based models on the MNIST dataset, and produces consistently good results that are competitive with more complex and hybrid models.
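To make the training objective concrete, below is a minimal PyTorch sketch of the kind of joint loss the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the network sizes, the MSE reconstruction term, the choice of a softmax over negative squared pairwise distances as the similarity distribution, the detaching of the latent-space distribution as the target, and the toy data_loader are all assumptions introduced here for clarity.

# Hypothetical sketch of the joint objective described in the abstract.
# Similarity kernel, network sizes, and loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 500), nn.ReLU(),
                                 nn.Linear(500, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 500), nn.ReLU(),
                                 nn.Linear(500, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

class RepresentationNetwork(nn.Module):
    """Maps autoencoder latent codes to the clustering-friendly embedding space."""
    def __init__(self, latent_dim=10, embed_dim=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 500), nn.ReLU(),
                                 nn.Linear(500, embed_dim))

    def forward(self, z):
        return self.net(z)

def pairwise_softmax(h):
    """Row-wise pairwise-similarity distribution within a minibatch,
    built here (an assumption) as a softmax over negative squared distances."""
    d2 = (h.unsqueeze(1) - h.unsqueeze(0)).pow(2).sum(dim=-1)
    mask = torch.eye(h.size(0), dtype=torch.bool, device=h.device)
    return F.softmax((-d2).masked_fill(mask, float('-inf')), dim=1)

# Toy data in place of a real dataset (e.g. flattened 784-d MNIST images).
data_loader = torch.utils.data.DataLoader(torch.rand(256, 784), batch_size=64)

ae, repnet = AutoEncoder(), RepresentationNetwork()
opt = torch.optim.Adam(list(ae.parameters()) + list(repnet.parameters()), lr=1e-3)

for x in data_loader:                   # minibatch SGD, as in the abstract
    z, x_hat = ae(x)                    # autoencoder representation
    e = repnet(z)                       # embedding for clustering
    P = pairwise_softmax(z).detach()    # similarities in AE latent space (treated as target)
    Q = pairwise_softmax(e)             # similarities in embedding space
    cross_entropy = -(P * torch.log(Q + 1e-12)).sum(dim=1).mean()
    loss = F.mse_loss(x_hat, x) + cross_entropy
    opt.zero_grad(); loss.backward(); opt.step()

# After training, K-Means is run on the learned embeddings to obtain cluster assignments.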
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019