Abstract
Manifold clustering, also known as submanifold learning, is the task of embedding patterns in submanifolds with different characteristics. This paper proposes a hybrid approach: clustering the data set, computing a global map of the cluster centers, embedding each cluster, and then merging the scaled submanifolds with the global map. We introduce various instantiations of clustering and embedding algorithms based on hybridizations of \(k\)-means, principal component analysis, isometric mapping, and locally linear embedding. A (1\(+\)1)-ES is employed to tune the submanifolds by rotation and scaling. The submanifold learning algorithms are compared with respect to nearest-neighbor classification performance on various experimental data sets.
Notes
1. The notation \(\mathcal {M}_j\) is used for cluster \(j\) in data space with center \(\mathbf {m}_j\), while \(\hat{\mathcal {M}}_j\) is the corresponding submanifold in latent space with center \(\hat{\mathbf {m}}_j\).
2. The runtime of ISOMAP is \(O(N^2 \log N)\).
References
Jolliffe, I.: Principal Component Analysis. Springer Series in Statistics. Springer, New York (1986)
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
Beyer, H.G., Schwefel, H.P.: Evolution strategies - A comprehensive introduction. Natural Computing 1, 3–52 (2002)
Vidal, R.: Subspace clustering. IEEE Signal Process Mag. 28, 52–68 (2011)
Costeira, J.P., Kanade, T.: A multibody factorization method for independently moving objects. International Journal of Computer Vision 29, 159–179 (1998)
Gear, C.W.: Multibody grouping from motion images. Int. J. Comput. Vis. 29, 133–150 (1998)
Vidal, R., Ma, Y., Sastry, S.: Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27, 1945–1959 (2005)
Kushnir, D., Galun, M., Brandt, A.: Fast multiscale clustering and manifold identification. Pattern Recognit. 39, 1876–1891 (2006)
Bradley, P.S., Mangasarian, O.L.: k-plane clustering. J. Global Optim. 16, 23–32 (2000)
Tseng, P.: Nearest \(q\)-flat to \(m\) points. J. Optim. Theory Appl. 105, 249–252 (2000)
Kramer, O.: Fast submanifold learning with unsupervised nearest neighbors. In: Tomassini, M., Antonioni, A., Daolio, F., Buesser, P. (eds.) ICANNGA 2013. LNCS, vol. 7824, pp. 317–325. Springer, Heidelberg (2013)
Kramer, O.: Dimensionality reduction by unsupervised nearest neighbor regression. In: International Conference on Machine Learning and Applications (ICMLA), pp. 275–278. IEEE (2011)
Tipping, M.E., Bishop, C.M.: Mixtures of probabilistic principal component analysers. Neural Computation 11, 443–482 (1999)
Nourashrafeddin, S., Arnold, D., Milios, E.E.: An evolutionary subspace clustering algorithm for high-dimensional data. In: Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO), pp. 1497–1498 (2012)
Vahdat, A., Heywood, M.I., Zincir-Heywood, A.N.: Bottom-up evolutionary subspace clustering. In: IEEE Congress on Evolutionary Computation, pp. 1–8 (2010)
von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17, 1–24 (2007)
Rechenberg, I.: Cybernetic Solution Path of an Experimental Problem. Ministry of Aviation, Royal Aircraft Establishment, Farnborough (1965)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Hull, J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16, 550–554 (1994)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst (2007)
Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991)
A Benchmark Problems
MakeClass is a classification data set generated with the scikit-learn [19] method make_classification with \(d\) dimensions and two centers.

The UCI Digits data set [20] comprises handwritten digits with \(d=64\). It is a frequent reference problem for the recognition of handwritten characters and digits.

The Faces data set, Labeled Faces in the Wild [21], has been introduced for studying the face recognition problem. It contains JPEG images of famous people collected from the internet and is available at http://vis-www.cs.umass.edu/lfw/.

The Gaussian blobs data set is generated with the scikit-learn [19] method make_blobs with the following settings: two centers, i.e., two classes, each with a standard deviation of \(\sigma = 10.0\) and variable \(d\).

Friedman 1 is a regression data set generated with the scikit-learn [19] method make_friedman1. The regression problem has been introduced in [22], where Friedman introduces multivariate adaptive regression splines. Friedman 2 is also a scikit-learn [19] regression data set and can be generated with make_friedman2.

The Wind data set is based on spatio-temporal time series data from the National Renewable Energy Laboratory (NREL) western wind data set. The whole data set comprises time series of 32,043 grid points, each aggregating ten \(3\) MW turbines, over a timespan of three years in a \(10\)-minute resolution. The dimensionality is \(d=22\).

Fitness is a data set based on an optimization run of a (15+100)-ES [4] on the Sphere function \(f(\mathbf {z}) = \mathbf {z}^T \mathbf {z}\) with \(d=20\) dimensions and 21,000 fitness function evaluations. The patterns are the objective variable values of the best candidate solution in each generation; the labels are the corresponding fitness function values.

The Photos data set contains thirty JPEG photos with resolution \(320 \times 214\) taken with a SONY DSLR-A300.

The Iris data set consists of 150 four-dimensional patterns of three different types of irises.
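The synthetic scikit-learn [19] data sets above can be generated as follows. The sample counts and the dimensionality \(d\) are illustrative assumptions; the paper varies these parameters across experiments.

```python
from sklearn.datasets import (make_blobs, make_classification,
                              make_friedman1, make_friedman2)

d = 10  # illustrative dimensionality; the experiments vary d
# MakeClass: two-class classification data in d dimensions.
X_mc, y_mc = make_classification(n_samples=300, n_features=d,
                                 n_classes=2, random_state=0)
# Gaussian blobs: two centers, each with standard deviation sigma = 10.0.
X_gb, y_gb = make_blobs(n_samples=300, n_features=d, centers=2,
                        cluster_std=10.0, random_state=0)
# Friedman 1: regression data (requires n_features >= 5).
X_f1, y_f1 = make_friedman1(n_samples=300, n_features=d, random_state=0)
# Friedman 2: regression data with a fixed four input variables.
X_f2, y_f2 = make_friedman2(n_samples=300, random_state=0)
```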
© 2015 Springer International Publishing Switzerland

Kramer, O. (2015). Hybrid Manifold Clustering with Evolutionary Tuning. In: Mora, A., Squillero, G. (eds.) Applications of Evolutionary Computation. EvoApplications 2015. Lecture Notes in Computer Science, vol. 9028. Springer, Cham. https://doi.org/10.1007/978-3-319-16549-3_39