Abstract
Manifold clustering, also known as submanifold learning, is the task of embedding patterns in submanifolds with different characteristics. This paper proposes a hybrid approach: clustering the data set, computing a global map of the cluster centers, embedding each cluster, and then merging the scaled submanifolds with the global map. We introduce various instantiations of clustering and embedding algorithms based on hybridizations of \(k\)-means, principal component analysis, isometric mapping, and locally linear embedding. A (1\(+\)1)-ES is employed to tune the submanifolds by rotation and scaling. The submanifold learning algorithms are compared with respect to nearest-neighbor classification performance on various experimental data sets.
Notes
1. The notation \(\mathcal {M}_j\) is used for cluster \(j\) in data space with center \(\mathbf {m}_j\), while \(\hat{\mathcal {M}}_j\) is the corresponding submanifold in latent space with center \(\hat{\mathbf {m}}_j\).
2. The runtime of ISOMAP is \(O(N^2 \log N)\).
References
Jolliffe, I.: Principal Component Analysis. Springer Series in Statistics. Springer, New York (1986)
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
Beyer, H.G., Schwefel, H.P.: Evolution strategies - A comprehensive introduction. Natural Computing 1, 3–52 (2002)
Vidal, R.: Subspace clustering. IEEE Signal Process Mag. 28, 52–68 (2011)
Costeira, J.P., Kanade, T.: A multibody factorization method for independently moving objects. International Journal of Computer Vision 29, 159–179 (1998)
Gear, C.W.: Multibody grouping from motion images. Int. J. Comput. Vis. 29, 133–150 (1998)
Vidal, R., Ma, Y., Sastry, S.: Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27, 1945–1959 (2005)
Kushnir, D., Galun, M., Brandt, A.: Fast multiscale clustering and manifold identification. Pattern Recognit. 39, 1876–1891 (2006)
Bradley, P.S., Mangasarian, O.L.: k-plane clustering. J. Global Optim. 16, 23–32 (2000)
Tseng, P.: Nearest \(q\)-flat to \(m\) points. J. Optim. Theory Appl. 105, 249–252 (2000)
Kramer, O.: Fast submanifold learning with unsupervised nearest neighbors. In: Tomassini, M., Antonioni, A., Daolio, F., Buesser, P. (eds.) ICANNGA 2013. LNCS, vol. 7824, pp. 317–325. Springer, Heidelberg (2013)
Kramer, O.: Dimensionality reduction by unsupervised nearest neighbor regression. In: International Conference on Machine Learning and Applications (ICMLA), pp. 275–278. IEEE (2011)
Tipping, M.E., Bishop, C.M.: Mixtures of probabilistic principal component analysers. Neural Computation 11, 443–482 (1999)
Nourashrafeddin, S., Arnold, D., Milios, E.E.: An evolutionary subspace clustering algorithm for high-dimensional data. In: Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO), pp. 1497–1498 (2012)
Vahdat, A., Heywood, M.I., Zincir-Heywood, A.N.: Bottom-up evolutionary subspace clustering. In: IEEE Congress on Evolutionary Computation, pp. 1–8 (2010)
von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17, 1–24 (2007)
Rechenberg, I.: Cybernetic Solution Path of an Experimental Problem. Ministry of Aviation, Royal Aircraft Establishment, Farnborough (1965)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Hull, J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16, 550–554 (1994)
Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst (2007)
Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991)
A Benchmark Problems
MakeClass is a classification data set generated with the scikit-learn [19] method make_classification with \(d\) dimensions and two centers.

The UCI Digits data set [20] comprises handwritten digits with \(d=64\). It is a frequent reference problem for the recognition of handwritten characters and digits.

The Faces data set, Labeled Faces in the Wild [21], has been introduced for studying the face recognition problem. It contains JPEG images of famous people collected from the internet and is available at http://vis-www.cs.umass.edu/lfw/.

The Gaussian blobs data set is generated with the scikit-learn [19] method make_blobs with the following settings: two centers, i.e., two classes, each with a standard deviation of \(\sigma = 10.0\) and variable \(d\).

Friedman 1 is a regression data set generated with the scikit-learn [19] method make_friedman1. The regression problem has been introduced in [22], where Friedman introduces multivariate adaptive regression splines. Friedman 2 is also a scikit-learn [19] regression data set and can be generated with make_friedman2.

The Wind data set is based on spatio-temporal time series data from the National Renewable Energy Laboratory (NREL) western wind data set. The whole data set comprises time series of 32,043 grid points, each aggregating ten \(3\) MW turbines, over a timespan of three years in a \(10\)-minute resolution. The dimensionality is \(d=22\).

Fitness is a data set based on an optimization run of a (15+100)-ES [4] on the Sphere function \(f(\mathbf {z}) = \mathbf {z}^T \mathbf {z}\) with \(d=20\) dimensions and 21,000 fitness function evaluations. The patterns are the objective variable values of the best candidate solution in each generation; the labels are the corresponding fitness function values.

The Photos data set contains thirty JPEG photos with resolution \(320 \times 214\) taken with a SONY DSLR-A300.

The Iris data set consists of 150 four-dimensional patterns of three different types of irises.
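The synthetic scikit-learn [19] data sets above can be generated as follows. The sample counts and the dimensionality \(d\) are illustrative assumptions; the paper varies these parameters across experiments.

```python
from sklearn.datasets import (make_blobs, make_classification,
                              make_friedman1, make_friedman2)

d = 10  # illustrative dimensionality; the experiments vary d
# MakeClass: two-class classification data in d dimensions.
X_mc, y_mc = make_classification(n_samples=300, n_features=d,
                                 n_classes=2, random_state=0)
# Gaussian blobs: two centers, each with standard deviation sigma = 10.0.
X_gb, y_gb = make_blobs(n_samples=300, n_features=d, centers=2,
                        cluster_std=10.0, random_state=0)
# Friedman 1: regression data (requires n_features >= 5).
X_f1, y_f1 = make_friedman1(n_samples=300, n_features=d, random_state=0)
# Friedman 2: regression data with a fixed four input variables.
X_f2, y_f2 = make_friedman2(n_samples=300, random_state=0)
```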
© 2015 Springer International Publishing Switzerland

Kramer, O. (2015). Hybrid Manifold Clustering with Evolutionary Tuning. In: Mora, A., Squillero, G. (eds.) Applications of Evolutionary Computation. EvoApplications 2015. Lecture Notes in Computer Science, vol. 9028. Springer, Cham. https://doi.org/10.1007/978-3-319-16549-3_39