Hybrid Manifold Clustering with Evolutionary Tuning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9028)

Abstract

Manifold clustering, also known as submanifold learning, is the task of embedding patterns in submanifolds with different characteristics. This paper proposes a hybrid approach that clusters the data set, computes a global map of the cluster centers, embeds each cluster, and then merges the scaled submanifolds with the global map. We introduce various instantiations of clustering and embedding algorithms based on hybridizations of \(k\)-means, principal component analysis, isometric mapping, and locally linear embedding. A (1\(+\)1)-ES is employed to tune the submanifolds by rotation and scaling. The submanifold learning algorithms are compared with respect to nearest neighbor classification performance on various experimental data sets.
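
To make the pipeline concrete, the following minimal Python sketch shows one possible instantiation of the approach described in the abstract: \(k\)-means clustering, a PCA map of the cluster centers as global map, PCA embeddings of the individual clusters, and a (1\(+\)1)-ES that tunes the scale of each submanifold before merging. All function names below are illustrative, and the distance-preservation fitness of the ES is an assumption; the paper also considers ISOMAP and LLE embeddings and rotation tuning, which are omitted here.

    import numpy as np
    from scipy.spatial.distance import pdist
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def one_plus_one_es(fitness, x0=1.0, sigma=0.5, iters=100, seed=0):
        """Minimal (1+1)-ES on a single real variable with a simple
        success-based (1/5th-rule style) step size adaptation."""
        rng = np.random.default_rng(seed)
        x, fx = x0, fitness(x0)
        for _ in range(iters):
            y = x + sigma * rng.standard_normal()
            fy = fitness(y)
            if fy <= fx:                      # accept improving offspring
                x, fx, sigma = y, fy, sigma * 1.22
            else:
                sigma *= 0.82
        return x

    def hybrid_manifold_clustering(X, n_clusters=5, n_components=2, seed=0):
        """Cluster X, embed each cluster with PCA, scale every submanifold
        with a (1+1)-ES, and merge it with a global map of the centers."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
        # Global map: low-dimensional embedding of the k cluster centers.
        centers = PCA(n_components=n_components).fit_transform(km.cluster_centers_)
        Z = np.zeros((len(X), n_components))
        for j in range(n_clusters):
            idx = km.labels_ == j
            X_j = X[idx]
            Z_j = PCA(n_components=n_components).fit_transform(X_j)
            # Assumed ES fitness: match pairwise data-space distances with
            # the distances of the scaled submanifold (distance preservation).
            d_X, d_Z = pdist(X_j), pdist(Z_j)
            scale = one_plus_one_es(lambda s: np.sum((d_X - abs(s) * d_Z) ** 2))
            Z[idx] = centers[j] + abs(scale) * Z_j
        return Z

A 1-nearest-neighbor classifier trained on the merged embedding can then serve as a quality measure, in the spirit of the nearest neighbor evaluation used in the paper.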


Notes

  1. The notation \(\mathcal {M}_j\) is used for cluster \(j\) in data space with center \(\mathbf {m}_j\), while \(\hat{\mathcal {M}}_j\) is the corresponding submanifold in latent space with center \(\hat{\mathbf {m}}_j\).

  2. The runtime of ISOMAP is \(O(N^2 \log N)\).

References

  1. Jolliffe, I.: Principal Component Analysis. Springer Series in Statistics. Springer, New York (1986)

  2. Tenenbaum, J.B., Silva, V.D., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)

  3. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)

  4. Beyer, H.G., Schwefel, H.P.: Evolution strategies - A comprehensive introduction. Natural Computing 1, 3–52 (2002)

  5. Vidal, R.: Subspace clustering. IEEE Signal Process Mag. 28, 52–68 (2011)

  6. Costeira, J.P., Kanade, T.: A multibody factorization method for independently moving objects. International Journal of Computer Vision 29, 159–179 (1998)

  7. Gear, C.W.: Multibody grouping from motion images. Int. J. Comput. Vis. 29, 133–150 (1998)

  8. Vidal, R., Ma, Y., Sastry, S.: Generalized principal component analysis (GPCA). IEEE Trans. Pattern Anal. Mach. Intell. 27, 1945–1959 (2005)

  9. Kushnir, D., Galun, M., Brandt, A.: Fast multiscale clustering and manifold identification. Pattern Recognit. 39, 1876–1891 (2006)

  10. Bradley, P.S., Mangasarian, O.L.: k-plane clustering. J. Global Optim. 16, 23–32 (2000)

  11. Tseng, P.: Nearest \(q\)-flat to \(m\) points. J. Optim. Theory Appl. 105, 249–252 (2000)

  12. Kramer, O.: Fast submanifold learning with unsupervised nearest neighbors. In: Tomassini, M., Antonioni, A., Daolio, F., Buesser, P. (eds.) ICANNGA 2013. LNCS, vol. 7824, pp. 317–325. Springer, Heidelberg (2013)

  13. Kramer, O.: Dimensionality reduction by unsupervised nearest neighbor regression. In: International Conference on Machine Learning and Applications (ICMLA), pp. 275–278. IEEE (2011)

  14. Tipping, M.E., Bishop, C.M.: Mixtures of probabilistic principal component analysers. Neural Computation 11, 443–482 (1999)

  15. Nourashrafeddin, S., Arnold, D., Milios, E.E.: An evolutionary subspace clustering algorithm for high-dimensional data. In: Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO), pp. 1497–1498 (2012)

  16. Vahdat, A., Heywood, M.I., Zincir-Heywood, A.N.: Bottom-up evolutionary subspace clustering. In: IEEE Congress on Evolutionary Computation, pp. 1–8 (2010)

  17. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17, 1–24 (2007)

  18. Rechenberg, I.: Cybernetic Solution Path of an Experimental Problem. Ministry of Aviation, Royal Aircraft Establishment, Farnborough (1965)

  19. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  20. Hull, J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16(5), 550–554 (1994)

  21. Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical report 07-49, University of Massachusetts, Amherst (2007)

  22. Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19, 1–67 (1991)

Author information

Correspondence to Oliver Kramer.

A Benchmark Problems

  • MakeClass is a classification data set generated with the scikit-learn [19] method make_classification with \(d\) dimensions and two classes.
  • The UCI Digits data set [20] comprises handwritten digits with \(d=64\); it is a frequently used reference problem for the recognition of handwritten characters and digits.
  • The Faces data set is Labeled Faces in the Wild [21], introduced for studying the face recognition problem. It contains JPEG images of famous people collected from the internet; the data set source is http://vis-www.cs.umass.edu/lfw/.
  • The Gaussian blobs data set is generated with the scikit-learn [19] method make_blobs: two centers, i.e., two classes, each with a standard deviation of \(\sigma = 10.0\) and variable \(d\).
  • Friedman 1 is a regression data set generated with the scikit-learn [19] method make_friedman1. The regression problem was introduced in [22], where Friedman presents multivariate adaptive regression splines.
  • Friedman 2 is also a scikit-learn [19] regression data set and can be generated with make_friedman2.
  • The Wind data set is based on spatio-temporal time series from the National Renewable Energy Laboratory (NREL) western wind data set, which comprises time series of 32,043 grid points, each holding ten \(3\) MW turbines, over a timespan of three years at a \(10\)-minute resolution. The dimensionality is \(d=22\).
  • Fitness is a data set based on an optimization run of a (15+100)-ES [4] on the Sphere function \(f(\mathbf {z}) = \mathbf {z}^T \mathbf {z}\) with \(d=20\) dimensions and 21,000 fitness function evaluations. The patterns are the objective variable values of the best candidate solution in each generation; the labels are the corresponding fitness function values.
  • The Photos data set contains thirty JPEG photos with resolution \(320 \times 214\) taken with a SONY DSLR-A300.
  • The Iris data set consists of 150 four-dimensional patterns of three different types of irises.
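
The synthetic benchmarks listed above can be generated with scikit-learn [19] roughly as sketched below; the sample sizes and random seeds are illustrative assumptions, and only the generator names come from the text.

    from sklearn.datasets import (load_digits, load_iris, make_blobs,
                                  make_classification, make_friedman1,
                                  make_friedman2)

    d = 20  # illustrative dimensionality

    # MakeClass: classification problem with d dimensions and two classes.
    X_cls, y_cls = make_classification(n_samples=500, n_features=d,
                                       n_classes=2, random_state=0)

    # Gaussian blobs: two centers (classes) with standard deviation 10.0.
    X_blob, y_blob = make_blobs(n_samples=500, n_features=d, centers=2,
                                cluster_std=10.0, random_state=0)

    # Friedman 1 and Friedman 2 regression problems [22].
    X_f1, y_f1 = make_friedman1(n_samples=500, n_features=max(d, 5),
                                random_state=0)
    X_f2, y_f2 = make_friedman2(n_samples=500, random_state=0)

    # UCI Digits (d = 64) and Iris (d = 4) ship with scikit-learn;
    # Labeled Faces in the Wild can be fetched with fetch_lfw_people.
    X_dig, y_dig = load_digits(return_X_y=True)
    X_iris, y_iris = load_iris(return_X_y=True)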


Copyright information

© 2015 Springer International Publishing Switzerland

Cite this paper

Kramer, O. (2015). Hybrid Manifold Clustering with Evolutionary Tuning. In: Mora, A., Squillero, G. (eds.) Applications of Evolutionary Computation. EvoApplications 2015. Lecture Notes in Computer Science, vol. 9028. Springer, Cham. https://doi.org/10.1007/978-3-319-16549-3_39

  • DOI: https://doi.org/10.1007/978-3-319-16549-3_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16548-6

  • Online ISBN: 978-3-319-16549-3
