Abstract
Many Data Mining tasks deal with data presented in high-dimensional spaces, and the ‘curse of dimensionality’ phenomenon is often an obstacle to using many of the methods for solving these tasks. To avoid this phenomenon, various Representation Learning algorithms are used as a first key step in solving these tasks: they transform the original high-dimensional data into lower-dimensional representations so that as much as possible of the information about the original data required for the considered Data Mining task is preserved. These Representation Learning problems are formulated as various Dimensionality Reduction problems (Sample Embedding, Data Manifold Embedding, Manifold Learning, and the newly proposed Tangent Bundle Manifold Learning), which are motivated by various Data Mining tasks. A new geometrically motivated algorithm is presented that solves the Tangent Bundle Manifold Learning problem and gives new solutions for all the considered Dimensionality Reduction problems.
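As a minimal illustration of the embed-and-reconstruct idea described above, the following sketch uses ordinary linear PCA on synthetic data. It is not the algorithm proposed in the paper; PCA is swapped in as the simplest linear instance of dimensionality reduction, and the function names (`fit_linear_embedding`, `embed`, `reconstruct`), the synthetic sample, and the dimensions are assumptions made for illustration only.

```python
import numpy as np


def fit_linear_embedding(X, q):
    """Fit a q-dimensional linear (PCA) embedding to the rows of X."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data, ordered by singular value,
    # span the principal subspaces of the sample.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:q].T  # p x q matrix with orthonormal columns
    return mean, basis


def embed(X, mean, basis):
    """Map high-dimensional points to their q-dimensional representations."""
    return (X - mean) @ basis


def reconstruct(Y, mean, basis):
    """Map low-dimensional representations back to the original space."""
    return mean + Y @ basis.T


# Hypothetical sample: 200 points lying near a 2-D plane in a 10-D space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

mean, basis = fit_linear_embedding(X, q=2)
Y = embed(X, mean, basis)            # low-dimensional representation
X_hat = reconstruct(Y, mean, basis)  # recovered high-dimensional points

# Relative reconstruction error: how much information about X was lost.
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mean)
```

In this linear case the estimated tangent space is the same at every point (the column span of `basis`); for a curved data manifold the tangent spaces vary from point to point, which is the setting the Tangent Bundle Manifold Learning formulation addresses.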
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kuleshov, A., Bernstein, A. (2014). Manifold Learning in Data Mining Tasks. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2014. Lecture Notes in Computer Science, vol 8556. Springer, Cham. https://doi.org/10.1007/978-3-319-08979-9_10
DOI: https://doi.org/10.1007/978-3-319-08979-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08978-2
Online ISBN: 978-3-319-08979-9
eBook Packages: Computer Science, Computer Science (R0)