Abstract
In many scientific disciplines, structures in high-dimensional data have to be detected, e.g., in stellar spectra, genome data, or face recognition tasks. In this work we present an approach to non-linear dimensionality reduction that fits nearest neighbor regression into the unsupervised regression framework for learning low-dimensional manifolds. The problem of optimizing latent neighborhoods is difficult to solve, but the unsupervised nearest neighbor (UNN) formulation allows an efficient strategy of iteratively embedding latent points into discrete neighborhood topologies. The choice of an appropriate loss function matters, in particular for noisy and high-dimensional data spaces. We extend UNN with the ε-insensitive loss, which ignores small residuals below a defined threshold. Furthermore, we introduce techniques to handle incomplete data. Experimental analyses on various artificial and real-world test problems demonstrate the performance of the approaches.
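The ε-insensitive loss mentioned above can be illustrated with a minimal sketch. It assumes the common form max(0, |r| − ε) applied element-wise to residuals r; the function name `eps_insensitive_loss` and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np


def eps_insensitive_loss(residuals, eps):
    """Element-wise epsilon-insensitive loss: max(0, |r| - eps).

    Assumption: residuals with magnitude at most eps cost nothing;
    larger residuals are penalized linearly by their excess over eps.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    magnitudes = np.abs(np.asarray(residuals, dtype=float))
    return np.maximum(0.0, magnitudes - eps)


# Hypothetical residuals between patterns and their reconstructions.
residuals = [-0.5, 0.125, 0.25, 1.0]
losses = eps_insensitive_loss(residuals, eps=0.25)
print(losses)  # the two residuals with |r| <= 0.25 contribute zero loss
```

With ε = 0 the function reduces to the absolute error, so the threshold directly controls how much small-scale noise is ignored.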
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kramer, O. (2013). Sorting High-Dimensional Patterns with Unsupervised Nearest Neighbors. In: Filipe, J., Fred, A. (eds) Agents and Artificial Intelligence. ICAART 2012. Communications in Computer and Information Science, vol 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36907-0_17
DOI: https://doi.org/10.1007/978-3-642-36907-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36906-3
Online ISBN: 978-3-642-36907-0
eBook Packages: Computer Science (R0)