Sorting High-Dimensional Patterns with Unsupervised Nearest Neighbors

Kramer, Oliver

doi:10.1007/978-3-642-36907-0_17

Part of the book series: Communications in Computer and Information Science (CCIS, volume 358)

Included in the following conference series:

International Conference on Agents and Artificial Intelligence

Abstract

In many scientific disciplines, structures in high-dimensional data have to be detected, e.g., in stellar spectra, genome data, or face recognition tasks. In this work, we present an approach to non-linear dimensionality reduction based on fitting nearest neighbor regression to the unsupervised regression framework for learning low-dimensional manifolds. The problem of optimizing latent neighborhoods is difficult to solve, but the unsupervised nearest neighbor (UNN) formulation allows an efficient strategy of iteratively embedding latent points into discrete neighborhood topologies. The choice of an appropriate loss function is relevant, in particular for noisy and high-dimensional data spaces. We extend UNN by the ε-insensitive loss, which ignores residuals below a defined threshold. Furthermore, we introduce techniques to handle incomplete data. Experimental analyses on various artificial and real-world test problems demonstrate the performance of the approaches.
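The ε-insensitive loss mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definition, max(0, |r| − ε), applied to scalar residuals and summed. The function name `eps_insensitive_loss`, the sample residuals, and the choice of summing scalar residuals are assumptions made for illustration; the abstract does not specify how the paper computes residuals or integrates the loss into UNN.

```python
def eps_insensitive_loss(residuals, eps):
    """Sum of max(0, |r| - eps) over a sequence of scalar residuals.

    Residuals whose magnitude is at most eps contribute nothing;
    larger ones contribute only the amount by which they exceed eps.
    (Hypothetical helper, not taken from the paper.)
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return sum(max(0.0, abs(r) - eps) for r in residuals)


# Illustrative residuals (made-up values, not data from the paper).
residuals = [0.05, -0.2, 0.5, -0.01]

# With eps = 0.1 the two small residuals are ignored entirely.
loss_with_threshold = eps_insensitive_loss(residuals, eps=0.1)

# With eps = 0 the loss reduces to the plain sum of absolute residuals.
loss_without_threshold = eps_insensitive_loss(residuals, eps=0.0)

print(loss_with_threshold, loss_without_threshold)
```

A larger threshold makes the loss more tolerant of small deviations, which is the property the abstract points to for noisy data.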

Author information

Authors and Affiliations

Department of Computer Science, University of Oldenburg, Uhlhornsweg 84, 26111, Oldenburg, Germany
Oliver Kramer

Editor information

Editors and Affiliations

INSTICC and IPS, Estefanilha, Setúbal, Portugal
Joaquim Filipe
IST - Technical University of Lisbon, Av.Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred

Cite this paper

Kramer, O. (2013). Sorting High-Dimensional Patterns with Unsupervised Nearest Neighbors. In: Filipe, J., Fred, A. (eds) Agents and Artificial Intelligence. ICAART 2012. Communications in Computer and Information Science, vol 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36907-0_17

DOI: https://doi.org/10.1007/978-3-642-36907-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36906-3
Online ISBN: 978-3-642-36907-0
eBook Packages: Computer Science, Computer Science (R0)