Sorting High-Dimensional Patterns with Unsupervised Nearest Neighbors

  • Conference paper
Agents and Artificial Intelligence (ICAART 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 358)


Abstract

In many scientific disciplines, structures in high-dimensional data have to be detected, e.g., in stellar spectra, genome data, or face recognition tasks. In this work we present an approach to non-linear dimensionality reduction that fits nearest neighbor regression into the unsupervised regression framework for learning low-dimensional manifolds. The problem of optimizing latent neighborhoods is difficult to solve, but the unsupervised nearest neighbor (UNN) formulation allows an efficient strategy of iteratively embedding latent points into discrete neighborhood topologies. The choice of an appropriate loss function is relevant, in particular for noisy and high-dimensional data spaces. We extend UNN with the ε-insensitive loss, which ignores small residuals below a defined threshold. Furthermore, we introduce techniques to handle incomplete data. Experimental analyses on various artificial and real-world test problems demonstrate the performance of the approaches.
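
The iterative embedding idea and the ε-insensitive loss can be illustrated with a minimal sketch. It is written under simplifying assumptions and is not the paper's implementation: the latent space is taken to be a one-dimensional ordering, each pattern is reconstructed as the mean of its direct neighbors in that ordering (a K = 2 nearest neighbor regression), the loss is applied component-wise, and every candidate position is tested exhaustively. The names `eps_insensitive`, `reconstruction_error`, and `unn_sort` are hypothetical.

```python
import numpy as np


def eps_insensitive(residual, eps):
    """Epsilon-insensitive loss: residuals with magnitude below eps cost nothing."""
    return np.maximum(0.0, np.abs(residual) - eps)


def reconstruction_error(ordered, eps=0.0):
    """Data-space error of a latent ordering (assumed K=2 neighborhood).

    Each pattern is predicted as the mean of its direct neighbors in the
    ordering; the component-wise epsilon-insensitive losses are summed.
    """
    n = len(ordered)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        prediction = np.mean([ordered[j] for j in neighbors], axis=0)
        total += eps_insensitive(ordered[i] - prediction, eps).sum()
    return total


def unn_sort(patterns, eps=0.0):
    """Greedily embed patterns one by one into a 1-D latent ordering.

    For every new pattern, all insertion positions are tried and the one
    with the lowest reconstruction error is kept (ties keep the first).
    """
    ordered = []
    for y in patterns:
        candidates = [
            ordered[:pos] + [y] + ordered[pos:]
            for pos in range(len(ordered) + 1)
        ]
        ordered = min(candidates, key=lambda c: reconstruction_error(c, eps))
    return ordered


if __name__ == "__main__":
    # Toy data: noisy points along a line in 3-D, presented in random order.
    rng = np.random.default_rng(0)
    t = rng.permutation(10).astype(float)
    patterns = [np.array([v, 2.0 * v, -v]) + rng.normal(0.0, 0.05, 3) for v in t]
    for y in unn_sort(patterns, eps=0.1):
        print(np.round(y, 2))
```

Testing every insertion position makes this sketch expensive for large data sets, since the full error is recomputed for each candidate. Restricting the candidates to positions next to the closest already-embedded pattern is one way to reduce the cost; whether that matches the strategy evaluated in the paper is not stated in the abstract.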




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kramer, O. (2013). Sorting High-Dimensional Patterns with Unsupervised Nearest Neighbors. In: Filipe, J., Fred, A. (eds) Agents and Artificial Intelligence. ICAART 2012. Communications in Computer and Information Science, vol 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36907-0_17


  • DOI: https://doi.org/10.1007/978-3-642-36907-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36906-3

  • Online ISBN: 978-3-642-36907-0

  • eBook Packages: Computer Science, Computer Science (R0)
