Feature Reduction and Nearest Neighbours

Lausser, Ludwig; Müssel, Christoph; Maucher, Markus; Kestler, Hans A.

doi:10.1007/978-3-642-24466-7_37

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Feature reduction is a major preprocessing step in the analysis of high-dimensional data, particularly from biomolecular high-throughput technologies. Reduction techniques are expected to preserve the relevant characteristics of the data, such as neighbourhood relations. We investigate the neighbourhood preservation properties of feature reduction empirically and theoretically. Our results indicate that nearest and farthest neighbours are more reliably preserved than other neighbours in a reduced feature set.
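The idea of neighbourhood preservation in the abstract can be made concrete with a small experiment. This is a minimal sketch under our own assumptions and is not the authors' protocol. Feature reduction is modelled as keeping a random subset of features, and distances are Euclidean. A neighbour rank counts as preserved only if exactly the same point holds that rank before and after the reduction. The function names `neighbour_order` and `rank_preservation` and the sample sizes are hypothetical.

```python
import math
import random


def neighbour_order(points, i, features):
    """Return the indices of all other points, sorted by Euclidean distance
    to point i, using only the coordinates listed in `features`."""
    def dist(j):
        return math.sqrt(sum((points[i][f] - points[j][f]) ** 2 for f in features))
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=dist)


def rank_preservation(points, features):
    """For each neighbour rank r (index 0 = nearest, index -1 = farthest),
    return the fraction of points whose r-th neighbour in the full feature
    space is still their r-th neighbour after restricting to `features`."""
    n = len(points)
    all_features = range(len(points[0]))
    hits = [0] * (n - 1)
    for i in range(n):
        full = neighbour_order(points, i, all_features)
        reduced = neighbour_order(points, i, features)
        for r in range(n - 1):
            if full[r] == reduced[r]:
                hits[r] += 1
    return [h / n for h in hits]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: n samples, d features, k features kept after reduction.
    n, d, k = 30, 200, 20
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    kept = rng.sample(range(d), k)  # random feature subset as the reduction step
    rates = rank_preservation(points, kept)
    print("nearest:", rates[0], "middle:", rates[len(rates) // 2], "farthest:", rates[-1])
```

One way to probe the abstract's claim is to average `rates[0]` and `rates[-1]` over many random draws and compare them with intermediate ranks. A single run is noisy, so the sketch makes no claim about what it will print.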

Author information

Authors and Affiliations

Internal Medicine 1, University Hospital Ulm, 89081, Ulm, Germany
Ludwig Lausser, Markus Maucher & Hans A. Kestler
Institute of Neural Information Processing, Ulm University, 89069, Ulm, Germany
Christoph Müssel & Hans A. Kestler

Corresponding author

Correspondence to Hans A. Kestler.

Editor information

Editors and Affiliations

Fak. Wirtschaftswissenschaften, Inst. Entscheidungstheorie und, Universität Karlsruhe (TH), Kaiserstr. 12, Karlsruhe, 76128, Germany
Wolfgang A. Gaul
Institute for Information Systems and Management (IISM), Karlsruhe Institute of Technology (KIT), Kaiserstr. 12, Karlsruhe, 76131, Baden-Württemberg, Germany
Andreas Geyer-Schulz
Information Systems, University of Hildesheim, Marienburger Platz 22, Hildesheim, 31141, Germany
Lars Schmidt-Thieme
Institute for Information Systems and Management (IISM), Karlsruhe Institute of Technology (KIT), Kaiserstraße 12, Karlsruhe, 76128, Germany
Jonas Kunze

About this paper

Cite this paper

Lausser, L., Müssel, C., Maucher, M., Kestler, H.A. (2012). Feature Reduction and Nearest Neighbours. In: Gaul, W., Geyer-Schulz, A., Schmidt-Thieme, L., Kunze, J. (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24466-7_37

DOI: https://doi.org/10.1007/978-3-642-24466-7_37
Published: 05 January 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24465-0
Online ISBN: 978-3-642-24466-7
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)