Abstract
Feature reduction is a major preprocessing step in the analysis of high-dimensional data, particularly from biomolecular high-throughput technologies. Reduction techniques are expected to preserve the relevant characteristics of the data, such as neighbourhood relations. We investigate the neighbourhood preservation properties of feature reduction empirically and theoretically. Our results indicate that nearest and farthest neighbours are more reliably preserved than other neighbours in a reduced feature set.
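A small simulation can make the abstract's idea of neighbourhood preservation concrete. The sketch below is only an illustration under stated assumptions, not the authors' experimental protocol. It treats feature reduction as uniform random feature selection and uses Euclidean distance. For each neighbour rank r, it estimates how often a point's r-th nearest neighbour in the full feature space is still its r-th nearest neighbour after reduction. The helper names `neighbour_ranks` and `neighbour_rank_preservation`, their parameters, and the synthetic Gaussian data are all hypothetical.

```python
import numpy as np


def neighbour_ranks(Z):
    """Row i lists the indices of the other points, ordered nearest to farthest."""
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sum(diff ** 2, axis=-1)  # squared Euclidean distances give the same ordering
    np.fill_diagonal(D, np.inf)  # push each point itself to the last position
    return np.argsort(D, axis=1)[:, :-1]  # drop the self-column


def neighbour_rank_preservation(X, n_keep, n_trials=100, seed=0):
    """Hypothetical helper: fraction of points whose r-th nearest neighbour keeps
    rank r after keeping n_keep randomly chosen features (assumes Euclidean
    distance and uniform random feature selection as the reduction)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    full = neighbour_ranks(X)  # neighbour order in the full feature space
    hits = np.zeros(n - 1)
    for _ in range(n_trials):
        cols = rng.choice(d, size=n_keep, replace=False)  # random feature subset
        reduced = neighbour_ranks(X[:, cols])
        hits += np.mean(full == reduced, axis=0)  # per-rank agreement over all points
    return hits / n_trials


# Usage on synthetic Gaussian data (30 samples, 200 features), not the paper's data
X = np.random.default_rng(1).normal(size=(30, 200))
rates = neighbour_rank_preservation(X, n_keep=20, n_trials=50)
print("nearest:", rates[0], "middle:", rates[len(rates) // 2], "farthest:", rates[-1])
```

Entry 0 of the returned array corresponds to the nearest neighbour and the last entry to the farthest. Comparing these two entries with the middle ranks gives an empirical check, for a given data set and reduction, of the pattern the abstract reports. Random selection stands in here for whatever reduction method is actually under study.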
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lausser, L., Müssel, C., Maucher, M., Kestler, H.A. (2012). Feature Reduction and Nearest Neighbours. In: Gaul, W., Geyer-Schulz, A., Schmidt-Thieme, L., Kunze, J. (eds) Challenges at the Interface of Data Analysis, Computer Science, and Optimization. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24466-7_37
DOI: https://doi.org/10.1007/978-3-642-24466-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24465-0
Online ISBN: 978-3-642-24466-7
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)