Abstract
Hubs and anti-hubs are points that appear very close or very far to many other data points due to a problem of measuring distances in high-dimensional spaces. Hubness is an aspect of the curse of dimensionality affecting many machine learning tasks. We present the first large scale empirical study to compare two competing hubness reduction techniques: scaling and centering. We show that scaling consistently reduces hubness and improves nearest neighbor classification, while centering shows rather mixed results. Support vector classification is mostly unaffected by centering-based hubness reduction.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Python scripts for hubness analysis are available at: https://github.com/OFAI.
References
Flexer, A.: Improving visualization of high-dimensional music similarity spaces. In: 16th ISMIR Conference (2015)
Flexer, A., Schnitzer, D., Schlüter, J.: A MIREX meta-analysis of hubness in audio music similarity. In: 13th ISMIR Conference (2012)
Francois, D., Wertz, V., Verleysen, M.: The concentration of fractional distances. IEEE Trans. Knowl. Data Eng. 19, 873–886 (2007)
Hara, K., Suzuki, I., Shimbo, M., Kobayashi, K., Fukumizu, K., Radovanović, M.: Localized centering: reducing hubness in large-sample data hubness in high-dimensional data. In: 29th AAAI Conference on Artificial Intelligence, pp. 2645–2651 (2015)
Jegou, H., Harzallah, H., Schmid, C.: A contextual dissimilarity measure for accurate and efficient image search. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2007)
Radovanović, M., Nanopoulos, A., Ivanović, M.: Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Trans. Knowl. Data Eng. 27(5), 1369–1382 (2015)
Radovanović, M., Nanopoulos, A., Ivanović, M.: Hubs in space: popular nearest neighbors in high-dimensional data. J. Mach. Learn. Res. 11, 2487–2531 (2010)
Schnitzer, D., Flexer, A.: The unbalancing effect of hubs on K-medoids clustering in high-dimensional spaces. In: International Joint Conference on Neural Networks (2015)
Schnitzer, D., Flexer, A., Schedl, M., Widmer, G.: Local and global scaling reduce hubs in space. J. Mach. Learn. Res. 13, 2871–2902 (2012)
Schnitzer, D., Flexer, A., Tomašev, N.: A case for hubness removal in high-dimensional multimedia retrieval. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 687–692. Springer, Heidelberg (2014)
Suzuki, I., Hara, K., Shimbo, M., Saerens, M., Fukumizu, K.: Centering similarity measures to reduce hubs. In: Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pp. 613–623 (2013)
Tomašev, N., Radovanović, M., Mladenić, D., Ivanović, M.: The role of hubness in clustering high-dimensional data. IEEE Trans. Knowl. Data Eng. 26(3), 739–751 (2014)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Acknowledgments
This research is supported by the Austrian Science Fund (FWF): P27082, P27703.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Feldbauer, R., Flexer, A. (2016). Centering Versus Scaling for Hubness Reduction. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science(), vol 9886. Springer, Cham. https://doi.org/10.1007/978-3-319-44778-0_21
Download citation
DOI: https://doi.org/10.1007/978-3-319-44778-0_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44777-3
Online ISBN: 978-3-319-44778-0
eBook Packages: Computer ScienceComputer Science (R0)