Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification

Tomašev, Nenad; Radovanović, Miloš; Mladenić, Dunja; Ivanović, Mirjana

doi:10.1007/s13042-012-0137-1

Original Article
Published: 16 December 2012

Volume 5, pages 445–458 (2014)

International Journal of Machine Learning and Cybernetics

Nenad Tomašev¹, Miloš Radovanović², Dunja Mladenić¹ & Mirjana Ivanović²

Abstract

Most data of interest in today's data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is often difficult for conventional machine-learning algorithms to handle, which is considered an aspect of the well-known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, and the design of machine-learning algorithms needs to take these factors into account. Furthermore, it has been observed that some of the properties arising in high dimensions can in fact be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier that exploits this notion has recently been proposed. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as over the standard kNN classifier.
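The abstract does not spell out the proposed measures, so the following is only a minimal sketch of the general idea, under stated assumptions: each training point's past occurrences in the k-neighborhoods of other points are counted per class, and those counts are turned into a soft (fuzzy) vote. The function names (`knn_indices`, `class_k_occurrences`, `fuzzy_vote`), the additive smoothing parameter `smooth`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math
import random


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (self excluded)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def class_k_occurrences(neighbors, labels, n_classes):
    """counts[j][c] = number of class-c points that have point j among their k neighbors."""
    counts = [[0] * n_classes for _ in labels]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            counts[j][labels[i]] += 1
    return counts


def fuzzy_vote(query, points, counts, k, n_classes, smooth=1.0):
    """Soft class vote for `query`, built from its neighbors' occurrence profiles.

    Assumption: a neighbor's membership in class c is its smoothed fraction of
    class-c occurrences. This is an illustrative choice, not the paper's definition.
    """
    nearest = sorted(range(len(points)),
                     key=lambda j: math.dist(query, points[j]))[:k]
    votes = [0.0] * n_classes
    for j in nearest:
        total = sum(counts[j]) + smooth * n_classes
        for c in range(n_classes):
            votes[c] += (counts[j][c] + smooth) / total
    norm = sum(votes)
    return [v / norm for v in votes]


# Synthetic data, for illustration only.
random.seed(0)
dim, n, k, n_classes = 30, 80, 5, 2
points = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
labels = [int(p[0] > 0) for p in points]

neighbors = knn_indices(points, k)
counts = class_k_occurrences(neighbors, labels, n_classes)
n_k = [sum(row) for row in counts]  # total k-occurrences of each point

query = [random.gauss(0, 1) for _ in range(dim)]
vote = fuzzy_vote(query, points, counts, k, n_classes)

print("largest k-occurrence count:", max(n_k), "(the average is always k =", k, ")")
print("fuzzy vote for the query:", vote)
```

The average k-occurrence count always equals k, so points whose count is far above k are the hubs the abstract refers to. The normalized vote vector can be read as a rough confidence in each label, which is the kind of output a soft approach provides and a crisp vote does not.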

Notes

Skewness, the standardized 3rd moment of a probability distribution, is 0 if the distribution is symmetrical, while positive (negative) values indicate skew to the right (left).
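Written out, with $\mu$ and $\sigma$ denoting the mean and standard deviation of a random variable $X$ (symbols introduced here for illustration, not taken from the note), the standardized third moment is:

```latex
\[
  \gamma_1
  \;=\; \mathrm{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right]
  \;=\; \frac{\mathrm{E}\!\left[(X-\mu)^{3}\right]}{\sigma^{3}}
\]
```

For a symmetric distribution the positive and negative cubed deviations cancel, giving $\gamma_1 = 0$, whereas a long right tail makes the numerator positive.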

References

Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional spaces. In: Proceedings of the 8th international conference on database theory (ICDT), Lecture notes in computer science, vol 1973. Springer, pp 420–434
Aucouturier JJ (2006) Ten experiments on the modelling of polyphonic timbre. Ph.D. thesis, University of Paris 6
Aucouturier JJ, Pachet F (2004) Improving timbre similarity: how high is the sky? J Negat Results Speech Audio Sci 1. http://jjtok.io/papers/JNRSAS-2004.pdf
Babu VS, Viswanath P (2009) Rough-fuzzy weighted k-nearest leader classifier for large data sets. Pattern Recogn 42(9):1719–1731
Buza K, Nanopoulos A, Schmidt-Thieme L (2011) INSIGHT: Efficient and effective instance selection for time-series classification. In: Proceedings of the 15th pacific-asia conference on knowledge discovery and data mining (PAKDD), Part II, Lecture Notes in Artificial Intelligence, vol 6635. Springer, pp 149–160
Cabello D, Barro S, Salceda JM, Ruiz R, Mira J (1991) Fuzzy k-nearest neighbor classifiers for ventricular arrhythmia detection. Int J Biomed Comput 27(2):77–93
Chen J, Fang H, Saad Y (2009) Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection. J Mach Learn Res 10:1989–2012
Cintra ME, Camargo HA, Monard MC (2008) A study on techniques for the automatic generation of membership functions for pattern recognition. In: Congresso da Academia Trinacional de Ciências (C3N), vol 1, pp 1–10
Durrant RJ, Kabán A (2009) When is ‘nearest neighbour’ meaningful: a converse theorem and implications. J Complex 25(4):385–397
François D, Wertz V, Verleysen M (2007) The concentration of fractional distances. IEEE Trans Knowl Data Eng 19(7):873–886
Houle ME, Kriegel HP, Kröger P, Schubert E, Zimek A (2010) Can shared-neighbor distances defeat the curse of dimensionality? In: Proceedings of the 22nd international conference on scientific and statistical database management (SSDBM), Lecture Notes in Computer Science, vol 6187. Springer, pp 482–500
Huang WL, Chen HM, Hwang SF, Ho SY (2007) Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method. Biosystems 90(2):405–413
Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 15(4):580–585
Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52(3):239–281
Pham TD (2005) An optimally weighted fuzzy k-NN algorithm. In: Proceedings of the 3rd international conference on advances in pattern recognition (ICAPR), Part I, Lecture Notes in Computer Science, vol 3686. Springer, pp 239–247
Radovanović M, Nanopoulos A, Ivanović M (2009) Nearest neighbors in high-dimensional data: the emergence and influence of hubs. In: Proceedings of the 26th international conference on machine learning (ICML), pp 865–872
Radovanović M, Nanopoulos A, Ivanović M (2010) Hubs in space: Popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531
Radovanović M, Nanopoulos A, Ivanović M (2010) On the existence of obstinate results in vector space models. In: Proceedings of the 33rd annual international ACM SIGIR conference on research and development in information retrieval, pp 186–193
Radovanović M, Nanopoulos A, Ivanović M (2010) Time-series classification in many intrinsic dimensions. In: Proceedings of the 10th SIAM international conference on data mining (SDM), pp 677–688
Shen HB, Yang J, Chou KC (2006) Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. J Theor Biol 240(1):9–13
Sim J, Kim SY, Lee J (2005) Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method. Bioinformatics 21(12):2844–2849
Singpurwalla N, Booker JM (2004) Membership functions and probability measures of fuzzy sets. J Am Stat Assoc 99:867–877
Tomašev N, Brehar R, Mladenić D, Nedevschi S (2011) The influence of hubness on nearest-neighbor methods in object recognition. In: Proceedings of the 7th IEEE international conference on intelligent computer communication and processing (ICCP), pp 367–374
Tomašev N, Mladenić D (2011) Exploring the hubness-related properties of oceanographic sensor data. In: Proceedings of the 14th international multiconference on information society (IS), A:149–152
Tomašev N, Mladenić D (2011) The influence of weighting the k-occurrences on hubness-aware classification methods. In: Proceedings of the 14th international multiconference on information society (IS)
Tomašev N, Mladenić D (2012) Nearest neighbor voting in high dimensional data: learning from past occurrences. Comput Sci Inf Syst 9(2):691–712
Tomašev N, Radovanović M, Mladenić D, Ivanović M (2011) Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. In: Proceedings of the 7th international conference on machine learning and data mining (MLDM), Lecture Notes in Artificial Intelligence, vol 6871. Springer, pp 16–30
Tomašev N, Radovanović M, Mladenić D, Ivanović M (2011) The role of hubness in clustering high-dimensional data. In: Proceedings of the 15th pacific-asia conference on knowledge discovery and data mining (PAKDD), Part I, Lecture Notes in Artificial Intelligence, vol 6634. Springer, pp 183–195
Wang XZ, He YL, Dong LC, Zhao HY (2011) Particle swarm optimization for determining fuzzy measures from data. Inf Sci 181(19):4230–4252
Yu S, Backer SD, Scheunders P (2002) Genetic feature selection combined with composite fuzzy nearest neighbor classifiers for hyperspectral satellite imagery. Pattern Recogn Lett 23(1–3):183–190
Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
Zhang Z, Zhang R (2009) Multimedia data mining. Chapman and Hall, London
Zheng K, Fung PC, Zhou X (2010) K-nearest neighbor search for fuzzy objects. In: Proceedings of the 36th ACM SIGMOD international conference on management of data, pp 699–710
Zuo W, Zhang D, Wang K (2008) On kernel difference-weighted k-nearest neighbor classification. Pattern Anal Appl 11:247–257

Acknowledgments

This work was supported by the bilateral project between Slovenia and Serbia “Correlating images and words: Enhancing image analysis through machine learning and semantic technologies,” the Slovenian Research Agency, the Serbian Ministry of Education and Science through project no. OI174023, “Intelligent techniques and their integration into wide-spectrum decision support,” and the ICT Programme of the EC under PASCAL2 (ICT-NoE-216886) and PlanetData (ICT-NoE-257641).

Author information

Authors and Affiliations

Institute Jožef Stefan, Artificial Intelligence Laboratory, Jožef Stefan International Postgraduate School, Jamova 39, 1000, Ljubljana, Slovenia
Nenad Tomašev & Dunja Mladenić
Department of Mathematics and Informatics, University of Novi Sad, Trg D. Obradovića 4, 21000, Novi Sad, Serbia
Miloš Radovanović & Mirjana Ivanović

Corresponding author

Correspondence to Nenad Tomašev.

Additional information

This is an extended version of the paper Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification, which was presented at the MLDM 2011 conference [27].

About this article

Cite this article

Tomašev, N., Radovanović, M., Mladenić, D. et al. Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. Int. J. Mach. Learn. & Cyber. 5, 445–458 (2014). https://doi.org/10.1007/s13042-012-0137-1

Received: 09 March 2012
Accepted: 19 November 2012
Published: 16 December 2012
Issue Date: June 2014
DOI: https://doi.org/10.1007/s13042-012-0137-1