Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Most data of interest in today's data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is, by its very nature, often difficult for conventional machine-learning algorithms to handle. This is considered an aspect of the well-known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, and the design of machine-learning algorithms needs to take these factors into account. Furthermore, it has been observed that some of the properties that arise in high dimensions can in fact be exploited to improve overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier that exploits this notion has recently been proposed. In this paper we go a step further by embracing the soft approach and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of elements appearing in the k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method as well as the standard kNN classifier.
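To make the notion of hubness concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It counts how often each point appears in the k-neighbor lists of other points (its k-occurrence count) and turns the class-wise occurrence counts into a soft vote. The function names, the toy data, and the Laplace-smoothed vote are all assumptions chosen for illustration; the fuzzy measures actually proposed in the paper are not reproduced in this excerpt.

```python
from collections import Counter
from math import dist


def knn_indices(points, k):
    """For each point, return the indices of its k nearest other points (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: (dist(p, points[j]), j))  # ties broken by index
        neighbors.append(others[:k])
    return neighbors


def k_occurrences(neighbors, n_points):
    """Count how many k-neighbor lists each point appears in."""
    counts = Counter(j for nbrs in neighbors for j in nbrs)
    return [counts.get(i, 0) for i in range(n_points)]


def class_occurrences(neighbors, labels, classes):
    """Per point, count its occurrences in neighbor lists of points of each class."""
    table = [dict.fromkeys(classes, 0) for _ in labels]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            table[j][labels[i]] += 1
    return table


def fuzzy_votes(table, classes, smoothing=1.0):
    """Laplace-smoothed class shares of each point's occurrences (illustrative choice)."""
    votes = []
    for row in table:
        total = sum(row.values()) + smoothing * len(classes)
        votes.append({c: (row[c] + smoothing) / total for c in classes})
    return votes


# Hypothetical toy data: five 2-D points with two class labels.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 4.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b", "b"]
classes = ["a", "b"]

neighbors = knn_indices(points, k=2)
occurrences = k_occurrences(neighbors, len(points))
votes = fuzzy_votes(class_occurrences(neighbors, labels, classes), classes)
```

In this toy set the second point appears in four of the five neighbor lists, so it acts as a hub. Most of those occurrences are in neighborhoods of points from the other class, which is the kind of information a hubness-aware vote can take into account where a plain label vote cannot.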

Notes

  1. Skewness, the standardized 3rd moment of a probability distribution, is 0 if the distribution is symmetrical, while positive (negative) values indicate skew to the right (left).
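As a small illustration of the definition in this note, the sketch below computes the standardized third moment of a sample in plain Python. It uses the population form (dividing by n); the function name and the sample values are made up for the example.

```python
def skewness(values):
    """Standardized third moment: mean((x - mean)^3) / std^3, population form."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    if second == 0:
        raise ValueError("skewness is undefined for a constant sample")
    return third / second ** 1.5


symmetric = [1, 2, 3, 4, 5]                  # mirror-symmetric around 3
right_skewed = [0, 0, 1, 1, 2, 2, 3, 12]     # one large value stretches the right tail
left_skewed = [-v for v in right_skewed]     # mirrored sample, tail on the left
```

The symmetric sample has zero skewness, the sample with a long right tail has positive skewness, and mirroring it flips the sign.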

References

  1. Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional spaces. In: Proceedings of the 8th international conference on database theory (ICDT), Lecture notes in computer science, vol 1973. Springer, pp 420–434

  2. Aucouturier JJ (2006) Ten experiments on the modelling of polyphonic timbre. Ph.D. thesis, University of Paris 6

  3. Aucouturier JJ, Pachet F (2004) Improving timbre similarity: how high is the sky? J Negat Results Speech Audio Sci 1. http://jjtok.io/papers/JNRSAS-2004.pdf

  4. Babu VS, Viswanath P (2009) Rough-fuzzy weighted k-nearest leader classifier for large data sets. Pattern Recogn 42(9):1719–1731

  5. Buza K, Nanopoulos A, Schmidt-Thieme L (2011) INSIGHT: Efficient and effective instance selection for time-series classification. In: Proceedings of the 15th pacific-asia conference on knowledge discovery and data mining (PAKDD), Part II, Lecture Notes in Artificial Intelligence, vol 6635. Springer, pp 149–160

  6. Cabello D, Barro S, Salceda JM, Ruiz R, Mira J (1991) Fuzzy k-nearest neighbor classifiers for ventricular arrhythmia detection. Int J Biomed Comput 27(2):77–93

  7. Chen J, Fang H, Saad Y (2009) Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection. J Mach Learn Res 10:1989–2012

  8. Cintra ME, Camargo HA, Monard MC (2008) A study on techniques for the automatic generation of membership functions for pattern recognition. In: Congresso da Academia Trinacional de Ciências (C3N), vol 1, pp 1–10

  9. Durrant RJ, Kabán A (2009) When is ‘nearest neighbour’ meaningful: a converse theorem and implications. J Complex 25(4):385–397

  10. François D, Wertz V, Verleysen M (2007) The concentration of fractional distances. IEEE Trans Knowl Data Eng 19(7):873–886

  11. Houle ME, Kriegel HP, Kröger P, Schubert E, Zimek A (2010) Can shared-neighbor distances defeat the curse of dimensionality? In: Proceedings of the 22nd international conference on scientific and statistical database management (SSDBM), Lecture Notes in Computer Science, vol 6187. Springer, pp 482–500

  12. Huang WL, Chen HM, Hwang SF, Ho SY (2007) Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method. Biosystems 90(2):405–413

  13. Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 15(4):580–585

  14. Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52(3):239–281

  15. Pham TD (2005) An optimally weighted fuzzy k-NN algorithm. In: Proceedings of the 3rd international conference on advances in pattern recognition (ICAPR), Part I, Lecture Notes in Computer Science, vol 3686. Springer, pp 239–247

  16. Radovanović M, Nanopoulos A, Ivanović M (2009) Nearest neighbors in high-dimensional data: the emergence and influence of hubs. In: Proceedings of the 26th international conference on machine learning (ICML), pp 865–872

  17. Radovanović M, Nanopoulos A, Ivanović M (2010) Hubs in space: Popular nearest neighbors in high-dimensional data. J Mach Learn Res 11:2487–2531

  18. Radovanović M, Nanopoulos A, Ivanović M (2010) On the existence of obstinate results in vector space models. In: Proceedings of the 33rd annual international ACM SIGIR conference on research and development in information retrieval, pp 186–193

  19. Radovanović M, Nanopoulos A, Ivanović M (2010) Time-series classification in many intrinsic dimensions. In: Proceedings of the 10th SIAM international conference on data mining (SDM), pp 677–688

  20. Shen HB, Yang J, Chou KC (2006) Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. J Theor Biol 240(1):9–13

  21. Sim J, Kim SY, Lee J (2005) Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method. Bioinformatics 21(12):2844–2849

  22. Singpurwalla N, Booker JM (2004) Membership functions and probability measures of fuzzy sets. J Am Stat Assoc 99:867–877

  23. Tomašev N, Brehar R, Mladenić D, Nedevschi S (2011) The influence of hubness on nearest-neighbor methods in object recognition. In: Proceedings of the 7th IEEE international conference on intelligent computer communication and processing (ICCP), pp 367–374

  24. Tomašev N, Mladenić D (2011) Exploring the hubness-related properties of oceanographic sensor data. In: Proceedings of the 14th international multiconference on information society (IS), A:149–152

  25. Tomašev N, Mladenić D (2011) The influence of weighting the k-occurrences on hubness-aware classification methods. In: Proceedings of the 14th international multiconference on information society (IS)

  26. Tomašev N, Mladenić D (2012) Nearest neighbor voting in high dimensional data: learning from past occurrences. Comput Sci Inf Syst 9(2):691–712

  27. Tomašev N, Radovanović M, Mladenić D, Ivanović M (2011) Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. In: Proceedings of the 7th international conference on machine learning and data mining (MLDM), Lecture Notes in Artificial Intelligence, vol 6871. Springer, pp 16–30

  28. Tomašev N, Radovanović M, Mladenić D, Ivanović M (2011) The role of hubness in clustering high-dimensional data. In: Proceedings of the 15th pacific-asia conference on knowledge discovery and data mining (PAKDD), Part I, Lecture Notes in Artificial Intelligence, vol 6634. Springer, pp 183–195

  29. Wang XZ, He YL, Dong LC, Zhao HY (2011) Particle swarm optimization for determining fuzzy measures from data. Inf Sci 181(19):4230–4252

  30. Yu S, Backer SD, Scheunders P (2002) Genetic feature selection combined with composite fuzzy nearest neighbor classifiers for hyperspectral satellite imagery. Pattern Recogn Lett 23(1–3):183–190

  31. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353

  32. Zhang Z, Zhang R (2009) Multimedia data mining. Chapman and Hall, London

  33. Zheng K, Fung PC, Zhou X (2010) K-nearest neighbor search for fuzzy objects. In: Proceedings of the 36th ACM SIGMOD international conference on management of data, pp 699–710

  34. Zuo W, Zhang D, Wang K (2008) On kernel difference-weighted k-nearest neighbor classification. Pattern Anal Appl 11:247–257

Acknowledgments

This work was supported by the bilateral project between Slovenia and Serbia “Correlating images and words: Enhancing image analysis through machine learning and semantic technologies,” the Slovenian Research Agency, the Serbian Ministry of Education and Science through project no. OI174023, “Intelligent techniques and their integration into wide-spectrum decision support,” and the ICT Programme of the EC under PASCAL2 (ICT-NoE-216886) and PlanetData (ICT-NoE-257641).

Author information

Corresponding author

Correspondence to Nenad Tomašev.

Additional information

This is an extended version of the paper "Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification," which was presented at the MLDM 2011 conference [27].

About this article

Cite this article

Tomašev, N., Radovanović, M., Mladenić, D. et al. Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification. Int. J. Mach. Learn. & Cyber. 5, 445–458 (2014). https://doi.org/10.1007/s13042-012-0137-1

  • DOI: https://doi.org/10.1007/s13042-012-0137-1
