DOI: 10.1145/2063576.2063919

A probabilistic approach to nearest-neighbor classification: naive hubness Bayesian kNN

Published: 24 October 2011

Abstract

Most machine-learning tasks, including classification, involve dealing with high-dimensional data. It was recently shown that the phenomenon of hubness, inherent to high-dimensional data, can be exploited to improve methods based on nearest neighbors (NNs). Hubness refers to the emergence of points (hubs) that appear among the k NNs of many other points in the data, and constitute influential points for kNN classification. In this paper, we present a new probabilistic approach to kNN classification, naive hubness Bayesian k-nearest neighbor (NHBNN), which employs hubness for computing class likelihood estimates. Experiments show that NHBNN compares favorably to different variants of the kNN classifier, including probabilistic kNN (PNN) which is often used as an underlying probabilistic framework for NN classification, signifying that NHBNN is a promising alternative framework for developing probabilistic NN algorithms.
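
To make the scheme described above concrete, the sketch below shows one way hubness information can drive a naive-Bayes-style nearest-neighbor classifier: for every training point, count how often it appears among the k nearest neighbors of points of each class, then classify a query by combining its own neighbors' smoothed class-conditional occurrence estimates with the class priors. This is a minimal illustration of the general idea, not the authors' exact NHBNN formulation; the helper names, the Laplace smoothing constant lam, the use of scikit-learn's NearestNeighbors, and the assumption of integer class labels 0..n_classes-1 are all choices made for the example.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_hubness_counts(X_train, y_train, k, n_classes):
    # For each training point x, compute N_{k,c}(x): how many class-c training
    # points have x among their k nearest neighbors.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    # Drop the first column: each training point is returned as its own
    # nearest neighbor when querying the training set.
    neigh = nn.kneighbors(X_train, return_distance=False)[:, 1:]
    counts = np.zeros((X_train.shape[0], n_classes))
    for i, row in enumerate(neigh):
        counts[row, y_train[i]] += 1
    return nn, counts

def predict_hubness_nb(x_query, nn, counts, y_train, k, n_classes, lam=1.0):
    # Naive-Bayes-style combination: log class prior plus, for each of the
    # query's k neighbors, the log of a smoothed estimate of the probability
    # that this neighbor occurs in the k-NN list of a point of that class.
    neigh = nn.kneighbors(x_query.reshape(1, -1), n_neighbors=k,
                          return_distance=False)[0]
    class_sizes = np.bincount(y_train, minlength=n_classes)
    log_post = np.log((class_sizes + lam) / (len(y_train) + lam * n_classes))
    for i in neigh:
        p = (counts[i] + lam) / (class_sizes + 2.0 * lam)  # per-class estimate
        log_post += np.log(p)
    return int(np.argmax(log_post))

# Illustrative usage (hypothetical data):
#   nn, counts = fit_hubness_counts(X_train, y_train, k=5, n_classes=3)
#   label = predict_hubness_nb(X_test[0], nn, counts, y_train, k=5, n_classes=3)

Here each per-neighbor estimate is a smoothed frequency N_{k,c}(x_i) / n_c; the published method differs in how these class likelihoods are estimated and smoothed, but the overall structure, a class prior multiplied by per-neighbor hubness-based likelihoods, mirrors the naive Bayes pattern the abstract describes.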

    Published In

    CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
    October 2011
    2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. bayesian
    2. classification
    3. high dimensional
    4. hubness
    5. nearest neighbors

    Qualifiers

    • Poster

    Conference

    CIKM '11

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


