DOI: 10.1145/2063576.2063919

A probabilistic approach to nearest-neighbor classification: naive hubness Bayesian kNN

Published: 24 October 2011

Abstract

Most machine-learning tasks, including classification, involve dealing with high-dimensional data. It was recently shown that the phenomenon of hubness, inherent to high-dimensional data, can be exploited to improve methods based on nearest neighbors (NNs). Hubness refers to the emergence of points (hubs) that appear among the k NNs of many other points in the data, and constitute influential points for kNN classification. In this paper, we present a new probabilistic approach to kNN classification, naive hubness Bayesian k-nearest neighbor (NHBNN), which employs hubness for computing class likelihood estimates. Experiments show that NHBNN compares favorably to different variants of the kNN classifier, including probabilistic kNN (PNN) which is often used as an underlying probabilistic framework for NN classification, signifying that NHBNN is a promising alternative framework for developing probabilistic NN algorithms.
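
To make the scheme described above concrete, the sketch below shows one way hubness information can drive a naive-Bayes-style nearest-neighbor classifier: for every training point, count how often it appears among the k nearest neighbors of points of each class, then classify a query by combining its own neighbors' smoothed class-conditional occurrence estimates with the class priors. This is a minimal illustration of the general idea, not the authors' exact NHBNN formulation; the helper names, the Laplace smoothing constant lam, the use of scikit-learn's NearestNeighbors, and the assumption of integer class labels 0..n_classes-1 are all choices made for the example.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_hubness_counts(X_train, y_train, k, n_classes):
    # For each training point x, compute N_{k,c}(x): how many class-c training
    # points have x among their k nearest neighbors.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    # Drop the first column: each training point is returned as its own
    # nearest neighbor when querying the training set.
    neigh = nn.kneighbors(X_train, return_distance=False)[:, 1:]
    counts = np.zeros((X_train.shape[0], n_classes))
    for i, row in enumerate(neigh):
        counts[row, y_train[i]] += 1
    return nn, counts

def predict_hubness_nb(x_query, nn, counts, y_train, k, n_classes, lam=1.0):
    # Naive-Bayes-style combination: log class prior plus, for each of the
    # query's k neighbors, the log of a smoothed estimate of the probability
    # that this neighbor occurs in the k-NN list of a point of that class.
    neigh = nn.kneighbors(x_query.reshape(1, -1), n_neighbors=k,
                          return_distance=False)[0]
    class_sizes = np.bincount(y_train, minlength=n_classes)
    log_post = np.log((class_sizes + lam) / (len(y_train) + lam * n_classes))
    for i in neigh:
        p = (counts[i] + lam) / (class_sizes + 2.0 * lam)  # per-class estimate
        log_post += np.log(p)
    return int(np.argmax(log_post))

# Illustrative usage (hypothetical data):
#   nn, counts = fit_hubness_counts(X_train, y_train, k=5, n_classes=3)
#   label = predict_hubness_nb(X_test[0], nn, counts, y_train, k=5, n_classes=3)

Here each per-neighbor estimate is a smoothed frequency N_{k,c}(x_i) / n_c; the published method differs in how these class likelihoods are estimated and smoothed, but the overall structure, a class prior multiplied by per-neighbor hubness-based likelihoods, mirrors the naive Bayes pattern the abstract describes.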

    Published In

    CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
    October 2011
    2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. bayesian
    2. classification
    3. high dimensional
    4. hubness
    5. nearest neighbors

    Qualifiers

    • Poster

    Conference

    CIKM '11

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


