
Efficient Classification by Removing Bayesian Confusing Samples


Abstract:


Improving the generalization performance of classifiers from a data pre-processing perspective has recently received considerable attention in the machine learning community. Although many methods have been proposed over the past decades, most of them lack theoretical foundations and cannot guarantee better generalization performance for classifiers trained on the processed datasets. To overcome this flaw, in this paper we propose a method, supported by Bayesian decision theory and percolation theory, that improves generalization performance by removing Bayesian confusing samples (BCS). Specifically, for a training set, we define BCS as the samples that are misclassified by the Bayesian optimal classifier, and we prove that a classifier trained on the training set after removing BCS can obtain better generalization performance. To find BCS, we point out, based on percolation theory, that BCS can be identified by the size of the global homogeneous cluster (a set of samples sharing the same label). Based on this analysis, we propose a method to construct global homogeneous clusters and remove BCS from the training set. Extensive experiments show that the proposed method is effective for a number of classical and state-of-the-art classifiers.
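To make the definition of BCS concrete, the Python sketch below works through a toy case in which the Bayesian optimal classifier h*(x) = argmax_y P(y | x) is known exactly, so BCS can be listed directly. Everything in it is an assumption for illustration only: 1-D data, two equally likely classes with class-conditional densities N(-1, 1) and N(+1, 1), and the hypothetical helpers `bayes_predict`, `bayesian_confusing`, `homogeneous_clusters` and `small_cluster_members`. The last two are a simplified, percolation-flavoured stand-in (same-label samples are linked when they lie within a chosen radius, and members of small clusters are flagged); they are not the cluster construction proposed in the paper.

```python
"""Toy illustration of Bayesian confusing samples (BCS).

Assumptions (not from the paper): 1-D data, two classes with equal priors,
class-conditional densities N(-1, 1) and N(+1, 1). The cluster heuristic is a
hypothetical stand-in, not the authors' construction.
"""
import math
import random
from collections import Counter


def gaussian_pdf(x, mean, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_predict(x):
    """Bayes optimal label under the assumed densities and equal priors.

    Ties (x == 0) go to class 0.
    """
    return 1 if gaussian_pdf(x, 1.0) > gaussian_pdf(x, -1.0) else 0


def bayesian_confusing(samples):
    """Indices of samples whose label disagrees with the Bayes optimal prediction."""
    return [i for i, (x, y) in enumerate(samples) if bayes_predict(x) != y]


def homogeneous_clusters(samples, radius):
    """Union-find over pairs that share a label and lie within `radius`.

    Returns the cluster root index for every sample.
    """
    parent = list(range(len(samples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(samples):
        for j in range(i + 1, len(samples)):
            xj, yj = samples[j]
            if yi == yj and abs(xi - xj) <= radius:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(samples))]


def small_cluster_members(samples, radius, min_fraction=0.1):
    """Hypothetical heuristic: flag samples whose homogeneous cluster is smaller
    than `min_fraction` times the largest cluster of the same label."""
    roots = homogeneous_clusters(samples, radius)
    sizes = Counter(roots)
    largest = Counter()
    for i, (_, y) in enumerate(samples):
        largest[y] = max(largest[y], sizes[roots[i]])
    return [i for i, (_, y) in enumerate(samples)
            if sizes[roots[i]] < min_fraction * largest[y]]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(200)] + \
           [(rng.gauss(1.0, 1.0), 1) for _ in range(200)]
    bcs = bayesian_confusing(data)
    flagged = small_cluster_members(data, radius=0.1)
    print(f"{len(bcs)} BCS out of {len(data)} samples")
    print(f"{len(flagged)} samples flagged by the cluster heuristic")
    print(f"overlap: {len(set(bcs) & set(flagged))}")
```

On real data the Bayesian optimal classifier is unknown, which is why the abstract identifies BCS through the size of global homogeneous clusters instead. The toy comparison between the exact BCS list and the flagged list only shows how the two notions can be related; it says nothing about how the paper's own method performs.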
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 36, Issue: 3, March 2024)
Page(s): 1084 - 1098
Date of Publication: 09 August 2023
