Abstract:
In feature selection, mutual information (MI) is a measure that captures nonlinear relationships between the features and the class; it quantifies how much the information in the features reduces the uncertainty in the output. In this paper, we propose a new measure related to MI, called neighborhood entropy, and a novel filter method that minimizes it in a greedy procedure. Our algorithm integrates sequential forward selection with approximate nearest-neighbor techniques and locality-sensitive hashing. Experiments show that the classification accuracy is usually higher than that of other state-of-the-art algorithms, with the best results obtained on problems that are highly unbalanced and nonlinearly separable. The order in which the features are selected is also better, yielding higher accuracy with fewer features. The experimental results indicate that our technique can be employed effectively in offline scenarios, where more CPU time can be dedicated to achieving superior results and greater robustness to noise and class imbalance.
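The abstract does not define neighborhood entropy. The sketch below therefore only illustrates the general shape of a greedy filter of this kind and is not the authors' method. Recall that MI satisfies I(X;Y) = H(Y) - H(Y|X). A low class-label entropy among each sample's nearest neighbors loosely suggests a low conditional entropy H(Y|X), and hence a high MI.

The score `neighborhood_entropy` is a hypothetical stand-in: the mean Shannon entropy of the labels of the k nearest neighbors, measured in the selected-feature subspace. The names `neighborhood_entropy` and `forward_select` and the parameter `k` are illustrative assumptions. The sketch also uses exact brute-force O(n²) neighbor search in place of the approximate nearest-neighbor and locality-sensitive hashing techniques the paper describes.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def neighborhood_entropy(X, y, features, k=3):
    """Hypothetical stand-in score (not the paper's definition): average
    label entropy over each sample's k nearest neighbors, using squared
    Euclidean distance restricted to the given feature indices.
    Exact brute-force search replaces approximate NN / LSH here."""
    n = len(X)
    total = 0.0
    for i in range(n):
        dists = []
        for j in range(n):
            if j == i:
                continue  # a sample is not its own neighbor
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            dists.append((d, j))
        dists.sort()  # ties broken by index, so results are deterministic
        neighbor_labels = [y[j] for _, j in dists[:k]]
        total += label_entropy(neighbor_labels)
    return total / n


def forward_select(X, y, num_features, k=3):
    """Greedy sequential forward selection: at each step, add the
    remaining feature whose inclusion gives the lowest score."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < num_features:
        best = min(remaining,
                   key=lambda f: neighborhood_entropy(X, y, selected + [f], k))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 separates the two classes; feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [0.3, 3.0],
     [5.0, 2.0], [5.1, 8.0], [5.2, 4.0], [5.3, 7.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(forward_select(X, y, 2))  # prints [0, 1]
```

The informative feature is chosen first because its neighborhoods are label-pure, which gives a score of 0. Any score minimized in the same greedy loop would produce the feature ranking that the abstract's "order in which the features are selected" refers to.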
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 29, Issue: 12, December 2018)