Data reduction is crucial for turning large datasets into information, the major purpose of data science. The classic and rich area of dimensionality reduction (DR) has traditionally relied on feature extraction that combines primary features linearly, aiming to preserve the covariances or correlations between the features. Nonlinear alternatives have been developed, including information-theoretic approaches that use mutual information as well as conditional entropy relative to target features. Here, we extend this approach to a feature selection or reduction strategy based on the conditional Shannon entropy of two random variables. Novel results include (a) a DR method, in two variants, based on the conditional entropy between the predictors themselves, disregarding the influence of the target feature; (b) an error-prevention method for DR with genomic data, inspired by error detection and correction in information theory, that can also be used for abiotic data; and (c) a comparative assessment of the performance of several machine learning models on input features selected by these methods. We assess the quality of these techniques by their performance on three application problems of varying difficulty (Malware Classification, BioTaxonomy, and Noisy Classification), with competitive outcomes. The analysis of the results yields some useful heuristics and suggests problems of interest for further research.
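
As a rough illustration of the central quantity, the conditional Shannon entropy of two discrete random variables can be estimated from paired samples through the identity H(X | Y) = H(X, Y) - H(Y). The sketch below is not the method developed in this work; the function names, the greedy filter `drop_redundant`, and its `threshold` parameter are hypothetical and serve only to show how conditional entropy between predictors (with no reference to a target feature) could flag a redundant predictor.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a list of discrete outcomes."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Empirical H(X | Y) = H(X, Y) - H(Y) from paired discrete samples."""
    joint = list(zip(x, y))
    return entropy(joint) - entropy(list(y))


def drop_redundant(features, threshold=0.1):
    """Hypothetical greedy filter (an assumption, not the paper's algorithm).

    A predictor is kept only if its conditional entropy given every
    predictor kept so far exceeds `threshold`, i.e. it still carries
    information that the kept predictors do not determine.
    """
    kept = []
    for name, column in features.items():
        if all(conditional_entropy(column, features[k]) > threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" duplicates "a", while "c" is independent of "a".
toy = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],
    "c": [0, 1, 0, 1],
}
selected = drop_redundant(toy)
```

On this toy input, H(b | a) is zero because "b" is fully determined by "a", so "b" is discarded, whereas "c" retains one full bit of uncertainty given "a" and is kept.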

All data for the Malware Classification (MC) and BioTaxonomy (BT) problems are publicly available. The synthetic dataset used in the Noisy Classification problem is available in the supplementary materials.
Code Availability Statement
The software used to run these applications is either part of the scikit-learn (sklearn) package or publicly available at bmc.memphis.edu/DSAx/, where 3D plots that allow a full comparison of the results across all models and datasets can also be found.
The use of high-performance computing (HPC) resources at the University of Memphis for processing datasets and training models is gratefully acknowledged. We are also grateful to the reviewers for valuable comments that substantially improved the quality and presentation of this work.
