ABSTRACT
Modern massive data sets often comprise millions of records and thousands of features, and their efficient processing poses an increasing challenge for traditional methods. Feature selection methods form a family of traditional instruments for data dimensionality reduction. They aim at selecting subsets of data features so that the loss of information contained in the full data set is minimized. Evolutionary feature selection methods have shown a good ability to identify feature subsets in very-high-dimensional data sets. Their efficiency depends, among other factors, on the particular optimization algorithm, the feature subset representation, and the definition of the objective function. In this paper, two evolutionary methods for fixed-length subset selection are employed to find feature subsets on the basis of their entropy, estimated by a fast data compression algorithm. The soundness of the fitness criterion, the ability of the investigated methods to find good feature subsets, and the usefulness of the selected feature subsets for practical data mining are evaluated on two well-known data sets with several widely used classification algorithms.
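To make the fitness criterion concrete, the sketch below is a minimal, hypothetical illustration rather than the paper's actual method: zlib stands in for the fast data compressor (the abstract does not name the specific compressor used), the compression ratio of the selected columns serves as the entropy estimate, and a simple elitist loop with swap mutation stands in for the two fixed-length subset selection methods the paper compares.

```python
# Sketch: compression-based entropy as a fitness criterion for
# fixed-length feature subset selection. All names and parameters
# here are illustrative assumptions, not the paper's implementation.
import random
import zlib

import numpy as np


def compression_entropy(data: np.ndarray) -> float:
    """Estimate the entropy of a feature subset via its compression ratio.

    A smaller compressed size relative to the raw size indicates more
    redundancy (lower entropy); the ratio is a fast, rough proxy in the
    spirit of Kolmogorov-complexity-based estimates.
    """
    raw = data.tobytes()
    return len(zlib.compress(raw)) / len(raw)


def evolve_subset(X: np.ndarray, k: int, pop_size: int = 20,
                  generations: int = 50, seed: int = 0) -> list:
    """Evolve a fixed-length subset of k feature indices.

    The abstract does not state the optimization direction; maximizing
    the estimated entropy here simply prefers less-redundant subsets.
    """
    rng = random.Random(seed)
    n_features = X.shape[1]

    def fitness(subset):
        return compression_entropy(X[:, sorted(subset)])

    # Population of fixed-length index sets.
    pop = [rng.sample(range(n_features), k) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]
        children = []
        for parent in elite:
            child = list(parent)
            # Swap mutation: replace one selected feature with an
            # unselected one, preserving the fixed subset length.
            out_idx = rng.randrange(k)
            candidates = [f for f in range(n_features) if f not in child]
            child[out_idx] = rng.choice(candidates)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(500, 40)).astype(np.float32)
    best = evolve_subset(X, k=10)
    print("selected features:", sorted(best))
```

In a full pipeline, the subset returned by such a loop would then be handed to standard classifiers to judge its practical usefulness, as the paper does with several widely used classification algorithms.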