DOI: 10.1145/2739482.2768451
research-article

An Evolutionary Missing Data Imputation Method for Pattern Classification

Published: 11 July 2015

Abstract

Data analysis plays an important role in our Information Era; however, most statistical and machine learning algorithms were not developed to tackle the ubiquitous issue of missing values. In pattern classification, several strategies have been proposed to handle this problem. The most widely used is missing data imputation, which can be viewed as an optimization problem whose goal is to reduce the bias imposed by the absence of information. Most imputation methods, however, are restricted to a single type of variable (categorical or numerical) and usually ignore information within incomplete instances. To fill these gaps, we propose an evolutionary missing data imputation method for pattern classification, based on a genetic algorithm, which is suitable for mixed-attribute datasets and takes into account information from incomplete instances and from model building, more specifically the classification accuracy. To assess the performance of our method, we used three algorithms to represent three groups of classification methods: 1) rule induction learning, 2) approximate models and 3) lazy learning. Experiments show that the proposed method outperforms some well-established missing value treatment methods.
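
The general idea in the abstract, a genetic algorithm that searches for imputed values and scores each candidate by classification accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation: the chromosome encoding, the uniform crossover, the mutation scheme, the leave-one-out 1-nearest-neighbour fitness, and every name and parameter below (`evolve_imputation`, `pop_size`, `mutation_rate`, and so on) are assumptions chosen only to keep the example short and dependency-free. It also assumes each column with a missing cell has at least one observed value and that categorical values are strings.

```python
import random

MISSING = None  # marker for a missing cell (an assumption of this sketch)

def distance(a, b):
    # Mixed-attribute distance: 0/1 mismatch for categories (strings),
    # absolute difference for numbers. No scaling is applied here.
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y)
    return total

def loo_1nn_accuracy(rows, labels):
    # Leave-one-out accuracy of a 1-nearest-neighbour classifier,
    # used as a stand-in for "classification accuracy" as fitness.
    hits = 0
    for i, row in enumerate(rows):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: distance(row, rows[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(rows)

def fill(rows, holes, genes):
    # Return a copy of the data with each missing cell replaced by its gene.
    filled = [list(r) for r in rows]
    for (i, j), value in zip(holes, genes):
        filled[i][j] = value
    return filled

def evolve_imputation(rows, labels, pop_size=20, generations=30,
                      mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # One gene per missing cell, so incomplete instances stay in the data.
    holes = [(i, j) for i, r in enumerate(rows)
             for j, v in enumerate(r) if v is MISSING]
    observed = {j: [r[j] for r in rows if r[j] is not MISSING]
                for _, j in holes}

    def random_gene(j):
        values = observed[j]
        if isinstance(values[0], str):          # categorical attribute
            return rng.choice(values)
        return rng.uniform(min(values), max(values))  # numerical attribute

    def fitness(genes):
        return loo_1nn_accuracy(fill(rows, holes, genes), labels)

    population = [[random_gene(j) for _, j in holes] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]        # truncation selection
        children = [ranked[0]]                  # keep the best individual
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [random_gene(j) if rng.random() < mutation_rate else g
                     for g, (_, j) in zip(child, holes)]      # per-gene mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return fill(rows, holes, best), fitness(best)
```

Calling `evolve_imputation(rows, labels)` on a small list of mixed-type rows returns a completed copy of the data together with the fitness of the best individual found. In practice the fitness classifier, the genetic operators and the handling of numeric scales would follow the choices made in the paper rather than the placeholders used here.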



Published In

GECCO Companion '15: Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation
July 2015
1568 pages
ISBN:9781450334884
DOI:10.1145/2739482
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data imputation
  2. evolutionary computing
  3. genetic algorithms
  4. missing data

Qualifiers

  • Research-article

Funding Sources

  • Conselho Nacional de Desenvolvimento Científico e Tecnológico

Conference

GECCO '15

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


Cited By

  • (2022) Missing Value Imputation Designs and Methods of Nature-Inspired Metaheuristic Techniques: A Systematic Review. IEEE Access, 10 (61544-61566). DOI: 10.1109/ACCESS.2022.3172319. Online publication date: 2022
  • (2022) EvoImputer. Knowledge-Based Systems, 236:C. DOI: 10.1016/j.knosys.2021.107734. Online publication date: 25-Jan-2022
  • (2021) A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data. Soft Computing. DOI: 10.1007/s00500-021-05590-y. Online publication date: 7-Feb-2021
  • (2019) A Genetic Asexual Reproduction Optimization Algorithm for Imputing Missing Values. 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE) (214-218). DOI: 10.1109/ICCKE48569.2019.8964808. Online publication date: Oct-2019
  • (2019) Application of the Modified Imputation Method to Missing Data to Increase Classification Performance. 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS) (134-139). DOI: 10.1109/CCOMS.2019.8821632. Online publication date: Feb-2019
  • (2019) Metaheuristic approaches in biopharmaceutical process development data analysis. Bioprocess and Biosystems Engineering. DOI: 10.1007/s00449-019-02147-0. Online publication date: 22-May-2019
  • (2017) Gap Filling of Missing Streaming Data in a Network of Intelligent Surveillance Cameras. Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web (309-312). DOI: 10.1145/3126858.3131585. Online publication date: 17-Oct-2017
  • (2016) Time Series Imputation Using Genetic Programming and Lagrange Interpolation. 2016 5th Brazilian Conference on Intelligent Systems (BRACIS) (169-174). DOI: 10.1109/BRACIS.2016.040. Online publication date: Oct-2016
