
Data mining and the impact of missing data

Marvin L. Brown (School of Business, Hawaii Pacific University, Honolulu, Hawaii, USA)
John F. Kros (Department of Decision Sciences, East Carolina University, Greenville, North Carolina, USA)

Industrial Management & Data Systems

ISSN: 0263-5577

Article publication date: 1 November 2003


Abstract

The data mining process is concerned largely with prediction, estimation, classification, pattern recognition and the development of association rules. The significance of the analysis therefore depends heavily on the accuracy of the database and on the sample data chosen for model training and testing. Data mining is based on searching a concatenation of multiple databases, which usually contain some missing data along with varying proportions of inaccurate data, pollution, outliers and noise. The issue of missing data must be addressed, because ignoring it can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this research is to address the impact of missing data on the data mining process.
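
The claim that ignoring missing data can bias results can be illustrated with a small sketch. The records and values below are hypothetical toy numbers, not data from the article. One common way of "ignoring" missing values is to drop incomplete records (listwise or complete-case deletion). Here we assume the missing values are not missing at random: the customers who left the field blank happen to be high spenders. Under that assumption, the complete-case mean understates the true mean.

```python
# Hypothetical toy data (not from the article): annual spend per customer.
TRUE_SPEND = [120.0, 150.0, 180.0, 200.0, 950.0, 1100.0, 1300.0]
# Assumption for illustration: two high spenders left the field blank
# (None marks a missing value), so the data are not missing at random.
OBSERVED_SPEND = [120.0, 150.0, 180.0, 200.0, None, None, 1300.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def complete_case_mean(values):
    """'Ignore' missing data by dropping incomplete records (listwise deletion)."""
    present = [v for v in values if v is not None]
    return mean(present)


def missing_fraction(values):
    """Share of records whose value is missing."""
    return sum(v is None for v in values) / len(values)


if __name__ == "__main__":
    true_mean = mean(TRUE_SPEND)
    naive_mean = complete_case_mean(OBSERVED_SPEND)
    print(f"missing: {missing_fraction(OBSERVED_SPEND):.0%}")
    print(f"true mean: {true_mean:.2f}")
    print(f"complete-case mean: {naive_mean:.2f}")
    print(f"bias: {naive_mean - true_mean:.2f}")
```

If the missing values were instead a random subset of the records, the complete-case mean would carry no systematic bias. It is the link between missingness and the value itself that distorts the estimate in this toy case.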


Citation

Brown, M.L. and Kros, J.F. (2003), "Data mining and the impact of missing data", Industrial Management & Data Systems, Vol. 103 No. 8, pp. 611-621. https://doi.org/10.1108/02635570310497657

Publisher: MCB UP Ltd

Copyright © 2003, MCB UP Limited
