The Knowledge Discovery in Databases (KDD) methodology appears attractive for the analysis of large clinical databases. In the KDD process, the preprocessing step (data cleaning and handling of missing values) is paramount, since it conditions the quality of the results obtained by data mining procedures and represents about 80% of the whole project time. The aims of the present study were to analyze this step and to provide tools for handling inconsistent data and missing values. We broke the process down into 3 main stages: data cleaning, explanatory study of missing values, and choice of the procedure used for handling missing values. The data cleaning stage was based on a system of logical rules to correct mistakes and on cluster analysis to discard poorly filled files. The missing-data mechanism was analyzed by means of multivariate statistical procedures. Two methods for dealing with missing values were compared: imputation by the most common value (mode) and imputation using decision trees. The study was performed on a large medical diabetes database (23601 patients) containing numerous missing values. The system of logical rules made it possible to correct mistakes in essential parameters (for example, the type of diabetes). Cluster analysis identified 10% of the files as poorly filled. After multivariate analysis, the missing-data mechanism could be considered random. For variables with a low proportion of missing values (<10%) and few categories (<4), imputation using decision trees provided better results than imputation by mode.
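To make the two imputation strategies concrete, the following is a minimal Python sketch, not the procedure used in the study. It assumes categorical records stored as dictionaries with `None` marking a missing value, and it stands in for a full decision tree with a one-level tree (a decision stump) that splits on the single most informative predictor. The field names (`insulin`, `sex`, `diabetes_type`), the toy records, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def mode_of(values):
    """Return the most common value among the non-missing entries."""
    return Counter(v for v in values if v is not None).most_common(1)[0][0]

def impute_mode(rows, target):
    """Fill every missing `target` with the overall mode of `target`."""
    fill = mode_of(r[target] for r in rows)
    return [dict(r, **{target: fill}) if r[target] is None else dict(r)
            for r in rows]

def fit_stump(rows, target, predictors):
    """Fit a one-level tree on the complete cases.

    Chooses the predictor whose branches classify the most complete
    cases correctly when each branch predicts its own mode.
    """
    known = [r for r in rows if r[target] is not None]
    best = None
    for p in predictors:
        branches = defaultdict(list)
        for r in known:
            if r[p] is not None:
                branches[r[p]].append(r[target])
        # Complete cases classified correctly by the branch modes.
        correct = sum(Counter(v).most_common(1)[0][1] for v in branches.values())
        if best is None or correct > best[0]:
            best = (correct, p, {k: mode_of(v) for k, v in branches.items()})
    return best[1], best[2]

def impute_stump(rows, target, predictors):
    """Fill missing `target` values using the branch mode of the fitted stump."""
    split, leaf = fit_stump(rows, target, predictors)
    fallback = mode_of(r[target] for r in rows)  # used for unseen branch values
    out = []
    for r in rows:
        r = dict(r)
        if r[target] is None:
            r[target] = leaf.get(r[split], fallback)
        out.append(r)
    return out

# Hypothetical toy records; the last one has a missing diabetes type.
rows = [
    {"insulin": "yes", "sex": "F", "diabetes_type": "1"},
    {"insulin": "yes", "sex": "M", "diabetes_type": "1"},
    {"insulin": "no",  "sex": "F", "diabetes_type": "2"},
    {"insulin": "no",  "sex": "M", "diabetes_type": "2"},
    {"insulin": "no",  "sex": "F", "diabetes_type": "2"},
    {"insulin": "yes", "sex": "M", "diabetes_type": None},
]

by_mode = impute_mode(rows, "diabetes_type")
by_tree = impute_stump(rows, "diabetes_type", ["insulin", "sex"])
print(by_mode[-1]["diabetes_type"], by_tree[-1]["diabetes_type"])
```

On these toy records the mode fills the gap with the majority class regardless of the other fields, whereas the stump uses the `insulin` field to choose a value. This illustrates why a tree-based approach can help when the missing variable is related to observed ones. A real analysis would use a full tree learner and evaluate both methods on held-out known values.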