International Conference on Computational Science, ICCS 2010
Data preprocessing evaluation for web log mining: reconstruction of activities of a web visitor

https://doi.org/10.1016/j.procs.2010.04.255
Under a Creative Commons license
open access

Abstract

Every data analysis presupposes the data themselves, regardless of the focus of the analysis (visit rate analysis, portal optimization, portal personalization, etc.). The results of the chosen analysis depend heavily on the quality of the analyzed data. For portal usage analysis, these data can be obtained by monitoring the web server log file. From these data we can create data matrices and a web map, which serve for finding users' behaviour patterns. Preparing data from the log file is the most time-consuming phase of the whole analysis. We carried out an experiment to find out to what extent this time-consuming data preparation is necessary. We aimed to specify the steps that are indispensable for obtaining valid data from the log file. In particular, we focused on the reconstruction of activities of a web visitor, an advanced data preprocessing technique that is itself time-consuming. In the article we attempt to assess the impact of reconstructing the activities of a web visitor on the quantity and quality of the extracted rules, which represent the web users’ behaviour patterns.
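
To make the two preprocessing steps named above more concrete, the following is a minimal Python sketch of session identification and of reconstructing a visitor's activities (often called path completion). It is not the authors' implementation: the record layout `(ip, timestamp, url, referrer)`, the 30-minute inactivity timeout, and the referrer-based backtracking heuristic are all assumptions chosen for illustration, and the function names `identify_sessions` and `complete_path` are hypothetical.

```python
from datetime import datetime, timedelta

def identify_sessions(records, timeout=timedelta(minutes=30)):
    """Split requests into sessions per IP using an inactivity timeout.

    `records` is an iterable of (ip, timestamp, url, referrer) tuples taken
    from an already cleaned log (assumed layout). The 30-minute default is
    an illustrative assumption, not a value from the paper.
    """
    sessions = []
    last_seen = {}  # ip -> (time of last request, index of its open session)
    for ip, ts, url, ref in sorted(records, key=lambda r: r[1]):
        prev = last_seen.get(ip)
        if prev is None or ts - prev[0] > timeout:
            sessions.append([])          # start a new session for this visitor
            idx = len(sessions) - 1
        else:
            idx = prev[1]                # continue the visitor's open session
        sessions[idx].append((url, ref))
        last_seen[ip] = (ts, idx)
    return sessions

def complete_path(session):
    """Re-insert pages that were probably revisited from the browser cache.

    If a request's referrer is not the previously visited page but occurs
    earlier in the path, assume the visitor stepped back to it (heuristic).
    """
    path = []
    for url, ref in session:
        if path and ref and ref != path[-1] and ref in path:
            j = len(path) - 1 - path[::-1].index(ref)  # last position of ref
            path.extend(path[j:len(path) - 1][::-1])   # walk back to ref
        path.append(url)
    return path

# Hypothetical example data: one visitor, two visits two hours apart.
t0 = datetime(2010, 1, 1, 12, 0)
records = [
    ("10.0.0.1", t0, "/a", ""),
    ("10.0.0.1", t0 + timedelta(minutes=1), "/b", "/a"),
    ("10.0.0.1", t0 + timedelta(minutes=2), "/c", "/b"),
    ("10.0.0.1", t0 + timedelta(minutes=3), "/d", "/b"),  # reached via "back"
    ("10.0.0.1", t0 + timedelta(hours=2), "/a", ""),
]
sessions = identify_sessions(records)
print(complete_path(sessions[0]))  # ['/a', '/b', '/c', '/b', '/d']
```

In this sketch the request for `/d` carries `/b` as its referrer although the last logged page was `/c`, so the cached return to `/b` is inserted into the path. Whether such inserted steps change the number and quality of the extracted rules is the question the article evaluates.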

Keywords

Data preprocessing
Data cleaning
Identification of sessions
Reconstruction of activities of a web visitor
Web log mining
Evaluation
