Abstract.
We introduce a partitioning-based distributed document-clustering algorithm that uses user access patterns from multi-server web sites. Our algorithm makes it possible to exploit adaptive document replication and persistent connections simultaneously. These are two of the most effective techniques for reducing the response time observed by web users. The algorithm first distributes the user access data evenly among the servers by means of a hash function. Each server then generates a local clustering on its share of the user-session records by employing a traditional single-machine document-clustering algorithm. Finally, these local clustering results are combined by a novel procedure that generates maximal large itemsets of web documents. We present preliminary experimental results and discuss alternative approaches to be pursued in future work.
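The three steps in the abstract (hash partitioning, local mining, global combination into maximal large itemsets) can be illustrated with a small single-process sketch. It rests on several assumptions that do not come from the paper. Sessions are assigned to servers with `zlib.crc32`, because the paper's hash function is not named here. The local step uses brute-force Apriori-style frequent-itemset mining as a stand-in for the unspecified single-machine clustering algorithm. The combine step follows the well-known Partition approach, not necessarily the authors' novel procedure: an itemset that is large over all sessions must be large in at least one partition, so the union of local results is a complete candidate set. All function names and the toy data are hypothetical.

```python
import zlib
from itertools import combinations


def partition_sessions(sessions, num_servers):
    """Assign each (session_id, docs) pair to a server by hashing its id.
    crc32 is used because Python's built-in str hash is randomized per process."""
    parts = [[] for _ in range(num_servers)]
    for sid, docs in sessions:
        parts[zlib.crc32(sid.encode()) % num_servers].append(frozenset(docs))
    return parts


def local_frequent_itemsets(part, min_support):
    """Brute-force Apriori-style mining on one server's partition.
    min_support is a fraction of the sessions held by this partition."""
    if not part:
        return set()
    threshold = min_support * len(part)
    current = [frozenset([d]) for d in sorted({d for s in part for d in s})]
    frequent = set()
    while current:
        # Keep candidates contained in enough local sessions.
        kept = [c for c in current if sum(c <= s for s in part) >= threshold]
        frequent.update(kept)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k = len(current[0]) + 1
        current = list({a | b for a, b in combinations(kept, 2) if len(a | b) == k})
    return frequent


def maximal_large_itemsets(parts, min_support):
    """Combine local results into globally large, maximal document itemsets."""
    # Phase 1: union of locally frequent itemsets reported by every server.
    candidates = set()
    for part in parts:
        candidates |= local_frequent_itemsets(part, min_support)
    # Phase 2: verify global support. In a real deployment each server would
    # report local counts; here all partitions live in one process.
    total = sum(len(p) for p in parts)
    large = {c for c in candidates
             if sum(c <= s for p in parts for s in p) >= min_support * total}
    # Keep only itemsets that are not a proper subset of another large itemset.
    return {c for c in large if not any(c < other for other in large)}


# Hypothetical toy access log: (session id, documents requested in that session).
sessions = [
    ("u1", {"/a", "/b", "/c"}),
    ("u2", {"/a", "/b"}),
    ("u3", {"/a", "/b", "/c"}),
    ("u4", {"/d"}),
]
parts = partition_sessions(sessions, num_servers=2)
print(maximal_large_itemsets(parts, min_support=0.5))
```

With a 50% support threshold, the call returns a single maximal itemset containing `/a`, `/b` and `/c`, whichever server each session is hashed to. The Partition property is what makes the result independent of the hash: any itemset that falls below the threshold in every partition also falls below it overall. The exhaustive support counting keeps the sketch short but would not scale to real web logs.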
Additional information
Received 30 August 2000 / Revised 30 January 2001 / Accepted in revised form 9 May 2001
About this article
Cite this article
Sayal, M., Scheuermann, P. Distributed Web Log Mining Using Maximal Large Itemsets. Knowledge and Information Systems 3, 389–404 (2001). https://doi.org/10.1007/PL00011675