Abstract
Efficient identification of user access sessions from very large web logs is an unavoidable data preparation task for higher-level web log mining, yet the problem has received little algorithmic study. In this paper we consider two types of user access sessions: interval sessions and gap sessions. We design an efficient algorithm for each type, supported by new data structures. We present both theoretical and empirical analyses of the algorithms and prove that both have optimal time complexity.
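The abstract does not define the two session types, so the following is only an illustrative sketch under assumed definitions, not the paper's algorithms or data structures. It assumes a gap session ends when two consecutive accesses by the same user are more than `max_gap` apart, and an interval session ends when an access falls more than `max_len` after the session's first access. The function names, the `(user, timestamp, url)` record layout, and both thresholds are hypothetical, and the log is assumed to be sorted by timestamp.

```python
from collections import defaultdict


def gap_sessions(log, max_gap):
    """Assumed definition: start a new session when the time since the
    same user's previous access exceeds max_gap."""
    sessions = defaultdict(list)  # user -> list of sessions
    for user, ts, url in log:     # log is assumed sorted by timestamp
        user_sessions = sessions[user]
        # Compare against the LAST access of the user's current session.
        if user_sessions and ts - user_sessions[-1][-1][0] <= max_gap:
            user_sessions[-1].append((ts, url))
        else:
            user_sessions.append([(ts, url)])
    return dict(sessions)


def interval_sessions(log, max_len):
    """Assumed definition: start a new session when the time since the
    FIRST access of the user's current session exceeds max_len."""
    sessions = defaultdict(list)
    for user, ts, url in log:
        user_sessions = sessions[user]
        # Compare against the FIRST access of the user's current session.
        if user_sessions and ts - user_sessions[-1][0][0] <= max_len:
            user_sessions[-1].append((ts, url))
        else:
            user_sessions.append([(ts, url)])
    return dict(sessions)


# Hypothetical log records: (user, timestamp in seconds, requested URL).
log = [("a", 0, "/"), ("b", 5, "/x"), ("a", 20, "/p"),
       ("a", 40, "/q"), ("a", 100, "/r")]
gaps = gap_sessions(log, max_gap=25)
intervals = interval_sessions(log, max_len=25)
```

Both functions make a single pass over the log and keep one hash-table entry per user. On the sample log, user `a` gets two sessions under the gap rule and three under the interval rule, which shows how the two assumed definitions can split the same accesses differently.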
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Z., Fu, A.WC., Tong, F.CH. (2002). Optimal Algorithms for Finding User Access Sessions from Very Large Web Logs. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_28
DOI: https://doi.org/10.1007/3-540-47887-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive