Optimal Algorithms for Finding User Access Sessions from Very Large Web Logs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

Although efficient identification of user access sessions from very large web logs is an unavoidable data preparation task for higher-level web log mining, little attention has been paid to the algorithmic study of this problem. In this paper we consider two types of user access sessions: interval sessions and gap sessions. We design two efficient algorithms, one for each session type, with the help of new data structures. We present both theoretical and empirical analyses of the algorithms and prove that both have optimal time complexity.
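The abstract names the two session types without defining them, so the following is only an illustrative sketch under stated assumptions, not the paper's algorithms or data structures. It assumes that an interval session collects one user's accesses that fall within a fixed time span of the session's first access, and that a gap session collects consecutive accesses whose pairwise time gap stays within a threshold. The function names, the `(user, timestamp)` log format, and the parameters `max_length` and `max_gap` are all hypothetical. The sketch sorts each user's timestamps, so it runs in O(n log n) time and makes no claim to the optimality proved in the paper.

```python
from collections import defaultdict


def group_by_user(log):
    """Group (user, timestamp) pairs by user and sort each user's timestamps."""
    by_user = defaultdict(list)
    for user, ts in log:
        by_user[user].append(ts)
    for times in by_user.values():
        times.sort()
    return by_user


def interval_sessions(log, max_length):
    """Assumed definition: a session holds accesses within max_length of its first access."""
    sessions = []
    for user, times in group_by_user(log).items():
        current = []
        for ts in times:
            # Close the session once an access falls outside the interval.
            if current and ts - current[0] > max_length:
                sessions.append((user, current))
                current = []
            current.append(ts)
        if current:
            sessions.append((user, current))
    return sessions


def gap_sessions(log, max_gap):
    """Assumed definition: a session holds consecutive accesses at most max_gap apart."""
    sessions = []
    for user, times in group_by_user(log).items():
        current = []
        for ts in times:
            # Close the session once the gap to the previous access is too large.
            if current and ts - current[-1] > max_gap:
                sessions.append((user, current))
                current = []
            current.append(ts)
        if current:
            sessions.append((user, current))
    return sessions


# Hypothetical log: (user id, access time in minutes).
example_log = [("a", 0), ("a", 10), ("a", 25), ("a", 40), ("b", 5), ("a", 100)]
print(interval_sessions(example_log, max_length=30))
print(gap_sessions(example_log, max_gap=15))
```

On the same log the two definitions can disagree: with a 30-minute interval, user "a" accessing at minute 40 starts a new session, whereas with a 15-minute gap the same access extends the running session because it follows the previous access closely enough.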

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Z., Fu, A.WC., Tong, F.CH. (2002). Optimal Algorithms for Finding User Access Sessions from Very Large Web Logs. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_28

  • DOI: https://doi.org/10.1007/3-540-47887-6_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43704-8

  • Online ISBN: 978-3-540-47887-4

  • eBook Packages: Springer Book Archive
