A Hybrid Approach to Web Usage Mining

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Abstract

With the large number of companies using the Internet to distribute and collect information, knowledge discovery on the web has become an important research area. Web usage mining, which is the main topic of this paper, focuses on knowledge discovery from the clicks in the web log for a given site (the so-called click-stream), especially on analysis of sequences of clicks. Existing techniques for analyzing click sequences have different drawbacks, i.e., either huge storage requirements, excessive I/O cost, or scalability problems when additional information is introduced into the analysis.

In this paper we present a new hybrid approach for analyzing click sequences that aims to overcome these drawbacks. The approach is based on a novel combination of existing approaches, more specifically the Hypertext Probabilistic Grammar (HPG) and Click Fact Table approaches. The approach allows for additional information, e.g., user demographics, to be included in the analysis without introducing performance problems. The development is driven by experiences gained from industry collaboration. A prototype has been implemented and experiments are presented that show that the hybrid approach performs well compared to the existing approaches. This is especially true when mining sessions containing clicks with certain characteristics, i.e., when constraints are introduced. The approach is not limited to web log analysis, but can also be used for general sequence mining tasks.
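To make the idea of the combination concrete, the following is a minimal sketch and not the authors' implementation. It assumes a click fact table can be modeled as a list of per-click records, that a constraint is a predicate over a single click, and that an HPG can be approximated by first-order transition probabilities between pages with artificial start and end states. The field names (`session`, `seq`, `page`, `country`), the state names `S` and `F`, and both function names are hypothetical; the actual HPG construction has further parameters that are omitted here.

```python
from collections import defaultdict

# Artificial start and end states (the names are an assumption of this sketch).
START, END = "S", "F"


def select_sessions(clicks, predicate):
    """Return the page sequence of every session that contains at least
    one click satisfying `predicate` (a stand-in for a constrained query
    against a click fact table)."""
    sessions = defaultdict(list)
    # Order clicks by session and by position within the session.
    for click in sorted(clicks, key=lambda c: (c["session"], c["seq"])):
        sessions[click["session"]].append(click)
    return [
        [c["page"] for c in rows]
        for rows in sessions.values()
        if any(predicate(c) for c in rows)
    ]


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities between pages,
    a simplified stand-in for building a grammar over the selected sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for pages in sequences:
        path = [START, *pages, END]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical click records; the values are illustrative only.
clicks = [
    {"session": 1, "seq": 1, "page": "A", "country": "DK"},
    {"session": 1, "seq": 2, "page": "B", "country": "DK"},
    {"session": 2, "seq": 1, "page": "A", "country": "US"},
    {"session": 2, "seq": 2, "page": "C", "country": "US"},
    {"session": 3, "seq": 1, "page": "A", "country": "DK"},
    {"session": 3, "seq": 2, "page": "C", "country": "DK"},
]

# Constraint on additional information: keep only sessions with a click from "DK".
selected = select_sessions(clicks, lambda c: c["country"] == "DK")
probs = transition_probabilities(selected)
```

In this sketch the constraint is applied to the stored clicks first, and the probabilistic model is built only from the sessions that satisfy it, so the model itself never has to carry the extra attributes.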

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jespersen, S.E., Thorhauge, J., Pedersen, T.B. (2002). A Hybrid Approach to Web Usage Mining. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_8

  • DOI: https://doi.org/10.1007/3-540-46145-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44123-6

  • Online ISBN: 978-3-540-46145-6

  • eBook Packages: Springer Book Archive
