Abstract
With so many companies using the Internet to distribute and collect information, knowledge discovery on the web has become an important research area. Web usage mining, the main topic of this paper, focuses on knowledge discovery from the clicks recorded in the web log of a given site (the so-called click-stream), especially on the analysis of click sequences. Existing techniques for analyzing click sequences each have drawbacks: huge storage requirements, excessive I/O cost, or scalability problems when additional information is introduced into the analysis.
In this paper we present a new hybrid approach for analyzing click sequences that aims to overcome these drawbacks. It is based on a novel combination of two existing approaches, the Hypertext Probabilistic Grammar (HPG) and the Click Fact Table. Additional information, e.g., user demographics, can be included in the analysis without introducing performance problems. The development is driven by experience gained from industry collaboration. A prototype has been implemented, and the experiments presented show that the hybrid approach performs well compared to the existing approaches, especially when mining sessions that contain clicks with certain characteristics, i.e., when constraints are introduced. The approach is not limited to web log analysis and can also be used for general sequence mining tasks.
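As a rough illustration of the idea, and not the prototype described above, the following Python sketch stores clicks as fact-table-like rows, keeps only sessions that satisfy a constraint, and then estimates first-order page-to-page transition probabilities in the spirit of an HPG. The record layout (`page`, `country`), the function names, and the sample data are all hypothetical assumptions made for this example.

```python
from collections import Counter, defaultdict


def filter_sessions(sessions, predicate):
    """Keep sessions containing at least one click that satisfies predicate."""
    return [s for s in sessions if any(predicate(click) for click in s)]


def transition_probabilities(sessions):
    """Estimate first-order page-to-page transition probabilities."""
    counts = defaultdict(Counter)
    for session in sessions:
        pages = [click["page"] for click in session]
        # Count each consecutive pair of pages within a session.
        for src, dst in zip(pages, pages[1:]):
            counts[src][dst] += 1
    # Normalize the outgoing counts of every page into probabilities.
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical click rows: one dict per click, grouped by session.
sessions = [
    [{"page": "A", "country": "DK"}, {"page": "B", "country": "DK"},
     {"page": "C", "country": "DK"}],
    [{"page": "A", "country": "DK"}, {"page": "C", "country": "DK"}],
    [{"page": "A", "country": "US"}, {"page": "B", "country": "US"}],
]

# Constraint on extra information attached to the clicks (assumed attribute).
constrained = filter_sessions(sessions, lambda c: c["country"] == "DK")
model = transition_probabilities(constrained)
full_model = transition_probabilities(sessions)
```

Applying the constraint before building the model means the probabilities reflect only the selected sessions, which is the kind of constrained analysis the abstract refers to; how the actual hybrid approach achieves this efficiently is not specified here.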
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jespersen, S.E., Thorhauge, J., Pedersen, T.B. (2002). A Hybrid Approach to Web Usage Mining. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_8
DOI: https://doi.org/10.1007/3-540-46145-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive