Abstract
We propose a data mining model that captures user navigation behaviour patterns. User navigation sessions are modelled as a hypertext probabilistic grammar whose higher-probability strings correspond to the user’s preferred trails, and we give an algorithm that mines such trails efficiently. We make use of the N-gram model, which assumes that the last N pages browsed affect the probability of the next page to be visited. The model is based on the theory of probabilistic grammars, which provides a sound theoretical foundation for future enhancements. Moreover, we propose the use of entropy as an estimator of the grammar’s statistical properties. Extensive experiments were conducted, and the results show that the algorithm runs in linear time, that the grammar’s entropy is a good estimator of the number of mined trails, and that the rules mined from real data confirm the effectiveness of the model.
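To make the N-gram idea concrete, the following is a minimal sketch and not the authors' algorithm. It estimates next-page probabilities from the last N pages of each session, enumerates trails whose probability stays above a threshold, and computes the entropy of one state's next-page distribution. The function names (`fit_ngram`, `mine_trails`, `entropy`), the `cut_point` and `max_len` parameters, and the toy sessions are all hypothetical. As a simplifying assumption, the sketch does not model session termination.

```python
from collections import Counter, defaultdict
import math


def fit_ngram(sessions, n=1):
    """Estimate start and transition probabilities from navigation sessions.

    A state is the tuple of the last n pages visited (assumption: sessions
    shorter than n pages are ignored).
    """
    start = Counter()
    counts = defaultdict(Counter)
    for session in sessions:
        if len(session) < n:
            continue
        start[tuple(session[:n])] += 1
        for i in range(n, len(session)):
            counts[tuple(session[i - n:i])][session[i]] += 1
    total = sum(start.values())
    start_p = {h: c / total for h, c in start.items()}
    trans_p = {
        h: {page: c / sum(nxt.values()) for page, c in nxt.items()}
        for h, nxt in counts.items()
    }
    return start_p, trans_p


def mine_trails(start_p, trans_p, cut_point, max_len=10):
    """Depth-first enumeration of maximal trails with probability >= cut_point.

    max_len is a hypothetical guard against cycles whose links have
    probability 1.0, which would otherwise never fall below the threshold.
    """
    if not start_p:
        return []
    n = len(next(iter(start_p)))
    trails = []
    stack = [(list(h), p) for h, p in start_p.items() if p >= cut_point]
    while stack:
        trail, prob = stack.pop()
        extended = False
        if len(trail) < max_len:
            for page, p in trans_p.get(tuple(trail[-n:]), {}).items():
                if prob * p >= cut_point:
                    stack.append((trail + [page], prob * p))
                    extended = True
        if not extended:  # no extension survives the threshold: trail is maximal
            trails.append((trail, prob))
    return sorted(trails, key=lambda t: -t[1])


def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Toy sessions, invented for illustration only.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
start_p, trans_p = fit_ngram(sessions, n=1)
trails = mine_trails(start_p, trans_p, cut_point=0.3)
print(trails)  # [(['A', 'B', 'C'], 0.5625)]
print(entropy(trans_p[("B",)]))
```

Lowering `cut_point` admits more trails. A state whose next-page distribution has higher entropy spreads probability over more successors, which gives some intuition for why entropy might track the number of mined trails.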
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Borges, J., Levene, M. (2000). Data Mining of User Navigation Patterns. In: Masand, B., Spiliopoulou, M. (eds) Web Usage Analysis and User Profiling. WebKDD 1999. Lecture Notes in Computer Science, vol 1836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44934-5_6
DOI: https://doi.org/10.1007/3-540-44934-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67818-2
Online ISBN: 978-3-540-44934-8
eBook Packages: Springer Book Archive