Mining WWW Access Sequence by Matrix Clustering

Oyanagi, Shigeru; Kubota, Kazuto; Nakase, Akihiko

doi:10.1007/978-3-540-39663-5_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2703)

Included in the following conference series:

International Workshop on Mining Web Data for Discovering Usage Patterns and Profiles

Abstract

Sequence pattern mining is one of the most important methods for mining WWW access logs. The Apriori algorithm is a well-known, typical algorithm for sequence pattern mining. However, it has inherent difficulty finding long sequential patterns and extracting the interesting patterns from a huge number of results.

This article proposes a new method for finding generalized sequence patterns by matrix clustering. The method decomposes each sequence into a set of sequence elements, each of which corresponds to an ordered pair of items. Matrix clustering is then applied to extract a cluster of similar sequences, and the resulting sequence elements are composed into a generalized sequence.
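
The decomposition and composition steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a sequence element is a pair of consecutive pages (the abstract says only "an ordered pair of items"), and it replaces the matrix clustering step, which the abstract does not specify, with a simple support threshold. The names `sequence_elements`, `build_matrix`, and `compose_graph` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def sequence_elements(session):
    """Decompose one access sequence into ordered pairs of pages.

    Assumption: only consecutive pages form a pair; the abstract does
    not say whether non-adjacent pairs are included as well.
    """
    return {(a, b) for a, b in zip(session, session[1:]) if a != b}


def build_matrix(sessions):
    """Build a binary matrix: one row per session, one column per element."""
    columns = sorted(set().union(*(sequence_elements(s) for s in sessions)))
    rows = [
        [1 if element in sequence_elements(s) else 0 for element in columns]
        for s in sessions
    ]
    return columns, rows


def compose_graph(elements):
    """Compose sequence elements into a directed graph (adjacency lists)."""
    graph = defaultdict(set)
    for a, b in elements:
        graph[a].add(b)
    return {page: sorted(successors) for page, successors in graph.items()}


# Hypothetical sessions, for illustration only.
sessions = [
    ["home", "list", "item", "cart"],
    ["home", "search", "item", "cart"],
    ["home", "list", "item"],
]

columns, rows = build_matrix(sessions)

# Stand-in for matrix clustering (NOT the paper's algorithm): keep the
# elements that occur in at least two sessions.
shared = [e for j, e in enumerate(columns) if sum(r[j] for r in rows) >= 2]

graph = compose_graph(shared)
print(graph)
```

Representing the composed result as adjacency lists keeps the generalized sequence in a form that can be drawn directly as a graph, which is how the abstract says the result is presented.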

Our method is evaluated on a real WWW access log. The results show that it is useful in practice for finding long sequences and for presenting the generalized sequence as a graph.

Author information

Authors and Affiliations

Ritsumeikan University
Shigeru Oyanagi
TOSHIBA Corp.
Kazuto Kubota & Akihiko Nakase

Editor information

Editors and Affiliations

University of Alberta, Canada
Osmar R. Zaïane
University of Minnesota, Minneapolis, MN, USA
Jaideep Srivastava
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou
Data Miners Inc., 77 North Washington Street, MA 02114, Boston, USA
Brij Masand

About this paper

Cite this paper

Oyanagi, S., Kubota, K., Nakase, A. (2003). Mining WWW Access Sequence by Matrix Clustering. In: Zaïane, O.R., Srivastava, J., Spiliopoulou, M., Masand, B. (eds) WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles. WebKDD 2002. Lecture Notes in Computer Science, vol 2703. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39663-5_8

DOI: https://doi.org/10.1007/978-3-540-39663-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20304-9
Online ISBN: 978-3-540-39663-5
eBook Packages: Springer Book Archive