A Generalization-Based Approach to Clustering of Web Usage Sessions

Fu, Yongjian; Sandhu, Kanwalpreet; Shih, Ming-Yi

doi:10.1007/3-540-44934-5_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1836)

Included in the following conference series:

International Workshop on Web Usage Analysis and User Profiling

Abstract

The clustering of Web usage sessions based on the access patterns is studied. Access patterns of Web users are extracted from Web server log files, and then organized into sessions which represent episodes of interaction between the Web users and the Web server. Using attribute-oriented induction, the sessions are then generalized according to a page hierarchy which organizes pages based on their contents. These generalized sessions are finally clustered using a hierarchical clustering method. Our experiments on a large real data set show that the approach is efficient and practical for Web mining applications.
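The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes sessions have already been extracted from the log as lists of page paths, and it stands in for the page hierarchy with URL path prefixes, which is only an assumption (the abstract says the hierarchy is organized by page content). The clustering step is a naive agglomerative merge using cosine similarity, chosen for brevity and not taken from the paper. The names `generalize`, `generalize_session`, `cosine`, and `agglomerate`, as well as the `level` and `threshold` parameters, are hypothetical.

```python
from collections import Counter
from math import sqrt


def generalize(page, level=1):
    """Map a page path to its ancestor at depth `level` (assumed URL-prefix hierarchy)."""
    parts = [p for p in page.strip("/").split("/") if p]
    return "/" + "/".join(parts[:level]) if parts else "/"


def generalize_session(pages, level=1):
    """Replace each page by its generalized form and count the hits per node."""
    return Counter(generalize(p, level) for p in pages)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors, threshold=0.5):
    """Merge the two most similar clusters until no pair reaches `threshold`."""
    clusters = [(Counter(v), [i]) for i, v in enumerate(vectors)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][0], clusters[j][0])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(members) for _, members in clusters]


# Illustrative sessions (invented paths, not data from the paper).
sessions = [
    ["/courses/cs101/syllabus.html", "/courses/cs101/hw1.html"],
    ["/courses/cs201/index.html", "/courses/cs101/hw2.html"],
    ["/research/db/papers.html", "/research/ai/index.html"],
]
vectors = [generalize_session(s, level=1) for s in sessions]
print(agglomerate(vectors, threshold=0.5))
```

Generalizing before clustering shrinks each session to a few hierarchy nodes, so sessions that touch different pages under the same node become directly comparable. This pairwise merge is quadratic per step and would not scale to a large log; it only shows the shape of the idea.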

Author information

Authors and Affiliations

Computer Science Department, University of Missouri-Rolla, USA
Yongjian Fu, Kanwalpreet Sandhu & Ming-Yi Shih

Editor information

Editors and Affiliations

Discovery and Intelligent Agents Technology, Redwood Investment Systems Inc., Boston, MA, 02110-1225, USA
Brij Masand (Director of Knowledge)
Institut für Wirtschaftsinformatik, Humboldt Universität zu Berlin, Spandauer Str. 1, 10178, Berlin, Germany
Myra Spiliopoulou

Cite this paper

Fu, Y., Sandhu, K., Shih, MY. (2000). A Generalization-Based Approach to Clustering of Web Usage Sessions. In: Masand, B., Spiliopoulou, M. (eds) Web Usage Analysis and User Profiling. WebKDD 1999. Lecture Notes in Computer Science, vol 1836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44934-5_2

DOI: https://doi.org/10.1007/3-540-44934-5_2
Published: 11 June 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67818-2
Online ISBN: 978-3-540-44934-8
eBook Packages: Springer Book Archive