Abstract
The explosive growth of the web has drastically changed the way in which information is managed and accessed. The large scale of web data sources and the wide availability of services over the internet have increased the need for effective web data mining techniques and mechanisms. A sophisticated method for organizing the layout of information and assisting user navigation is therefore particularly important. In this chapter, we focus on web usage mining, which applies data mining techniques to web server logs. Web usage mining is the non-trivial process of discovering implicit, previously unknown, but potentially useful clickstream patterns that may exist in any collection of web access logs. The required abstraction can be generated by clustering the web access logs according to a similarity measure, so that the logs within the same cluster are more similar to one another than to those in different clusters. We propose a partitional algorithm, the Multi Pass Combined Standard Deviation (CSD) Means algorithm, which automatically determines the optimum number of clusters for the web clickstream patterns. The quality of the clusters it produces is compared with that of the K-Means and Rough K-Means algorithms and of the model-based algorithms ANTCLUST and ACCANTCLUST. The experimental analysis of the mined clickstream patterns shows the effectiveness of the proposed algorithm.
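The abstract does not spell out how CSD-Means works, so the sketch below does not attempt to reproduce it. It only illustrates the general pipeline the abstract describes, using the K-Means baseline: sessions derived from web access logs are encoded as vectors, grouped by a distance measure, and scored with a cluster-validity index. Several things here are assumptions rather than details from the chapter: the sessions are taken to be already reconstructed from the logs; the binary page-visit encoding, squared Euclidean distance, first-k seeding, and the Davies-Bouldin index are illustrative choices; and the session data and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def session_vectors(sessions: Sequence[Sequence[str]], pages: Sequence[str]) -> List[Vector]:
    """Encode each session as a binary page-visit vector (1.0 = page requested).

    Assumption: a binary encoding; the chapter's own representation may differ.
    """
    index: Dict[str, int] = {page: i for i, page in enumerate(pages)}
    vectors = []
    for session in sessions:
        vec = [0.0] * len(pages)
        for page in session:
            vec[index[page]] = 1.0
        vectors.append(vec)
    return vectors


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, the dissimilarity measure assumed here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_means(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd-style K-Means, seeded with the first k points (a simplifying assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each session joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels, centroids


def davies_bouldin(points: List[Vector], labels: List[int]) -> float:
    """Davies-Bouldin index (lower is better).

    Assumes at least two non-empty clusters with distinct centroids.
    """
    clusters = sorted(set(labels))
    groups = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    cents = {c: mean(groups[c]) for c in clusters}
    # Scatter S_i: mean Euclidean distance of a cluster's members to its centroid.
    scatter = {
        c: sum(math.sqrt(sq_dist(p, cents[c])) for p in groups[c]) / len(groups[c])
        for c in clusters
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / math.sqrt(sq_dist(cents[i], cents[j]))
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical sessions: two shopping visits and two content-browsing visits.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/cart", "/checkout"],
    ["/home", "/blog", "/about"],
    ["/blog", "/about"],
]
pages = ["/home", "/products", "/cart", "/checkout", "/blog", "/about"]
X = session_vectors(sessions, pages)
labels, _ = k_means(X, k=2)
print(labels)  # [1, 1, 0, 0]
print(round(davies_bouldin(X, labels), 4))  # 0.4714
```

Plain K-Means needs the number of clusters k to be supplied up front, and its result depends on the seeding. The abstract presents automatic selection of the number of clusters as the property that distinguishes CSD-Means from this baseline.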
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Inbarani, H.H., Thangavel, K. (2009). Mining and Analysis of Clickstream Patterns. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_1
DOI: https://doi.org/10.1007/978-3-642-01091-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01090-3
Online ISBN: 978-3-642-01091-0
eBook Packages: Engineering, Engineering (R0)