Abstract
With the rapid growth of Web activity, Web data mining has become an important research topic and is attracting significant interest from both academia and industry. While existing methods mine frequent path traversal patterns efficiently from the access information contained in a log file, these approaches are likely to overestimate associations. Specifically, most previous studies of mining path traversal patterns are based on a uniform support threshold model, in which a single support threshold determines the frequent traversal patterns without taking into account such important factors as the length of a pattern, the positions of Web pages, and the importance of a particular pattern. As a result, a low support threshold yields many uninteresting patterns, whereas a high support threshold may cause some interesting patterns with lower support to be ignored. In view of this, this paper broadens the scope of frequent path traversal pattern mining by introducing a flexible model for mining Web traversal patterns with dynamic thresholds. Specifically, we study and apply the Markov chain model to determine the support thresholds of Web documents. Further, by employing effective techniques devised for joining reference sequences, the proposed algorithm, dynamic threshold miner (DTM), not only supports mining with dynamic thresholds but also significantly improves execution efficiency and supports the incremental mining of Web traversal patterns. The performance of algorithm DTM and of extensions of existing methods is compared on synthetic and real Web logs. The results show that algorithm DTM substantially reduces the number of unnecessary rules produced and yields a marked performance improvement.
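The idea of a pattern-specific support threshold can be illustrated with a minimal Python sketch. It is not algorithm DTM, whose details the abstract does not give. It assumes a first-order Markov chain estimated from session logs, and every function name, the `base_threshold` and `floor` parameters, and the scaling rule are hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities P(next | current)."""
    pair_counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            pair_counts[current][nxt] += 1
    probs = {}
    for current, nexts in pair_counts.items():
        total = sum(nexts.values())
        probs[current] = {page: count / total for page, count in nexts.items()}
    return probs


def page_probabilities(sessions):
    """Fraction of all page views that fall on each page."""
    counts = Counter(page for session in sessions for page in session)
    total = sum(counts.values())
    return {page: count / total for page, count in counts.items()}


def expected_path_probability(path, start, trans):
    """Probability of the path under the estimated Markov chain."""
    prob = start.get(path[0], 0.0)
    for current, nxt in zip(path, path[1:]):
        prob *= trans.get(current, {}).get(nxt, 0.0)
    return prob


def dynamic_threshold(path, start, trans, base_threshold, floor):
    """Hypothetical rule: scale a base threshold by the chain probability
    of the path, never dropping below a fixed floor."""
    scaled = base_threshold * expected_path_probability(path, start, trans)
    return max(floor, scaled)


def support(path, sessions):
    """Fraction of sessions that contain the path as a contiguous run."""
    n = len(path)
    target = list(path)
    hits = sum(
        any(list(s[i:i + n]) == target for i in range(len(s) - n + 1))
        for s in sessions
    )
    return hits / len(sessions)


def is_frequent(path, sessions, base_threshold=0.5, floor=0.05):
    """Compare observed support with the path-specific threshold."""
    start = page_probabilities(sessions)
    trans = transition_probabilities(sessions)
    limit = dynamic_threshold(path, start, trans, base_threshold, floor)
    return support(path, sessions) >= limit
```

Under this assumed rule, longer paths and paths through rarely visited pages receive lower thresholds, so they are not discarded merely because a single uniform threshold was tuned for short, popular paths.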
Cite this article
Ou, JC., Lee, CH. & Chen, MS. Efficient algorithms for incremental Web log mining with dynamic thresholds. The VLDB Journal 17, 827–845 (2008). https://doi.org/10.1007/s00778-006-0043-9
DOI: https://doi.org/10.1007/s00778-006-0043-9