Efficient algorithms for incremental Web log mining with dynamic thresholds

Ou, Jian-Chih; Lee, Chang-Hung; Chen, Ming-Syan

doi:10.1007/s00778-006-0043-9

Regular Paper
Published: 24 January 2007

Volume 17, pages 827–845 (2008)
The VLDB Journal

Abstract

With the rapid growth of Web activity, Web data mining has become an important research topic that attracts significant interest from both academia and industry. While existing methods mine frequent path traversal patterns efficiently from the access information contained in a log file, they are likely to over-evaluate associations. Specifically, most previous studies of path traversal pattern mining assume a uniform support threshold: a single threshold determines which traversal patterns are frequent, without considering such important factors as the length of a pattern, the positions of Web pages, and the importance of a particular pattern. As a result, a low support threshold yields many uninteresting patterns, whereas a high support threshold may cause some interesting patterns with lower supports to be ignored. In view of this, this paper broadens the horizon of frequent path traversal pattern mining by introducing a flexible model of mining Web traversal patterns with dynamic thresholds. Specifically, we apply a Markov chain model to determine the support thresholds of Web documents. Further, by employing effective techniques devised for joining reference sequences, the proposed algorithm, dynamic threshold miner (DTM), not only mines with dynamic thresholds but also significantly improves execution efficiency and supports the incremental mining of Web traversal patterns. The performance of algorithm DTM and of extensions of existing methods is compared on synthetic and real Web logs. The results show that algorithm DTM substantially reduces the number of unnecessary rules produced and leads to a prominent performance improvement.
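To make the contrast between a uniform and a dynamic support threshold concrete, the following Python sketch may help. It is not algorithm DTM, whose details are not given in this abstract. It assumes, purely for illustration, that a pattern's threshold is a base support scaled by the probability of the path under a first-order Markov chain estimated from the sessions. The function names, the scaling rule, and the sample sessions are all hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(sessions):
    """Estimate first-order Markov transition probabilities from sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, nxt in counts.items():
        total = sum(nxt.values())
        probs[src] = {dst: n / total for dst, n in nxt.items()}
    return probs


def path_probability(path, probs):
    """Product of the transition probabilities along a path."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= probs.get(src, {}).get(dst, 0.0)
    return p


def support(path, sessions):
    """Fraction of sessions containing the path as a contiguous run."""
    n = len(path)
    target = tuple(path)
    hits = sum(
        any(tuple(s[i:i + n]) == target for i in range(len(s) - n + 1))
        for s in sessions
    )
    return hits / len(sessions)


def dynamic_threshold(path, probs, base):
    """Hypothetical rule (an assumption, not the paper's): scale the base
    threshold by the path's probability under the Markov chain."""
    return base * path_probability(path, probs)


# Toy sessions, made up for illustration only.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "B", "C"],
    ["B", "C"],
]
probs = transition_probs(sessions)
path = ("A", "B", "C")
base = 0.6

uniform_ok = support(path, sessions) >= base
dynamic_ok = support(path, sessions) >= dynamic_threshold(path, probs, base)
print(uniform_ok, dynamic_ok)
```

In this toy data the path A, B, C appears in half of the sessions, so a uniform threshold of 0.6 rejects it, while the scaled threshold accepts it. This mirrors the abstract's point that a single high threshold can discard interesting patterns with lower supports.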

Author information

Authors and Affiliations

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
Jian-Chih Ou, Chang-Hung Lee & Ming-Syan Chen

Corresponding author

Correspondence to Jian-Chih Ou.

About this article

Cite this article

Ou, JC., Lee, CH. & Chen, MS. Efficient algorithms for incremental Web log mining with dynamic thresholds. The VLDB Journal 17, 827–845 (2008). https://doi.org/10.1007/s00778-006-0043-9

Received: 18 December 2005
Accepted: 23 September 2006
Published: 24 January 2007
Issue Date: July 2008
DOI: https://doi.org/10.1007/s00778-006-0043-9
