Abstract
We consider the problem of mining web access patterns under a super-pattern constraint. This constraint requires that the sequential patterns mined from the sequence database contain patterns from a particular set as sub-patterns. A common application of this constraint is web usage mining, which analyzes user access behavior on the web. In this paper, we introduce an efficient strategy for mining web access patterns with a super-pattern constraint that requires only one database scan. First, we present the MWAPC (Mining Web Access Patterns based on super-pattern Constraint) algorithm, in which each frequent pattern is checked to determine whether it contains at least one pattern from a user-defined set of patterns. We then develop a more effective algorithm, called EMWAPC, which, based on three proposed propositions, prunes the search space at the beginning of the mining process and avoids checking the constraints one by one. We conducted experiments on real web log databases. The experimental results show that the proposed algorithms outperform previous methods.
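To make the constraint concrete, the following minimal sketch shows one way the super-pattern check described above could look. It is not the MWAPC or EMWAPC algorithm from the paper: it only filters an already-mined list of frequent patterns, keeping those that contain at least one constraint pattern. It assumes that a web access pattern is an ordered sequence of page identifiers and that "sub-pattern" means an order-preserving, not necessarily contiguous, subsequence. The names `is_subpattern`, `satisfies_constraint`, and `filter_patterns` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumption: a pattern is an ordered tuple of page identifiers.
Pattern = Tuple[str, ...]


def is_subpattern(sub: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `sub` occurs in `pattern` as an order-preserving
    (not necessarily contiguous) subsequence."""
    remaining = iter(pattern)
    # `item in remaining` advances the iterator until it finds `item`,
    # so each later item must appear after the previous match.
    return all(item in remaining for item in sub)


def satisfies_constraint(pattern: Pattern, constraint_set: Iterable[Pattern]) -> bool:
    """Return True if `pattern` contains at least one constraint pattern."""
    return any(is_subpattern(c, pattern) for c in constraint_set)


def filter_patterns(frequent: Iterable[Pattern], constraint_set: List[Pattern]) -> List[Pattern]:
    """Naive post-filter: check every frequent pattern one by one."""
    return [p for p in frequent if satisfies_constraint(p, constraint_set)]


if __name__ == "__main__":
    # Hypothetical frequent patterns and user-defined constraint set.
    frequent = [("a", "b", "c"), ("a", "c"), ("b", "d"), ("c", "a")]
    constraints = [("a", "c"), ("d",)]
    print(filter_patterns(frequent, constraints))
```

This one-by-one check corresponds only to the verification step the abstract attributes to MWAPC. The pruning that EMWAPC performs at the start of mining is not shown here, since the abstract does not state the three propositions it relies on.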
Acknowledgments
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2015.07.
Cite this article
Van, T., Yoshitaka, A. & Le, B. Mining web access patterns with super-pattern constraint. Appl Intell 48, 3902–3914 (2018). https://doi.org/10.1007/s10489-018-1182-6
DOI: https://doi.org/10.1007/s10489-018-1182-6