
Mining web access patterns with super-pattern constraint

Applied Intelligence

Abstract

We consider the problem of mining web access patterns with a super-pattern constraint. This constraint requires that every sequential pattern mined from the sequence database contain, as a sub-pattern, at least one pattern from a user-defined set of patterns. A common application of this constraint is web usage mining, which mines users' access behavior on the web. In this paper, we introduce an efficient strategy for mining web access patterns with a super-pattern constraint that requires only one database scan. First, we present the MWAPC (Mining Web Access Patterns based on super-pattern Constraint) algorithm, in which each frequent pattern is checked to determine whether it contains at least one pattern from the user-defined set. We then develop an effective algorithm, called EMWAPC, which uses three propositions that we propose to prune the search space at the beginning of the mining process and to avoid checking the constraints one by one. We conducted experiments on real web log databases. The experimental results show that the proposed algorithms outperform previous methods.
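The abstract gives no pseudocode, so the Python sketch below only illustrates the super-pattern constraint and the per-pattern check that MWAPC is described as performing. It assumes that a web access pattern is a sequence of page identifiers, that "contains as a sub-pattern" means order-preserving subsequence containment with gaps allowed, and that support is the number of access sequences containing a pattern. All function names and the toy database are hypothetical. The miner is a naive level-wise enumerator that rescans the database for every candidate, so it is not the paper's single-scan method, and the constraint pre-filter is an anti-monotonicity argument of our own, not necessarily one of EMWAPC's three propositions.

```python
from typing import Sequence

Pattern = tuple[str, ...]  # assumption: a pattern is a sequence of page IDs


def is_subsequence(sub: Sequence[str], seq: Sequence[str]) -> bool:
    """True if every event of `sub` appears in `seq` in the same order (gaps allowed)."""
    it = iter(seq)
    # `x in it` advances the iterator past the first match, so order is preserved.
    return all(x in it for x in sub)


def support(pattern: Pattern, database: list[Pattern]) -> int:
    """Number of access sequences that contain `pattern` as a sub-pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def mine_frequent(database: list[Pattern], min_sup: int) -> list[Pattern]:
    """Naive level-wise miner: extend each frequent pattern by one event.

    Complete because every prefix of a frequent pattern is itself frequent.
    Unlike the paper's approach, it rescans the database for every candidate.
    """
    assert min_sup >= 1
    events = sorted({e for seq in database for e in seq})
    frontier = [(e,) for e in events if support((e,), database) >= min_sup]
    frequent: list[Pattern] = []
    while frontier:  # ends once patterns exceed every sequence's length
        frequent.extend(frontier)
        frontier = [p + (e,) for p in frontier for e in events
                    if support(p + (e,), database) >= min_sup]
    return frequent


def satisfies_super_pattern(pattern: Pattern, constraints: list[Pattern]) -> bool:
    """Super-pattern constraint: `pattern` contains at least one constraint pattern."""
    return any(is_subsequence(c, pattern) for c in constraints)


def mine_with_super_pattern(database: list[Pattern], min_sup: int,
                            constraints: list[Pattern]) -> list[Pattern]:
    # Hypothetical pre-filter (our own, by anti-monotonicity): an infrequent
    # constraint pattern cannot be contained in any frequent pattern.
    live = [c for c in constraints if support(c, database) >= min_sup]
    if not live:
        return []
    # MWAPC-style check: test every frequent pattern against the constraints.
    return [p for p in mine_frequent(database, min_sup)
            if satisfies_super_pattern(p, live)]


# Hypothetical toy database: four sessions over pages a-f.
db = [tuple("abdac"), tuple("eaebcac"), tuple("babfaec"), tuple("afbacfc")]
print(mine_with_super_pattern(db, 3, [("b", "a"), ("a", "c", "c")]))
# prints [('b', 'a'), ('a', 'b', 'a'), ('b', 'a', 'c'), ('a', 'b', 'a', 'c')]
```

The pre-filter is sound because support can only shrink as a pattern grows: any sequence that contains a pattern also contains each of its sub-patterns. In the toy run, ("a", "c", "c") occurs in only two of the four sessions, so it is discarded before mining, and only the 4 of the 13 frequent patterns that contain ("b", "a") survive the check.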





Acknowledgments

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2015.07.

Author information

Corresponding author

Correspondence to Trang Van.


About this article


Cite this article

Van, T., Yoshitaka, A. & Le, B. Mining web access patterns with super-pattern constraint. Appl Intell 48, 3902–3914 (2018). https://doi.org/10.1007/s10489-018-1182-6


  • DOI: https://doi.org/10.1007/s10489-018-1182-6
