Abstract
Pointing and clicking on web pages generates continuous data sequences that flow into the web log, so previously mined web sequential patterns need to be updated. Algorithms for mining web sequential patterns from scratch include WAP, PLWAP and the Apriori-based GSP. Updating patterns incrementally, by reusing the old patterns together with only the newly added data sequences, would achieve fast response time with reasonable memory usage. This paper proposes two algorithms, RePL4UP (Revised PLWAP For UPdate) and PL4UP (PLWAP For UPdate), which use the PLWAP tree structure to incrementally update web sequential patterns efficiently without scanning the whole database, even when previously small items become frequent. RePL4UP concisely stores the position codes of small items in the database sequences in its metadata during tree construction. During mining, RePL4UP scans only the newly added database sequences and revises the old PLWAP tree with the help of the small-item position codes, restoring information on previously small items that have become frequent and deleting previously frequent items that have become small. PL4UP initially builds a bigger PLWAP tree that includes all sequences in the database using a tolerance support, t, that is lower than the regular minimum support, s. The position code features of the PLWAP tree are used to mine these trees efficiently and extract the current frequent patterns when the database is updated. Both approaches update old frequent patterns more quickly because they do not need to re-scan the entire updated database.
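The tolerance-support idea behind PL4UP can be illustrated with a deliberately reduced sketch. It tracks only single-item counts in a dictionary, not a PLWAP tree or position codes, so it is not the paper's algorithm; the function names `build_index` and `update_index` and the sample data are assumptions made for illustration. Items whose support reaches the lower threshold t are retained, so an item that was small under s can be promoted to frequent after an update by scanning only the new sequences.

```python
from collections import Counter


def sequence_counts(sequences):
    """Count, for each item, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))
    return counts


def build_index(sequences, t):
    """Hypothetical helper: keep counts of items with support >= tolerance t."""
    n = len(sequences)
    counts = sequence_counts(sequences)
    kept = {item: c for item, c in counts.items() if c >= t * n}
    return kept, n


def update_index(kept, n_old, new_sequences, s, t):
    """Hypothetical helper: merge counts from the new sequences only.

    Returns the merged counts, the new database size, the items that are
    frequent under s, and untracked items that cannot be ruled out.
    """
    inc = sequence_counts(new_sequences)
    n = n_old + len(new_sequences)
    merged = dict(kept)
    undecided = set()
    for item, c in inc.items():
        if item in merged:
            merged[item] += c
        # An untracked item had an old count below t * n_old, so its true
        # total is below t * n_old + c; flag it only if that bound exceeds
        # the new frequency threshold.
        elif t * n_old + c > s * n:
            undecided.add(item)
    frequent = {item for item, c in merged.items() if c >= s * n}
    return merged, n, frequent, undecided


# Assumed toy data: s = 0.75 and t = 0.5 are illustrative values.
old_db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "d"]]
new_db = [["a", "b"], ["a", "b"], ["b", "d"], ["a", "b", "d"]]

kept, n = build_index(old_db, t=0.5)
kept, n, frequent, undecided = update_index(kept, n, new_db, s=0.75, t=0.5)
print(sorted(frequent))
```

In this toy run, item `b` is small in the old database (2 of 4 sequences) but is retained because it meets the tolerance threshold, and it becomes frequent after the update (6 of 8 sequences) without the old sequences being read again. Items that fall below t in the old data are the case this simplified sketch can only flag as undecided.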
Additional information
Responsible editor: Eamonn Keogh.
An erratum to this article can be found at http://dx.doi.org/10.1007/s10618-009-0144-3
Cite this article
Ezeife, C.I., Liu, Y. Fast incremental mining of web sequential patterns with PLWAP tree. Data Min Knowl Disc 19, 376–416 (2009). https://doi.org/10.1007/s10618-009-0133-6
DOI: https://doi.org/10.1007/s10618-009-0133-6