ABSTRACT
Sequential pattern mining is a fundamental problem in data mining with many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases, and WAP-Tree based algorithms have performed well in both execution time and memory consumption when mining such databases. We propose FOF-SP, a new WAP-Tree based algorithm that combines an early pruning strategy from the literature, the "Sibling Principle", with the FOF (First Occurrence Forest) strategy. Experimental results show that FOF-SP finds patterns faster than the previous WAP-Tree based algorithms PLWAP and FOF. Moreover, FOF-SP mines patterns faster than PrefixSpan and as fast as LAPIN on real sequence databases from web usage mining and bioinformatics.
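To make the data structure concrete, the following is a minimal Python sketch of a WAP-Tree as it is commonly described: a prefix tree over the frequent items of each sequence, with a count on every node. It is not the FOF-SP implementation and it omits the Sibling Principle entirely. The names `Node`, `build_wap_tree`, and `first_occurrence_support`, the absolute `min_support` threshold, and the toy database are assumptions made for illustration. The support helper reflects the first-occurrence idea only in its simplest form: an item's support is taken as the sum of the counts of its topmost nodes in the tree.

```python
from collections import Counter


class Node:
    """One WAP-Tree node: an item label, a count, and child nodes (hypothetical layout)."""

    def __init__(self, label=None):
        self.label = label
        self.count = 0
        self.children = {}


def build_wap_tree(sequences, min_support):
    """Build a prefix tree over the frequent items of each sequence.

    min_support is an absolute number of sequences (an assumption of this sketch).
    """
    # First pass: count how many sequences contain each item.
    freq = Counter()
    for seq in sequences:
        freq.update(set(seq))
    frequent = {item for item, n in freq.items() if n >= min_support}

    # Second pass: insert each sequence, skipping infrequent items.
    root = Node()
    for seq in sequences:
        node = root
        for item in seq:
            if item not in frequent:
                continue
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def first_occurrence_support(node, item):
    """Sum the counts of the topmost nodes labeled `item` below `node`."""
    total = 0
    for child in node.children.values():
        if child.label == item:
            # First occurrence on this path: count it and do not descend further.
            total += child.count
        else:
            total += first_occurrence_support(child, item)
    return total


# Toy single-item sequence database (illustrative data only).
sequences = ["abdac", "eaebcac", "babfaec", "afbacfc"]
root = build_wap_tree(sequences, min_support=3)
support_c = first_occurrence_support(root, "c")
```

With this toy database, only `a`, `b`, and `c` reach the threshold of 3, so the four sequences are inserted as `abac`, `abcac`, `babac`, and `abacc`. Stopping at the first occurrence on each path avoids counting a sequence twice when an item repeats in it.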
Index Terms
- Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle