DOI: 10.1145/2611040.2611061
research-article

Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle

Published: 02 June 2014

ABSTRACT

Sequential pattern mining is one of the basic problems in data mining, and it has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases. WAP-Tree based algorithms have shown notable execution time and memory consumption performance when mining single-item sequence databases. We propose FOF-SP, a new WAP-Tree based algorithm that combines the FOF (First Occurrence Forest) strategy with an early pruning strategy from the literature called the "Sibling Principle". Experimental results show that FOF-SP finds patterns faster than the previous WAP-Tree based algorithms PLWAP and FOF. Moreover, FOF-SP mines patterns faster than PrefixSpan and as fast as LAPIN on real sequence databases from web usage mining and bioinformatics.
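
To illustrate why a WAP-Tree is a compact representation, the following is a minimal sketch of a count-annotated prefix tree built over single-item sequences. It is not the FOF-SP algorithm, and it omits the header links that WAP-Tree based miners use for traversal. The names `Node`, `build_wap_tree`, and `count_nodes`, as well as the toy database, are assumptions made for illustration only.

```python
from collections import Counter


class Node:
    """One tree node: an item label, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_wap_tree(sequences, min_support):
    """Build a simplified WAP-Tree-like prefix tree (illustrative sketch).

    Pass 1 counts, for each item, the number of sequences containing it.
    Pass 2 inserts each sequence, restricted to frequent items, into a
    shared prefix tree, incrementing the count of every node on its path.
    """
    support = Counter()
    for seq in sequences:
        support.update(set(seq))  # each sequence counts an item at most once
    frequent = {item for item, c in support.items() if c >= min_support}

    root = Node()
    for seq in sequences:
        node = root
        for item in seq:
            if item not in frequent:
                continue  # infrequent items cannot appear in frequent patterns
            child = node.children.get(item)
            if child is None:
                child = Node(item)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical toy database of four access sequences; min_support = 3 sequences.
toy_db = ["abdac", "eaebcac", "babfaec", "afbacfc"]
tree = build_wap_tree(toy_db, min_support=3)
print(count_nodes(tree))  # 13 nodes store the 19 frequent-item occurrences
```

In this toy case only `a`, `b`, and `c` are frequent, and the three sequences that begin with `ab` after filtering share those two nodes, which is where the saving comes from.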

      • Published in

        WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14)
        June 2014
        506 pages
        ISBN:9781450325387
        DOI:10.1145/2611040

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        WIMS '14 Paper Acceptance Rate: 41 of 90 submissions, 46%
        Overall Acceptance Rate: 140 of 278 submissions, 50%