Abstract
The Sequential Pattern Mining (SPM) problem has been widely studied and extended in several directions. With the tremendous growth in the size of datasets, traditional algorithms no longer scale. To address this scalability issue, a few researchers have recently developed distributed algorithms based on MapReduce. However, the existing MapReduce algorithms require multiple rounds of MapReduce, which increases communication and scheduling overhead. They also do not address the handling of long sequences, and they generate a huge number of candidate sequences that do not appear in the input database. This enlarges the search space and increases the number of candidate sequences whose support must be counted. Our algorithm is a two-phase MapReduce algorithm that uses pruning strategies to generate only the promising candidate sequences. This reduces the search space and makes support computation more efficient. The algorithm makes use of item co-occurrence information, and the proposed Sequence Index List (SIL) data structure speeds up support computation. The experimental results show that the proposed algorithm outperforms the existing MapReduce algorithms for the SPM problem.
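To make the idea of co-occurrence-based pruning concrete, the following is a minimal single-machine sketch, not the two-phase MapReduce algorithm itself. It assumes every element of a sequence is a single item, and it does not reproduce the SIL data structure, whose layout the abstract does not describe. The function names `build_cooccurrence`, `support`, and `frequent_3_sequences` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_cooccurrence(db):
    """Count, for each ordered pair (a, b), the sequences in which b occurs after a."""
    counts = defaultdict(int)
    for seq in db:
        pairs = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                pairs.add((a, b))
        for pair in pairs:  # each sequence contributes at most once per pair
            counts[pair] += 1
    return dict(counts)


def support(db, pattern):
    """Number of sequences that contain `pattern` as a subsequence."""
    def contains(seq, pat):
        it = iter(seq)
        return all(x in it for x in pat)  # `in` advances the iterator
    return sum(contains(seq, pattern) for seq in db)


def frequent_3_sequences(db, min_sup):
    """Length-3 patterns found by joining frequent pairs (illustrative only)."""
    counts = build_cooccurrence(db)
    frequent_pairs = {p for p, c in counts.items() if c >= min_sup}
    result = {}
    for (x, y) in frequent_pairs:
        for (y2, z) in frequent_pairs:
            if y2 != y:
                continue
            # Pruning: (x, y, z) cannot be frequent unless (x, z) is frequent.
            if (x, z) not in frequent_pairs:
                continue
            sup = support(db, (x, y, z))  # counted only for surviving candidates
            if sup >= min_sup:
                result[(x, y, z)] = sup
    return result


db = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["a", "b", "c", "d"],
    ["b", "a", "c"],
]
print(frequent_3_sequences(db, min_sup=2))
```

In this toy database only three ordered pairs reach the minimum support of 2, so a single length-3 candidate is ever passed to the support count. Every candidate containing `d` or the pair `(c, b)` is discarded without scanning the database.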












Cite this article
Saleti, S., Subramanyam, R.B.V. A novel mapreduce algorithm for distributed mining of sequential patterns using co-occurrence information. Appl Intell 49, 150–171 (2019). https://doi.org/10.1007/s10489-018-1259-2
DOI: https://doi.org/10.1007/s10489-018-1259-2