Abstract
Mining frequent sequential patterns (FSPs) has attracted a great deal of research attention. Although many efficient algorithms for mining FSPs exist, mining time remains high, especially for large or dense datasets. Parallel processing has been widely applied to speed up various problems, and several parallel algorithms for this task have been proposed, but most of them suffer from synchronization and load-balancing problems. Based on a multi-core processor architecture, this paper proposes a load-balancing parallel approach called Parallel Dynamic Bit Vector Sequential Pattern Mining (pDBV-SPM) for mining FSPs from huge datasets; it uses the dynamic bit vector data structure to determine support values quickly. In pDBV-SPM, the frequent 1-sequences are sorted in ascending order of support count and then partitioned into parts, each of which is assigned to a task on a processor. With this ordering, most of the nodes in the leftmost branches are infrequent and thus pruned during the search, which helps to balance the search tree. Experiments are conducted to verify the effectiveness of pDBV-SPM. The results show that the proposed algorithm outperforms PIB-PRISM for mining FSPs in terms of both mining time and memory usage.
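The ordering-and-partitioning step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the round-robin partitioning policy, the use of a plain integer as a bit vector, and the placeholder `mine_part` task are all assumptions made for illustration. Each item's bit vector marks the sequences that contain it, so its support is simply the number of set bits.

```python
from concurrent.futures import ThreadPoolExecutor


def support(bits: int) -> int:
    """Support = number of set bits; bit i marks sequence i as containing the item."""
    return bin(bits).count("1")


def frequent_items_sorted(item_bits, min_sup):
    """Keep frequent 1-sequences and sort them by ascending support (ties by name)."""
    frequent = [(item, support(b)) for item, b in item_bits.items()
                if support(b) >= min_sup]
    return sorted(frequent, key=lambda pair: (pair[1], pair[0]))


def partition(items, n_parts):
    """Round-robin split (an assumed policy; the abstract does not specify one)."""
    return [items[i::n_parts] for i in range(n_parts)]


def mine_part(part):
    """Hypothetical placeholder for the per-task search; it only returns item names."""
    return [item for item, _ in part]


def run(item_bits, min_sup, n_tasks):
    """Sort frequent 1-sequences, partition them, and hand each part to a task."""
    ordered = frequent_items_sorted(item_bits, min_sup)
    parts = partition(ordered, n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        return list(pool.map(mine_part, parts))


# Toy data: four sequences, five items (hypothetical values).
example = {"a": 0b1111, "b": 0b0011, "c": 0b0111, "d": 0b0001, "e": 0b0101}
```

Calling `run(example, 2, 2)` drops the infrequent item `d`, orders the rest as `b, e, c, a`, and gives each of the two tasks one low-support and one high-support item. A thread pool is used here only to keep the sketch short; in CPython a process pool would be needed for real multi-core speedup.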
















Cite this article
Huynh, B., Vo, B. & Snasel, V. An efficient method for mining frequent sequential patterns using multi-Core processors. Appl Intell 46, 703–716 (2017). https://doi.org/10.1007/s10489-016-0859-y
DOI: https://doi.org/10.1007/s10489-016-0859-y