
An efficient method for mining frequent sequential patterns using multi-Core processors

Applied Intelligence

Abstract

The problem of mining frequent sequential patterns (FSPs) has attracted a great deal of research attention. Although many efficient algorithms for mining FSPs exist, mining time remains high, especially for large or dense datasets. Parallel processing has been widely applied to speed up various problems, and some parallel algorithms for FSP mining have been proposed, but most of them suffer from synchronization and load-balancing problems. Based on a multi-core processor architecture, this paper proposes a load-balancing parallel approach called Parallel Dynamic Bit Vector Sequential Pattern Mining (pDBV-SPM) for mining FSPs from huge datasets; it uses the dynamic bit vector data structure to determine support values quickly. In the pDBV-SPM approach, the frequent 1-sequences are sorted in ascending order of support count and then partitioned into parts, each of which is assigned to a task on a processor. As a result, most of the nodes in the leftmost branches will be infrequent and thus pruned during the search, which helps to better balance the search tree. Experiments are conducted to verify the effectiveness of pDBV-SPM. The experimental results show that the proposed algorithm outperforms PIB-PRISM for mining FSPs in terms of mining time and memory usage.
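The two ideas named in the abstract, bit-vector support counting and support-ordered partitioning, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a dynamic bit vector keeps only the bytes between the first and last non-zero byte plus a start offset, and that the sorted 1-sequences are dealt to tasks round-robin; the abstract does not state the partitioning rule. The names `DBV`, `from_ids`, `intersect` and `partition_by_support` are hypothetical. The sketch works at the level of sequence IDs only, so `intersect` yields the sequences containing both items, which is an upper bound on the support of any pattern that combines them; checking item order inside each sequence is omitted.

```python
from dataclasses import dataclass


@dataclass
class DBV:
    """Assumed dynamic-bit-vector layout: an offset plus the trimmed bytes."""
    start: int    # index of the first non-zero byte in the full bit vector
    data: bytes   # bytes from the first to the last non-zero byte

    @classmethod
    def from_ids(cls, ids):
        """Build a DBV from a collection of 0-based sequence IDs."""
        if not ids:
            return cls(0, b"")
        full = bytearray(max(ids) // 8 + 1)
        for i in ids:
            full[i // 8] |= 1 << (i % 8)
        # The last byte is non-zero by construction; skip leading zero bytes.
        first = next(k for k, b in enumerate(full) if b)
        return cls(first, bytes(full[first:]))

    def support(self):
        """Support count = number of set bits."""
        return sum(bin(b).count("1") for b in self.data)

    def intersect(self, other):
        """AND the overlapping byte range, then trim zero bytes again."""
        lo = max(self.start, other.start)
        hi = min(self.start + len(self.data), other.start + len(other.data))
        anded = bytes(
            self.data[k - self.start] & other.data[k - other.start]
            for k in range(lo, hi)
        )
        stripped = anded.lstrip(b"\x00")
        lead = len(anded) - len(stripped)
        stripped = stripped.rstrip(b"\x00")
        if not stripped:
            return DBV(0, b"")
        return DBV(lo + lead, stripped)


def partition_by_support(item_ids, min_sup, n_tasks):
    """Sort frequent 1-sequences by ascending support, then split them.

    Round-robin assignment is an assumption made for illustration only.
    """
    dbvs = {item: DBV.from_ids(ids) for item, ids in item_ids.items()}
    frequent = sorted(
        (item for item, v in dbvs.items() if v.support() >= min_sup),
        key=lambda item: (dbvs[item].support(), item),
    )
    parts = [frequent[k::n_tasks] for k in range(n_tasks)]
    return frequent, parts
```

Each returned part could then be handed to a separate worker, for example through a process pool. Placing low-support items first means extensions explored early are more likely to fall below the threshold and be pruned, which is the balancing effect the abstract describes.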





Author information


Corresponding author

Correspondence to Bay Vo.


Cite this article

Huynh, B., Vo, B. & Snasel, V. An efficient method for mining frequent sequential patterns using multi-Core processors. Appl Intell 46, 703–716 (2017). https://doi.org/10.1007/s10489-016-0859-y


  • DOI: https://doi.org/10.1007/s10489-016-0859-y
