Skip to main content

Advertisement

Log in

Hierarchical Pattern Mining with the Automata Processor

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

Mining complex patterns with hierarchical structures becomes more and more important to understand the underlying information in large and unstructured databases. When compared with a set-mining problem or a string-mining problem, the computation complexity to recognize a pattern with hierarchical structure, and the large associated search space, make hierarchical pattern mining (HPM) extremely expensive on conventional processor architectures. We propose a flexible, hardware-accelerated framework for mining hierarchical patterns with Apriori-based algorithms, which leads to multi-pass pruning strategies but exposes massive parallelism. Under this framework, we implemented two widely used HPM techniques, sequential pattern mining (SPM) and disjunctive rule mining (DRM) on the Automata Processor (AP), a hardware implementation of non-deterministic finite automata (NFAs). Two automaton-design strategies for matching and counting different types of hierarchical patterns, called linear design and reduction design, are proposed in this paper. To generalize automaton structure for SPM, the linear design strategy is proposed by flattening sequential patterns to plain strings to produce automaton design space and to minimize the overhead of reconfiguration. Up to 90\(\times \) and 29\(\times \) speedups are achieved by the AP-accelerated algorithm on six real-world datasets, when compared with the optimized multicore CPU and GPU GSP implementations, respectively. The proposed CPU-AP solution also outperforms the state-of-the-art PrefixSpan and SPADE algorithms on a multicore CPU by up to 452\(\times \) and 49\(\times \) speedups. The AP advantage grows further with larger datasets. For DRM, the reduction design strategy is adopted by applying reduction operation of AND, with on-chip Boolean units, on several parallel sub-structures for recognizing disjunctive items. This strategy allows implicit OR reduction on alternative items within a disjunctive item by utilizing bit-wise parallelism feature of the on-chip state units. The experiments show up to 614\(\times \) speedups of the proposed CPU-AP DRM solution over a sequential CPU algorithm on two real-world datasets. The experiments also show significant increase of CPU matching-and-counting time when increasing d-rule size or the number of alternative items. However, in practical cases, the AP solution runs hundreds of times faster in matching and counting than the CPU solution, and keeps constant processing time despite the increasing complexity of disjunctive rules.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  1. Frequent itemset mining dataset repository. http://fimi.ua.ac.be/data/

  2. (2015) Micron Automata Processor website. http://www.micronautomata.com/documentation

  3. Aggarwal, C.C., Han, J. (eds.): Frequent Pattern Mining. Springer, Cham (2014)

    MATH  Google Scholar 

  4. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proc. of the International Conference on Data Engineering (ICDE), IEEE, pp. 3–14 (1995)

  5. Agrawal, R., Imieliński ,T., Swami, A.: Mining association rules between sets of items in large databases. In: Proc. of SIGMOD ’93 (1993)

  6. Bo, C. et al.: Entity resolution acceleration using microns automata processor. In: Proc. of the International Conference on Big Data (BigData) (2016)

  7. Chiang, D.A., Wang, Y.F., Wang, Y.H., Chen, Z.Y., Hsu, M.H.: Mining disjunctive consequent association rules. Appl. Soft Comput. 11(2), 2129–2133 (2011)

    Article  Google Scholar 

  8. Cong, S., Han, J., Hoeflinger, J., Padua, D.: A sampling-based framework for parallel data mining. In: Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM (2005)

  9. Dlugosch, P., et al.: An efficient and scalable semiconductor architecture for parallel automata processing. IEEE Trans. Parallel Distrib. Syst. 25(12), 3088–3098 (2014)

    Article  Google Scholar 

  10. Fang, W. et al.: Frequent itemset mining on graphics processors. In: Proc. International Workshop on Data Management on New Hardware (DaMoN) (2009)

  11. Fournier-Viger, P., et al.: Spmf: a Java open-source pattern mining library. J. Mach. Learn. Res. 15, 3569–3573 (2014)

    MATH  Google Scholar 

  12. Guralnik, V., Karypis, G.: Parallel tree-projection-based sequence mining algorithms. Parallel Comput. 30(4), 443–472 (2004)

    Article  Google Scholar 

  13. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proc. of SIGMOD ’00, ACM (2000)

  14. Hryniów, K.: Parallel pattern mining-application of gsp algorithm for graphics processing units. In: Proc. of the International Carpathian Control Conference (ICCC), IEEE, pp. 233–236 (2012)

  15. Nanavati, A.A., Chitrapura, K.P., Joshi, S., Krishnapuram, R.: Mining generalised disjunctive association rules. In: Proc. of the Tenth International Conference on Information and Knowledge Management (CIKM), ACM, New York, NY, USA, pp. 482–489 (2001)

  16. Noyes, H.: Micron automata processor architecture: Reconfigurable and massively parallel automata processing. In: Proc. of the Fifth International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, keynote presentation (2014)

  17. Pei, J., et al.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. (TKDE) 16(11), 1424–1440 (2004)

    Article  Google Scholar 

  18. Roy, I., Aluru, S.: Discovering motifs in biological sequences using the micron automata processor. IEEE/ACM Trans. Comput. Biol. Bioinform. 13(1), 99–111 (2016)

    Article  Google Scholar 

  19. Sampaio, M.C., Cardoso, O.H.B., Santos, GPD., Hattori, L.: Mining disjunctive association rules (2008)

  20. Shintani, T., Kitsuregawa, M.: Mining algorithms for sequential patterns in parallel: Hash based approach. In: Proc. of the Second Pacific\(-\)Asia Conference on Knowledge Discovery and Data mining, pp. 283–294 (1998)

  21. Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Proc. of the International Conference on Extending Database Technology (EDBT) (1996)

  22. Wang, K., Qi, Y., Fox, J., Stan, M., Skadron, K.: Association rule mining with the micron automata processor. In: Proc. of the IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2015)

  23. Wang, K., Sadredini, E., Skadron, K.: Sequential pattern mining with the micron automata processor. In: Proc. of the ACM International Conference on Computing Frontiers, ACM, New York, NY, USA, CF ’16 (2016a)

  24. Wang, M.H., et al.: Using the automata processor for fast pattern recognition in high energy physics experimentsa proof of concept. Nucl. Instrum. Methods Phys. Res. Sect. A: Accel., Spectrom., Detect. Assoc. Equip. 832, 219–230 (2016b)

    Article  Google Scholar 

  25. Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. (TKDE) 12(3), 372–390 (2000)

    Article  Google Scholar 

  26. Zaki, M.J.: Parallel sequence mining on shared-memory machines. J. Parallel Distrib. Comput. 61(3), 401–426 (2001a)

    Article  MATH  Google Scholar 

  27. Zaki, M.J.: Spade: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1–2), 31–60 (2001b)

    Article  MATH  Google Scholar 

  28. Zhang, F., Zhang, Y., Bakos, J.D.: Accelerating frequent itemset mining on graphics processing units. J. Supercomput. 66(1), 94–117 (2013)

    Article  Google Scholar 

  29. Zu, Y. et al.: GPU-based NFA implementation for memory efficient high speed regular expression matching. In: Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM, pp. 129–140 (2012)

Download references

Acknowledgements

This work was supported in part by the Virginia CIT CRCF program under Grant No. MF14S-021-IT; by C-FAR, one of the six SRC STARnet Centers, sponsored by MARCO and DARPA; NSF Grant EF-1124931; and a grant from Micron Technology.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ke Wang.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, K., Sadredini, E. & Skadron, K. Hierarchical Pattern Mining with the Automata Processor. Int J Parallel Prog 46, 376–411 (2018). https://doi.org/10.1007/s10766-017-0489-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10766-017-0489-y

Keywords

Navigation