ABSTRACT
Counting the frequency of complex patterns such as subtrees is more challenging than counting simple itemsets and sequences, because the number of possible candidate patterns in a tree is much higher than in one-dimensional data structures, which dramatically increases processing time. In this paper, we propose a new and scalable solution for frequent subtree mining (FTM) on the Automata Processor (AP), a new and highly parallel accelerator architecture. We present AP-FTM, a multi-stage pruning framework on the AP that reduces the search space of FTM candidates. Compared with PatternMatcher, a practical and exact CPU solution, AP-FTM achieves up to 353X speedup on four real-world and synthetic datasets, at the cost of a small reduction in accuracy. To provide a fully accurate yet still scalable solution, we propose a hybrid method that combines AP-FTM with a CPU exact-matching approach and achieves up to 262X speedup over PatternMatcher on a challenging database. We also develop a GPU algorithm for FTM and show that the AP outperforms it as well. Results on a synthetic database show that the AP's advantage grows further with larger datasets.
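The prune-then-verify idea behind the hybrid method can be illustrated with a minimal Python sketch. It is not the AP-FTM implementation: the cheap filter here is a plain label-count check standing in for the pruning stages, the sketch assumes the filter only over-approximates (it never rejects a true match, which the abstract does not state), and it assumes ordered, induced subtrees. All names (`cheap_filter`, `contains`, `hybrid_support`) are hypothetical.

```python
from collections import Counter

# A tree is a (label, children) pair; children is a list of trees.

def labels(tree):
    """Count how often each label occurs in the tree."""
    label, children = tree
    counts = Counter([label])
    for child in children:
        counts += labels(child)
    return counts

def cheap_filter(pattern, tree):
    """Necessary condition only: the tree must hold every pattern label
    at least as often as the pattern does. May accept non-matches."""
    need, have = labels(pattern), labels(tree)
    return all(have[lab] >= n for lab, n in need.items())

def matches_at(pattern, node):
    """Exact check: does the pattern occur as an ordered, induced
    subtree rooted at this node?"""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    i = 0
    # Map pattern children, left to right, onto distinct node children.
    for p_child in p_children:
        while i < len(n_children) and not matches_at(p_child, n_children[i]):
            i += 1
        if i == len(n_children):
            return False
        i += 1
    return True

def contains(pattern, tree):
    """Exact check at every node of the tree."""
    return matches_at(pattern, tree) or any(
        contains(pattern, child) for child in tree[1]
    )

def hybrid_support(pattern, database):
    """Stage 1 prunes trees cheaply; stage 2 verifies survivors exactly."""
    survivors = [t for t in database if cheap_filter(pattern, t)]
    return sum(contains(pattern, t) for t in survivors)

DATABASE = [
    ("a", [("b", []), ("c", [])]),   # contains the pattern
    ("a", [("c", []), ("b", [])]),   # same labels, wrong sibling order
    ("a", [("b", [("c", [])])]),     # same labels, wrong shape
]
PATTERN = ("a", [("b", []), ("c", [])])

filter_only = sum(cheap_filter(PATTERN, t) for t in DATABASE)
exact = hybrid_support(PATTERN, DATABASE)
print(filter_only, exact)
```

In this toy database the filter alone accepts all three trees, while the verification stage keeps only the first. This mirrors the trade-off described above: a fast approximate stage used on its own loses some accuracy, and exact matching restricted to the surviving candidates restores it.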
Index Terms
- Frequent subtree mining on the automata processor: challenges and opportunities