DOI: 10.1145/3079079.3079084
research-article

Frequent subtree mining on the automata processor: challenges and opportunities

Published: 14 June 2017

ABSTRACT

Frequency counting of complex patterns such as subtrees is more challenging than for simple itemsets and sequences: the number of possible candidate patterns in a tree is much higher than in one-dimensional data structures, and processing times are dramatically longer. In this paper, we propose a new, scalable solution for frequent subtree mining (FTM) on the Automata Processor (AP), a new and highly parallel accelerator architecture. We present AP-FTM, a multi-stage pruning framework on the AP that reduces the search space of FTM candidates. Compared with PatternMatcher, a practical and exact CPU solution, AP-FTM achieves up to 353X speedup on four real-world and synthetic datasets at the cost of a small reduction in accuracy. To provide a fully accurate yet still scalable solution, we propose a hybrid method that combines AP-FTM with a CPU exact-matching approach and achieves up to 262X speedup over PatternMatcher on a challenging database. We also develop a GPU algorithm for FTM and show that the AP outperforms it as well. Results on a synthetic database show that the AP's advantage grows further with larger datasets.
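
The general idea of pairing a cheap pruning stage with exact matching can be illustrated with a small CPU-only sketch. This is not the AP-FTM implementation. It assumes ordered, labeled trees stored as `(label, children)` tuples, counts induced ordered subtrees, and uses a pre-order label subsequence test as the filter. All function names and the tree representation are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a cheap filter followed by exact verification.
# The tree encoding and all names here are assumptions, not taken from the paper.

def preorder(tree):
    """Return the labels of an ordered tree in pre-order."""
    label, children = tree
    labels = [label]
    for child in children:
        labels.extend(preorder(child))
    return labels

def is_subsequence(small, big):
    """True if `small` appears in `big` in order (not necessarily contiguously)."""
    it = iter(big)
    return all(x in it for x in small)

def matches_at(pattern, node):
    """Exact test: is `pattern` an induced ordered subtree rooted at `node`?"""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    i = 0  # index of the next pattern child still to be matched
    for child in n_children:
        if i < len(p_children) and matches_at(p_children[i], child):
            i += 1
    return i == len(p_children)

def contains(pattern, tree):
    """Exact test: does `pattern` occur rooted at any node of `tree`?"""
    return matches_at(pattern, tree) or any(contains(pattern, c) for c in tree[1])

def support(pattern, database):
    """Return (exact support, number of trees that survived the filter)."""
    p_seq = preorder(pattern)
    # Filter stage: a necessary condition, so it may keep false positives
    # but never discards a tree that truly contains the pattern.
    survivors = [t for t in database if is_subsequence(p_seq, preorder(t))]
    # Verification stage: exact matching only on the surviving trees.
    exact = sum(contains(pattern, t) for t in survivors)
    return exact, len(survivors)

database = [
    ("A", [("B", []), ("C", [])]),   # contains the pattern
    ("A", [("B", [("C", [])])]),     # passes the filter, fails exact matching
    ("A", [("C", []), ("B", [])]),   # rejected by the filter
]
pattern = ("A", [("B", []), ("C", [])])
exact_support, filtered = support(pattern, database)
```

In this toy database the filter keeps two of the three trees and the exact matcher confirms one of them, which shows why a filter-only count can overestimate support and why an exact second stage restores accuracy.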


Published in

ICS '17: Proceedings of the International Conference on Supercomputing
June 2017
300 pages
ISBN: 9781450350204
DOI: 10.1145/3079079
General Chairs: William D. Gropp, Pete Beckman
Program Chairs: Zhiyuan Li, Francisco J. Cazorla

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 584 of 2,055 submissions, 28%
