Parallel Mining of Frequent Subtree Patterns

Qu, Wenwen; Yan, Da; Guo, Guimu; Wang, Xiaoling; Zou, Lei; Zhou, Yang

doi:10.1007/978-3-030-61133-0_2

ORCID (Da Yan): orcid.org/0000-0002-4653-0408

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1281)

Abstract

Mining frequent subtree patterns in a tree database (i.e., a forest) is useful in domains such as bioinformatics and semi-structured data mining. We consider the problem of mining embedded subtrees in a database of rooted, labeled, and ordered trees. We compare two existing serial mining algorithms, PrefixTreeSpan and TreeMiner, and adapt them for parallel execution using PrefixFPM, our general-purpose framework for frequent pattern mining that is designed to effectively utilize the CPU cores of a multicore machine. Our experiments show that TreeMiner is faster than its successor PrefixTreeSpan when only a limited number of CPU cores is used, because its total mining workload is smaller; however, PrefixTreeSpan has a much higher speedup ratio and can beat TreeMiner when given enough CPU cores.
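
To make the notion of an embedded subtree concrete, the following is a minimal Python sketch that is not taken from the chapter. It assumes the commonly used definition: a parent-child edge of the pattern may map to any ancestor-descendant pair in the data tree, labels must match, and siblings of the pattern must map, in left-to-right order, into disjoint subtrees. The names `index_tree`, `is_embedded`, and `support` are illustrative, and the brute-force search shown here is neither TreeMiner nor PrefixTreeSpan.

```python
# A tree is a nested tuple: (label, [child subtrees]).
# Assumption: this encoding and all function names are illustrative only.

def index_tree(tree):
    """Return (labels, scope) in preorder.

    labels[i] is the label of the i-th node in preorder;
    scope[i] is the preorder index of the last node in the subtree of i.
    """
    labels, scope = [], []

    def visit(node):
        label, children = node
        i = len(labels)
        labels.append(label)
        scope.append(i)
        for child in children:
            visit(child)
        scope[i] = len(labels) - 1

    visit(tree)
    return labels, scope


def is_embedded(pattern, tree):
    """True if `pattern` is an embedded subtree of the ordered tree `tree`."""
    labels, scope = index_tree(tree)

    def place(p, lo, hi):
        # Yield every preorder index t in [lo, hi] to which p can be mapped,
        # with all of p's children mapped to proper descendants of t.
        p_label, p_children = p
        for t in range(lo, hi + 1):
            if labels[t] == p_label and place_all(p_children, t + 1, scope[t]):
                yield t

    def place_all(siblings, lo, hi):
        # Map ordered siblings left to right into disjoint subtrees in [lo, hi].
        if not siblings:
            return True
        first, rest = siblings[0], siblings[1:]
        # The next sibling must start after the subtree chosen for `first`.
        return any(place_all(rest, scope[t] + 1, hi)
                   for t in place(first, lo, hi))

    return place_all([pattern], 0, len(labels) - 1)


def support(pattern, database):
    """Number of trees in the database that contain the pattern."""
    return sum(is_embedded(pattern, t) for t in database)
```

For the data tree `("A", [("B", [("C", []), ("D", [])]), ("E", [])])`, the pattern `("A", [("C", []), ("E", [])])` is embedded even though `C` is a grandchild of `A`, whereas the same pattern with `E` and `C` swapped is not, because sibling order must be preserved. A pattern is frequent when its support reaches a user-chosen minimum threshold.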

Acknowledgments

This work was partially supported by NSF OAC-1755464 and DGE-1723250.

Author information

Authors and Affiliations

The University of Alabama at Birmingham, Birmingham, USA
Da Yan & Guimu Guo
East China Normal University, Shanghai, China
Wenwen Qu & Xiaoling Wang
Peking University, Beijing, China
Lei Zou
Auburn University, Auburn, USA
Yang Zhou

Corresponding author

Correspondence to Da Yan.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, NSW, Australia
Lu Qin
The University of New South Wales, Sydney, NSW, Australia
Wenjie Zhang
University of Technology Sydney, Sydney, NSW, Australia
Ying Zhang
The University of New South Wales, Sydney, NSW, Australia
You Peng
National Institute of Informatics, Tokyo, Japan
Hiroyuki Kato
The University of New South Wales, Sydney, NSW, Australia
Wei Wang
Osaka University, Osaka, Japan
Chuan Xiao

Cite this paper

Qu, W., Yan, D., Guo, G., Wang, X., Zou, L., Zhou, Y. (2020). Parallel Mining of Frequent Subtree Patterns. In: Qin, L., et al. Software Foundations for Data Interoperability and Large Scale Graph Data Analytics. SFDI LSGDA 2020. Communications in Computer and Information Science, vol 1281. Springer, Cham. https://doi.org/10.1007/978-3-030-61133-0_2

DOI: https://doi.org/10.1007/978-3-030-61133-0_2
Published: 06 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61132-3
Online ISBN: 978-3-030-61133-0
eBook Packages: Computer Science, Computer Science (R0)
