Leveraging Homomorphisms and Bitmaps to Enable the Mining of Embedded Patterns from Large Data Trees

Wu, Xiaoying; Theodoratos, Dimitri

doi:10.1007/978-3-319-18120-2_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9049)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Finding interesting tree patterns hidden in large datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Embedded patterns, whose edges may map to ancestor-descendant paths rather than only to parent-child edges, allow for discovering useful relationships that cannot be captured by induced patterns. Unfortunately, previous contributions have focused almost exclusively on mining patterns from a set of small trees, and the problem of mining embedded patterns from large data trees has been neglected. This is mainly due to the complexity of the task: testing whether an unordered tree pattern has an embedding in a data tree is NP-complete. However, mining embedded patterns from large trees is important for many modern applications in which such trees arise naturally, particularly with the explosion of big data.
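The following minimal sketch makes the induced/embedded distinction concrete. The toy tree and the function names are hypothetical illustrations and are not taken from the paper. The code checks whether a two-node pattern occurs in a small data tree in two ways. For an induced occurrence, the pattern edge must map to a parent-child edge. For an embedded occurrence, it may map to any ancestor-descendant pair.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy data tree: node id -> (label, parent id); the root's parent is None.
# Shape: a -> b -> c
TREE: Dict[int, Tuple[str, Optional[int]]] = {
    0: ("a", None),
    1: ("b", 0),
    2: ("c", 1),
}


def is_proper_ancestor(tree: Dict[int, Tuple[str, Optional[int]]], anc: int, desc: int) -> bool:
    """Walk up from desc and report whether anc is met on the way to the root."""
    p = tree[desc][1]
    while p is not None:
        if p == anc:
            return True
        p = tree[p][1]
    return False


def edge_occurs(tree: Dict[int, Tuple[str, Optional[int]]],
                parent_label: str, child_label: str, embedded: bool) -> bool:
    """Does the two-node pattern parent_label -> child_label occur in tree?
    Induced: the pattern edge must map to a parent-child edge.
    Embedded: it may map to any ancestor-descendant pair."""
    for u, (lu, _) in tree.items():
        if lu != parent_label:
            continue
        for v, (lv, pv) in tree.items():
            if lv != child_label:
                continue
            if pv == u or (embedded and is_proper_ancestor(tree, u, v)):
                return True
    return False


print(edge_occurs(TREE, "a", "c", embedded=False))  # False: b sits between a and c
print(edge_occurs(TREE, "a", "c", embedded=True))   # True
```

The pattern a → c is missed by induced matching because b lies between a and c. It is found once edges may stretch over ancestor-descendant paths. For unordered patterns with several nodes, this extra freedom is what makes the (injective) embedding test NP-complete.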

In this paper, we address the problem of mining unordered frequent embedded tree patterns from large trees. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by previous approaches. A further originality of our approach is that the matching information of already computed patterns is materialized as bitmaps. This technique not only minimizes memory consumption but also reduces CPU costs by translating pattern evaluation into bitwise operations. An extensive experimental evaluation shows two results. First, our approach mines embedded patterns from real datasets up to several orders of magnitude faster than state-of-the-art tree mining algorithms applied to large data trees. Second, it scales well, enabling the extraction of patterns from large datasets on which previous approaches fail.
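The abstract does not describe the data structures in detail. The sketch below is therefore our own simplified illustration, not the authors' algorithm, and it rests on the following assumptions:

* Data-tree nodes are numbered in preorder, so the descendants of each node form a contiguous range of bits.
* The candidate set of each pattern node is a Python integer used as a bitmap.
* Every pattern edge is an ancestor-descendant edge.
* Support is the number of distinct data nodes onto which the pattern root can be mapped. This is one common single-tree support measure and may differ from the paper's definition.

The names `flatten`, `canon`, `BitmapIndex` and `materialized` are hypothetical.

Matching is homomorphic, so sibling pattern nodes may share an image. This allows a single bottom-up pass to decide existence without enumerating matchings. The `materialized` dictionary loosely mirrors the idea of reusing bitmaps of already computed (sub)patterns.

```python
from typing import Dict, List, Tuple

# A tree (data tree or pattern) is written as (label, [child subtrees]).
Tree = Tuple[str, list]


def flatten(tree: Tree) -> Tuple[List[str], List[int]]:
    """Number nodes in preorder; return labels and the number of proper
    descendants of each node, both indexed by preorder number."""
    labels: List[str] = []
    sizes: List[int] = []

    def visit(node: Tree) -> int:
        idx = len(labels)
        labels.append(node[0])
        sizes.append(0)
        for child in node[1]:
            desc = visit(child)
            sizes[idx] += 1 + desc
        return sizes[idx]

    visit(tree)
    return labels, sizes


def canon(pattern: Tree) -> str:
    """Canonical string of an unordered pattern: children are sorted by their
    own canonical strings, so sibling order does not matter."""
    return pattern[0] + "(" + ",".join(sorted(canon(c) for c in pattern[1])) + ")"


class BitmapIndex:
    """Bit i of every bitmap stands for the data node with preorder number i."""

    def __init__(self, data: Tree) -> None:
        self.labels, self.sizes = flatten(data)
        self.by_label: Dict[str, int] = {}
        for i, lab in enumerate(self.labels):
            self.by_label[lab] = self.by_label.get(lab, 0) | (1 << i)
        # Bitmaps of (sub)patterns evaluated so far, keyed by canon().
        self.materialized: Dict[str, int] = {}

    def desc_mask(self, i: int) -> int:
        # Proper descendants of node i occupy preorder positions i+1 .. i+sizes[i].
        return ((1 << self.sizes[i]) - 1) << (i + 1)

    def match(self, pattern: Tree) -> int:
        """Bitmap of data nodes onto which the pattern root can be mapped by a
        homomorphism sending each pattern edge to an ancestor-descendant pair."""
        key = canon(pattern)
        if key in self.materialized:  # reuse work done for an earlier pattern
            return self.materialized[key]
        cand = self.by_label.get(pattern[0], 0)
        for child in pattern[1]:
            child_bits = self.match(child)
            kept, bits = 0, cand
            while bits:  # visit each set bit of cand
                low = bits & -bits
                if self.desc_mask(low.bit_length() - 1) & child_bits:
                    kept |= low  # this candidate has a descendant matching child
                bits ^= low
            cand = kept
        self.materialized[key] = cand
        return cand

    def support(self, pattern: Tree) -> int:
        # Assumed support measure: number of distinct root images (popcount).
        return bin(self.match(pattern)).count("1")


data = ("a", [("b", [("c", [])]), ("c", []), ("a", [("c", [])])])
index = BitmapIndex(data)
print(index.support(("a", [("c", [])])))             # 2
print(index.support(("a", [("b", [("c", [])])])))    # 1
print(index.support(("a", [("c", []), ("c", [])])))  # 2: both c's may share one image
```

The third query shows the effect of homomorphic matching. The inner node labeled a has only one c descendant, yet it still supports the pattern with two c children, because both pattern nodes map to the same data node. An injective (embedding-based) count would reject it. The second query reuses the materialized bitmap for the subpattern `c()` instead of recomputing it.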

The research of this author was supported by the National Natural Science Foundation of China under Grant No. 61202035 and 61272110.

Author information

Authors and Affiliations

State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Xiaoying Wu
New Jersey Institute of Technology, Newark, USA
Dimitri Theodoratos

Corresponding author

Correspondence to Xiaoying Wu.

Editor information

Editors and Affiliations

Universität München, München, Germany
Matthias Renz
University of Southern California, Los Angeles, USA
Cyrus Shahabi
University of Queensland, Brisbane, Australia
Xiaofang Zhou
Monash University, Clayton, Australia
Muhammad Aamir Cheema

About this paper

Cite this paper

Wu, X., Theodoratos, D. (2015). Leveraging Homomorphisms and Bitmaps to Enable the Mining of Embedded Patterns from Large Data Trees. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_1

DOI: https://doi.org/10.1007/978-3-319-18120-2_1
Published: 09 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18119-6
Online ISBN: 978-3-319-18120-2
eBook Packages: Computer Science, Computer Science (R0)