Abstract
Finding interesting tree patterns hidden in large datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns, which capture useful relationships that induced patterns cannot. Unfortunately, previous contributions have focused almost exclusively on mining patterns from a set of small trees, and the problem of mining embedded patterns from large data trees has been neglected. This is mainly due to the complexity of the task: testing whether one unordered tree embeds in another is NP-complete. However, mining embedded patterns from large trees is important for many modern applications, particularly with the explosion of big data.
In this paper, we address the problem of mining unordered frequent embedded tree patterns from large trees. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by previous approaches. A further novelty of our approach is that the matching information of already computed patterns is materialized as bitmaps. This technique not only minimizes memory consumption but also reduces CPU costs by translating pattern evaluation into bitwise operations. An extensive experimental evaluation shows that our approach mines embedded patterns from real datasets up to several orders of magnitude faster than state-of-the-art tree mining algorithms applied to large data trees. It also scales well, enabling the extraction of patterns from large datasets where previous approaches fail.
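To make the bitmap idea concrete, here is a minimal Python sketch. It is not the paper's algorithm. It only illustrates how matching information can be stored as bitmaps and combined with bitwise operations. It assumes that each label has an inverted list of data-tree node ids, that a pattern's root occurrences form a subset of that list, and that support is counted as the number of distinct root occurrences. The helper names and the sample data are hypothetical. The sketch relies on one property of homomorphic matching: when two patterns share only their root, a data node is a root image of the merged pattern exactly when it is a root image of both.

```python
def to_bitmap(inverted_list, occurrences):
    """Encode a subset of an inverted list as an integer bitmap.

    Bit i is set iff inverted_list[i] belongs to occurrences.
    """
    wanted = set(occurrences)
    bitmap = 0
    for i, node in enumerate(inverted_list):
        if node in wanted:
            bitmap |= 1 << i
    return bitmap


def from_bitmap(inverted_list, bitmap):
    """Decode a bitmap back into the list of node ids it represents."""
    return [node for i, node in enumerate(inverted_list) if (bitmap >> i) & 1]


def support(bitmap):
    """Number of set bits, i.e. number of distinct root occurrences (assumed support measure)."""
    return bin(bitmap).count("1")


# Hypothetical inverted list: ids of all data-tree nodes labeled "a".
inverted_a = [1, 4, 7, 9, 12]

# Hypothetical root occurrences of two already-computed patterns rooted at "a".
occ_p1 = to_bitmap(inverted_a, [1, 4, 9])
occ_p2 = to_bitmap(inverted_a, [4, 9, 12])

# Root occurrences of the pattern obtained by merging P1 and P2 at their root:
# a single bitwise AND instead of re-matching against the data tree.
occ_merged = occ_p1 & occ_p2

print(from_bitmap(inverted_a, occ_merged), support(occ_merged))
```

Each bitmap needs one bit per entry of the inverted list rather than one stored node id per occurrence, which is where the memory saving in this sketch comes from.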
The research of this author was supported by the National Natural Science Foundation of China under Grant No. 61202035 and 61272110.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wu, X., Theodoratos, D. (2015). Leveraging Homomorphisms and Bitmaps to Enable the Mining of Embedded Patterns from Large Data Trees. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_1
DOI: https://doi.org/10.1007/978-3-319-18120-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18119-6
Online ISBN: 978-3-319-18120-2
eBook Packages: Computer Science (R0)