Efficiently Discovering Most-Specific Mixed Patterns from Large Data Trees

Wu, Xiaoying; Theodoratos, Dimitri

doi:10.1007/978-3-319-55753-3_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10177)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Discovering informative tree patterns hidden in large datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Mixed patterns can extract all the information extracted by embedded or induced patterns, as well as more detailed information that neither of the other two can extract. Unfortunately, the problem of extracting unconstrained mixed patterns from data trees has not been addressed so far.
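To make the distinction between the three pattern kinds concrete, here is a minimal Python sketch. It assumes the usual reading in which a child edge (`/`) must map to a parent-child pair of data nodes and a descendant edge (`//`) to an ancestor-descendant pair, with matching done by homomorphism (two pattern nodes may share an image), as the abstract mentions. Under this assumption an induced pattern uses only `/` edges, an embedded pattern only `//` edges, and a mixed pattern both. The names `Node`, `PNode`, `matches_at` and `kind` are hypothetical and are not taken from the paper.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Node:
    """A node of an unordered, labeled data tree."""
    label: str
    children: list[Node] = field(default_factory=list)

@dataclass
class PNode:
    """A pattern node; each child is paired with its edge type, "/" or "//"."""
    label: str
    children: list[tuple[str, PNode]] = field(default_factory=list)

def descendants(n: Node) -> Iterator[Node]:
    """Yield every proper descendant of n in preorder."""
    for c in n.children:
        yield c
        yield from descendants(c)

def matches_at(p: PNode, d: Node) -> bool:
    """True if pattern node p can be mapped to data node d by a homomorphism."""
    if p.label != d.label:
        return False
    for edge, pc in p.children:
        # "/" must land on a child of d, "//" on any proper descendant of d.
        candidates = d.children if edge == "/" else descendants(d)
        if not any(matches_at(pc, c) for c in candidates):
            return False
    return True

def kind(p: PNode) -> str:
    """Classify a pattern by the edge types it uses (hypothetical helper)."""
    edges: set[str] = set()
    stack = [p]
    while stack:
        n = stack.pop()
        for edge, pc in n.children:
            edges.add(edge)
            stack.append(pc)
    if edges <= {"/"}:
        return "induced"
    if edges <= {"//"}:
        return "embedded"
    return "mixed"

# t1: A -> B -> X -> C          t2: A -> Y -> B -> X -> C
t1 = Node("A", [Node("B", [Node("X", [Node("C")])])])
t2 = Node("A", [Node("Y", [Node("B", [Node("X", [Node("C")])])])])

induced = PNode("A", [("/", PNode("B", [("/", PNode("C"))]))])    # A/B/C
embedded = PNode("A", [("//", PNode("B", [("//", PNode("C"))]))])  # A//B//C
mixed = PNode("A", [("/", PNode("B", [("//", PNode("C"))]))])      # A/B//C

for name, p in [("induced", induced), ("embedded", embedded), ("mixed", mixed)]:
    print(name, kind(p), matches_at(p, t1), matches_at(p, t2))
```

On `t1`, the mixed pattern `A/B//C` records that `B` is a child of `A` while letting `C` sit at any depth below `B`. The induced pattern `A/B/C` does not match `t1`, and the embedded pattern `A//B//C` matches `t2` as well, so it cannot express the parent-child constraint between `A` and `B`.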

In this paper, we address the problem of mining unordered frequent mixed patterns from large trees. We propose a novel approach that non-redundantly extracts most-specific mixed patterns. Our approach uses effective pruning techniques to reduce the pattern search space. It exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by older approaches. An extensive experimental evaluation shows that our approach not only mines mixed patterns from real and synthetic datasets up to several orders of magnitude faster than older state-of-the-art embedded tree mining algorithms applied to large data trees, but also scales well, enabling the extraction of informative mixed patterns from large datasets for which no previous approach exists.
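The idea of computing support without enumerating all matchings can be illustrated with a generic bottom-up technique, which is not necessarily the authors' algorithm. Under homomorphic semantics it suffices to compute, for each pattern node, the set of data nodes it can be mapped to, and each such set is built once from its children's sets. The sketch below assumes a preorder (region) numbering of the data tree, so that a descendant test becomes a binary search, and assumes that support is the number of distinct data nodes to which the pattern root maps; the paper's exact support measure and its incremental computation are not reproduced. The names `DataTree`, `occurrences` and `support` are hypothetical.

```python
from __future__ import annotations
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)

@dataclass
class PNode:
    label: str
    children: list[tuple[str, PNode]] = field(default_factory=list)  # edge is "/" or "//"

class DataTree:
    """Preorder-numbered data tree: node i's subtree spans preorder ids i..end[i]."""
    def __init__(self, root: Node) -> None:
        self.label: list[str] = []
        self.kids: list[list[int]] = []
        self.end: list[int] = []
        self._number(root)

    def _number(self, n: Node) -> int:
        i = len(self.label)
        self.label.append(n.label)
        self.kids.append([])
        self.end.append(i)
        for c in n.children:
            self.kids[i].append(self._number(c))
        self.end[i] = len(self.label) - 1
        return i

def _edge_ok(t: DataTree, d: int, edge: str, occ: list[int], occ_set: set[int]) -> bool:
    """Can a pattern child whose occurrence list is occ hang below data node d via edge?"""
    if edge == "/":
        return any(k in occ_set for k in t.kids[d])
    # "//": some occurrence id lies in d's proper subtree range (d, end[d]].
    j = bisect_right(occ, d)
    return j < len(occ) and occ[j] <= t.end[d]

def occurrences(p: PNode, t: DataTree) -> list[int]:
    """Sorted preorder ids of the data nodes to which pattern node p can be mapped."""
    subs = []
    for edge, pc in p.children:
        occ = occurrences(pc, t)  # each child's list is computed once, bottom-up
        subs.append((edge, occ, set(occ)))
    return [d for d in range(len(t.label))
            if t.label[d] == p.label
            and all(_edge_ok(t, d, e, occ, s) for e, occ, s in subs)]

def support(p: PNode, t: DataTree) -> int:
    """Assumed support measure: number of distinct images of the pattern root."""
    return len(occurrences(p, t))

# R has two A-subtrees: A -> B -> X -> C and A -> Y -> B -> X -> C
tree = DataTree(Node("R", [
    Node("A", [Node("B", [Node("X", [Node("C")])])]),
    Node("A", [Node("Y", [Node("B", [Node("X", [Node("C")])])])]),
]))
mixed = PNode("A", [("/", PNode("B", [("//", PNode("C"))]))])      # A/B//C
embedded = PNode("A", [("//", PNode("B", [("//", PNode("C"))]))])  # A//B//C
print(support(mixed, tree), support(embedded, tree))  # 1 2
```

The cost is polynomial in the sizes of the pattern and the tree, whereas the number of individual matchings of a pattern with descendant edges can grow much faster than the tree itself. The recursion in `_number` limits this sketch to moderately deep trees.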

X. Wu's research was supported by the National Natural Science Foundation of China under Grant Nos. 61202035 and 61272110.

Notes

1. Even though there is a documented relationship between diabetes, high LDL and sugar levels, and high blood pressure, this specific example dataset is fictitious.
2. http://xml-benchmark.org


Author information

Authors and Affiliations

State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Xiaoying Wu
New Jersey Institute of Technology, Newark, USA
Dimitri Theodoratos

Corresponding author

Correspondence to Dimitri Theodoratos.

Editor information

Editors and Affiliations

Arizona State University, Tempe - Phoenix, Arizona, USA
Selçuk Candan
Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen
Aalborg University, Aalborg, Denmark
Torben Bach Pedersen
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang
The University of Queensland, Brisbane, Queensland, Australia
Wen Hua

About this paper

Cite this paper

Wu, X., Theodoratos, D. (2017). Efficiently Discovering Most-Specific Mixed Patterns from Large Data Trees. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_18

DOI: https://doi.org/10.1007/978-3-319-55753-3_18
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science, Computer Science (R0)
