Mining transactional tree databases under homeomorphism

Haghir Chehreghani, Mostafa; Haghir Chehreghani, Morteza

doi:10.1007/s11227-025-06997-2

Published: 22 February 2025

Volume 81, article number 530, (2025)

The Journal of Supercomputing

Mostafa Haghir Chehreghani¹ &
Morteza Haghir Chehreghani²

Abstract

A key task in mining tree-structured data is finding frequent embedded tree patterns, which has two settings: the transactional setting and the per-occurrence setting. In the transactional setting, which is the focus of this paper, the crucial step is to decide whether a tree pattern is subtree homeomorphic to a database tree. Our extensive study on the properties of real-world tree-structured datasets reveals that while many vertices in a database tree may have the same label, no two vertices on the same path are identically labeled. In this paper, we exploit this property and propose a novel and efficient method for deciding whether a tree pattern is subtree homeomorphic to a database tree. Our algorithm is based on a compact data structure called EMET, which stores all information required for subtree homeomorphism. We propose an efficient algorithm to generate EMETs of larger patterns using EMETs of the smaller ones. Based on the proposed subtree homeomorphism method, we introduce TTM, an effective algorithm for finding frequent tree patterns from rooted ordered trees. We evaluate the efficiency of TTM on several real-world and synthetic datasets and show that it outperforms well-known existing algorithms by an order of magnitude.
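To make the decision problem in the abstract concrete, here is a minimal sketch of a naive subtree homeomorphism test for rooted ordered trees, together with transactional support counting. It is not EMET or TTM, whose details are not given on this page. It assumes the usual definition of embedded patterns: labels are preserved, a parent-child edge in the pattern maps to an ancestor-descendant pair in the database tree, and the left-to-right order of siblings is preserved. The names `Tree`, `is_embedded` and `transactional_support`, and the nested `(label, [children])` input format, are hypothetical choices made for this illustration.

```python
from functools import lru_cache


class Tree:
    """Rooted ordered labeled tree, with vertices numbered in preorder."""

    def __init__(self, nested):
        self.label = []     # label[i]: label of the i-th vertex in preorder
        self.end = []       # end[i]: preorder index of the last vertex in i's subtree
        self.children = []  # children[i]: preorder indices of i's children, left to right
        self._build(nested)

    def _build(self, node):
        label, kids = node
        i = len(self.label)
        self.label.append(label)
        self.end.append(i)
        self.children.append([])
        for kid in kids:
            self.children[i].append(self._build(kid))
        self.end[i] = len(self.label) - 1
        return i


def is_embedded(pattern, tree):
    """Return True if `pattern` is subtree homeomorphic to `tree` (naive check)."""
    p, t = Tree(pattern), Tree(tree)

    @lru_cache(maxsize=None)
    def match(u, v):
        # Can the pattern subtree rooted at u be embedded with u mapped to v?
        if p.label[u] != t.label[v]:
            return False
        lo = v + 1  # leftmost preorder index still usable below v
        for c in p.children[u]:
            # Greedy choice: among the matching descendants of v that lie
            # entirely to the right of the previous sibling's image, take
            # the one whose subtree ends earliest.
            best = None
            for d in range(lo, t.end[v] + 1):
                if (best is None or t.end[d] < best) and match(c, d):
                    best = t.end[d]
            if best is None:
                return False
            lo = best + 1
        return True

    # The pattern root may be mapped to any vertex of the database tree.
    return any(match(0, v) for v in range(len(t.label)))


def transactional_support(pattern, database):
    """Number of database trees that contain at least one embedding of `pattern`."""
    return sum(is_embedded(pattern, tree) for tree in database)


# Small illustrative database (made-up trees, not from the paper's datasets).
t1 = ("a", [("b", [("c", []), ("d", [])]), ("e", [])])
t2 = ("a", [("c", []), ("c", [("e", [])])])
t3 = ("a", [("c", []), ("c", []), ("e", [])])
database = [t1, t2, t3]

pattern = ("a", [("c", []), ("e", [])])
print(transactional_support(pattern, database))
```

In the transactional setting a database tree is counted once no matter how many embeddings it contains, which is why `t3`, with two embeddings of the pattern, contributes 1 to the support. The sketch recomputes everything for each pattern; the abstract's point is that EMETs of larger patterns can instead be generated from EMETs of smaller ones.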

Data availability

The CSLOGS datasets, along with the tree generator program used to produce synthetic datasets, are publicly available on the internet. The NASA and Prions datasets were provided through email communications (please see the Acknowledgments section).

Code availability

The release of the code is limited by licensing constraints.

Notes

EMET is an abbreviation for EMbedding Encoder for Transactional tree mining.
TTM is an abbreviation for Transactional Tree Miner.

Acknowledgements

We are thankful to Prof. Mohammed Javeed Zaki for providing the TreeMinerD code, the CSLOGS datasets and the TreeGenerator program, to Dr Henry Tan for providing the MB3Miner-T code, to Professor Jun-Hong Cui for providing the NASA dataset, and to Dr Fedja Hadzic for providing the Prions dataset. Parts of this work were performed while the second author was at Xerox Research Centre Europe, later known as Naver Labs Europe.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Mostafa Haghir Chehreghani
Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden
Morteza Haghir Chehreghani

Contributions

Mostafa Haghir Chehreghani developed the ideas, implemented the algorithms, ran the experiments, analyzed the results, and wrote the paper.

Morteza Haghir Chehreghani improved the ideas and wrote the paper.

Corresponding author

Correspondence to Mostafa Haghir Chehreghani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Haghir Chehreghani, M., Haghir Chehreghani, M. Mining transactional tree databases under homeomorphism. J Supercomput 81, 530 (2025). https://doi.org/10.1007/s11227-025-06997-2

Accepted: 27 January 2025
Published: 22 February 2025
DOI: https://doi.org/10.1007/s11227-025-06997-2
