A quickstart in frequent structure mining can make a difference

Authors: Siegfried Nijssen, Joost N. Kok

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 647 - 652

https://doi.org/10.1145/1014052.1014134

Published: 22 August 2004

Abstract

Given a database, structure mining algorithms search for substructures that satisfy constraints such as minimum frequency, minimum confidence, minimum interest and maximum frequency. Examples of substructures include graphs, trees and paths. For these substructures many mining algorithms have been proposed. In order to make graph mining more efficient, we investigate the use of the "quickstart principle", which is based on the fact that these classes of structures are contained in each other, thus allowing for the development of structure mining algorithms that split the search into steps of increasing complexity. We introduce the GrAph/Sequence/Tree extractiON (Gaston) algorithm that implements this idea by searching first for frequent paths, then frequent free trees and finally cyclic graphs. We investigate two alternatives for computing the frequency of structures and present experimental results to relate these alternatives.
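The containment the abstract relies on (every path is a free tree, and every free tree is a graph) can be illustrated with a small classifier. The following is a minimal sketch and not the Gaston algorithm itself: the function name `classify_structure` and the edge-list input format are assumptions made for illustration, and the graph is assumed to be simple, undirected and connected.

```python
from collections import defaultdict


def classify_structure(num_nodes, edges):
    """Return 'path', 'free tree', or 'cyclic graph' for a connected simple graph.

    Hypothetical helper: nodes are 0..num_nodes-1 and edges is a list of
    undirected (u, v) pairs without self-loops or duplicates.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Check connectivity with a depth-first search from node 0.
    seen = {0}
    stack = [0]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    if len(seen) != num_nodes:
        raise ValueError("graph must be connected")

    # A connected graph with more than n - 1 edges contains a cycle.
    num_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    if num_edges > num_nodes - 1:
        return "cyclic graph"

    # A tree in which no node has degree above 2 is a path.
    if all(len(adjacency[node]) <= 2 for node in range(num_nodes)):
        return "path"
    return "free tree"
```

A search organised along these lines could handle the simplest class first and move to the more expensive classes only when a branching node or a cycle-closing edge appears, which is the ordering the abstract describes (paths, then free trees, then cyclic graphs).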

Index Terms

A quickstart in frequent structure mining can make a difference
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

August 2004

874 pages

ISBN:1581138881

DOI:10.1145/1014052

General Chairs: Won Kim (Cyber Database Solutions), Ronny Kohavi (Amazon.com)

Program Chairs: Johannes Gehrke (Cornell University), William DuMouchel (AT&T Labs Research)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 22 - 25, 2004

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
