
A quickstart in frequent structure mining can make a difference

Published: 22 August 2004

Abstract

Given a database, structure mining algorithms search for substructures that satisfy constraints such as minimum frequency, minimum confidence, minimum interest and maximum frequency. Examples of substructures include graphs, trees and paths, and many mining algorithms have been proposed for each. To make graph mining more efficient, we investigate the "quickstart principle", which exploits the fact that these classes of structures are contained in each other: every path is a tree, and every tree is a graph. This containment allows structure mining algorithms to split the search into steps of increasing complexity. We introduce the GrAph/Sequence/Tree extractiON (Gaston) algorithm, which implements this idea by searching first for frequent paths, then for frequent free trees and finally for frequent cyclic graphs. We investigate two alternatives for computing the frequency of structures and present experimental results that compare them.
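The containment of structure classes described in the abstract can be illustrated with a small sketch. It is not the Gaston algorithm and is not taken from the paper; it only shows how a connected pattern might be assigned to the path, free tree or cyclic graph phase. The function name `classify_pattern` and the edge-list representation are assumptions made for this example.

```python
from collections import defaultdict


def classify_pattern(num_vertices, edges):
    """Classify a connected, simple, undirected graph.

    Hypothetical helper for illustration only (not from the paper).
    Vertices are numbered 0..num_vertices-1; edges are (u, v) pairs.
    Returns "path", "tree" or "cyclic".
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Depth-first search from vertex 0 to confirm the graph is connected.
    seen = {0}
    stack = [0]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    if len(seen) != num_vertices:
        raise ValueError("pattern must be connected")

    # A connected graph on n vertices is acyclic exactly when it has n - 1 edges.
    if len(edges) != num_vertices - 1:
        return "cyclic"

    # A tree in which no vertex has more than two neighbours is a path.
    if all(len(adjacency[v]) <= 2 for v in range(num_vertices)):
        return "path"
    return "tree"


if __name__ == "__main__":
    print(classify_pattern(3, [(0, 1), (1, 2)]))          # chain of three vertices
    print(classify_pattern(4, [(0, 1), (0, 2), (0, 3)]))  # star with three leaves
    print(classify_pattern(3, [(0, 1), (1, 2), (2, 0)]))  # triangle
```

Each class strictly extends the previous one: adding a branch to a path yields a free tree, and closing a cycle in a tree yields a cyclic graph. This is why a search can proceed in steps of increasing complexity.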



    Published In

    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN:1581138881
    DOI:10.1145/1014052

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. frequent item sets
    2. graphs
    3. semi-structures
    4. structures

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
