Mining closed relational graphs with connectivity constraints

Published: 21 August 2005

Abstract

Relational graphs are widely used to model large-scale networks such as biological networks and social networks. In this kind of graph, connectivity is critical for identifying highly associated groups and clusters. In this paper, we investigate the issues of mining closed frequent graphs with connectivity constraints in massive relational graphs, where each graph has around 10K nodes and 1M edges. We adopt the concept of edge connectivity and apply results from graph theory to speed up the mining process. Two approaches are developed to handle different mining requests: CloseCut, a pattern-growth approach, and splat, a pattern-reduction approach. We have applied these methods to biological datasets and found the discovered patterns interesting.
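
To make the notion of edge connectivity concrete: the edge connectivity of a graph is the smallest number of edges whose removal disconnects it, which equals the weight of a global minimum cut. The sketch below is illustrative only and is not the CloseCut or splat procedure from the paper. It assumes an unweighted, undirected graph given as a list of node pairs and computes the minimum cut with the Stoer-Wagner algorithm. The names `edge_connectivity` and `satisfies_connectivity` are hypothetical helpers introduced for this example.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_connectivity(edges: Iterable[Edge]) -> int:
    """Global minimum cut of an undirected graph (Stoer-Wagner).

    Only nodes that appear in at least one edge are considered.
    Returns 0 for an empty or disconnected graph.
    """
    # Weighted adjacency map; repeated edges accumulate weight.
    adj: Dict[Hashable, Dict[Hashable, int]] = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops never cross a cut
        adj.setdefault(u, {})
        adj.setdefault(v, {})
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    if len(adj) < 2:
        return 0

    best = None
    while len(adj) > 1:
        # One "minimum cut phase": grow a set from an arbitrary start node,
        # always adding the node most tightly connected to the set.
        start = next(iter(adj))
        weights = {v: adj[start].get(v, 0) for v in adj if v != start}
        prev, last = start, start
        cut_of_phase = 0
        while weights:
            nxt = max(weights, key=weights.get)
            cut_of_phase = weights.pop(nxt)
            prev, last = last, nxt
            for nb, w in adj[nxt].items():
                if nb in weights:
                    weights[nb] += w
        # The cut separating the last-added node is a candidate minimum cut.
        best = cut_of_phase if best is None else min(best, cut_of_phase)

        # Merge the last two nodes of the ordering and repeat.
        for nb, w in adj.pop(last).items():
            del adj[nb][last]
            if nb != prev:
                adj[prev][nb] = adj[prev].get(nb, 0) + w
                adj[nb][prev] = adj[prev][nb]
    return best


def satisfies_connectivity(edges: Iterable[Edge], k: int) -> bool:
    """True if the graph's edge connectivity is at least k."""
    return edge_connectivity(edges) >= k


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
bridged = triangle + [("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
print(edge_connectivity(triangle), edge_connectivity(bridged))
```

In this example, two triangles joined by a single edge have edge connectivity 1, so a connectivity constraint of k = 2 would reject the joined graph while accepting each triangle on its own. This simple version favors readability over speed and would be far too slow for graphs with around 1M edges.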

    Published In
    KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
    August 2005
    844 pages
    ISBN: 159593135X
    DOI: 10.1145/1081870

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. closed pattern
    2. connectivity
    3. graph

    Qualifiers

    • Article

    Conference

    KDD05

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
