DOI: 10.1145/1376616.1376662
research-article

Mining significant graph patterns by leap search

Published: 09 June 2008

Abstract

With ever-increasing amounts of graph data from disparate sources, there is a strong need to exploit significant graph patterns under user-specified objective functions. Most objective functions are not antimonotonic, which can defeat all frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study of a general mining method that aims to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), exploits the correlation between structural similarity and significance similarity so that the most significant pattern can be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method reveals that the branch-and-bound search widely adopted in the data mining literature is not the best choice, thus sketching a new picture of scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
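
To illustrate why a non-antimonotonic objective undermines frequency-based pruning, the following minimal Python sketch uses itemsets as a stand-in for graph patterns, with set containment in place of subgraph isomorphism. The score function, the toy data, and all names are assumptions made for illustration only; they are not the objective functions or the LEAP algorithm from the paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def frequency(pattern: FrozenSet[str], dataset: List[Transaction]) -> float:
    """Fraction of transactions in `dataset` that contain `pattern`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if pattern <= t) / len(dataset)


def score(pattern: FrozenSet[str],
          positives: List[Transaction],
          negatives: List[Transaction]) -> float:
    """Hypothetical significance score: gap between class frequencies."""
    return abs(frequency(pattern, positives) - frequency(pattern, negatives))


# Toy labeled data (assumed for illustration).
positives = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("c")]
negatives = [frozenset("a"), frozenset("a"), frozenset("b"), frozenset("ab")]

sub = frozenset("a")    # smaller pattern
sup = frozenset("ab")   # superpattern of `sub`

all_data = positives + negatives

# Frequency is antimonotonic: a superpattern is never more frequent.
assert frequency(sup, all_data) <= frequency(sub, all_data)

# The score is not: the superpattern scores higher than its subpattern,
# so discarding `sub` for its low score would also lose `sup`.
print(score(sub, positives, negatives), score(sup, positives, negatives))
```

In this toy data the smaller pattern occurs equally often in both classes, while its superpattern does not, so a pruning rule that relies on scores shrinking as patterns grow would be unsound here.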

    Published In

    SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
    June 2008
    1396 pages
    ISBN:9781605581026
    DOI:10.1145/1376616
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. classification
    2. graph
    3. optimality
    4. pattern

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '08

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 66
    • Downloads (Last 6 weeks): 13
    Reflects downloads up to 02 Mar 2025

    Citations

    Cited By

    • (2024) State of the Art and Potentialities of Graph-level Learning. ACM Computing Surveys 57:2 (1-40). DOI: 10.1145/3695863. Online publication date: 10-Oct-2024
    • (2024) View-based Explanations for Graph Neural Networks. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639295. Online publication date: 26-Mar-2024
    • (2024) Descriptive Kernel Convolution Network with Improved Random Walk Kernel. Proceedings of the ACM Web Conference 2024 (457-468). DOI: 10.1145/3589334.3645405. Online publication date: 13-May-2024
    • (2024) TED$^+$: Towards Discovering Top-k Edge-Diversified Patterns in a Graph Database. IEEE Transactions on Knowledge and Data Engineering (1-14). DOI: 10.1109/TKDE.2023.3312566. Online publication date: 2024
    • (2024) L2XGNN: learning to explain graph neural networks. Machine Learning. DOI: 10.1007/s10994-024-06576-1. Online publication date: 12-Jul-2024
    • (2024) Mining Frequent Geo-Subgraphs in a Knowledge Graph. Web and Big Data (16-31). DOI: 10.1007/978-981-97-2303-4_2. Online publication date: 29-May-2024
    • (2023) WL meet VC. Proceedings of the 40th International Conference on Machine Learning (25275-25302). DOI: 10.5555/3618408.3619459. Online publication date: 23-Jul-2023
    • (2023) Sage. Proceedings of the VLDB Endowment 15:13 (3897-3910). DOI: 10.14778/3565838.3565844. Online publication date: 20-Jan-2023
    • (2023) TED: Towards Discovering Top-k Edge-Diversified Patterns in a Graph Database. Proceedings of the ACM on Management of Data 1:1 (1-26). DOI: 10.1145/3588736. Online publication date: 30-May-2023
    • (2023) Extracting Top-$k$ Frequent and Diversified Patterns in Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering (1-18). DOI: 10.1109/TKDE.2022.3233594. Online publication date: 2023
