research-article

Mining significant graph patterns by leap search

Authors:

Philip S. Yu

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 433 - 444

https://doi.org/10.1145/1376616.1376662

Published: 09 June 2008

Abstract

With ever-increasing amounts of graph data from disparate sources, there is a strong need to mine significant graph patterns under user-specified objective functions. Most objective functions are not antimonotonic, which can defeat all frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study of a general mining method that aims to find the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), exploits the correlation between structural similarity and significance similarity so that the most significant pattern can be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency-descending mining, are proposed to support leap search in the graph pattern space. Our new mining method reveals that the branch-and-bound search widely adopted in the data mining literature is not the best, thus sketching a new picture of scalable graph pattern discovery. Empirical results show that LEAP achieves orders-of-magnitude speedup over the state-of-the-art method. Furthermore, graph classifiers built on the mined patterns outperform the up-to-date graph kernel method in both efficiency and accuracy, demonstrating the high promise of such patterns.
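
The claim that most objective functions are not antimonotonic can be illustrated with a toy sketch. It is not the LEAP algorithm or the objective used in the paper: item sets stand in for graphs, set containment stands in for subgraph containment, and the names `frequency` and `score` and the absolute-difference score are hypothetical choices made only for illustration. The sketch shows that frequency can only shrink as a pattern grows, while a significance-style score can increase.

```python
def frequency(pattern, dataset):
    """Fraction of records in `dataset` that contain every item of `pattern`."""
    return sum(pattern <= record for record in dataset) / len(dataset)


def score(pattern, positives, negatives):
    """Illustrative significance score (an assumption, not the paper's):
    absolute gap between positive-class and negative-class frequency."""
    return abs(frequency(pattern, positives) - frequency(pattern, negatives))


# Toy data: item sets stand in for graphs.
positives = [frozenset("ab"), frozenset("ab"), frozenset("ab"), frozenset("a")]
negatives = [frozenset("a"), frozenset("a"), frozenset("a"), frozenset("b")]
everything = positives + negatives

sub_pattern = frozenset("a")     # smaller pattern
super_pattern = frozenset("ab")  # contains sub_pattern

# Frequency is antimonotonic: growing the pattern never raises it.
freq_sub = frequency(sub_pattern, everything)
freq_super = frequency(super_pattern, everything)

# The score is not: the larger pattern separates the classes better.
score_sub = score(sub_pattern, positives, negatives)
score_super = score(super_pattern, positives, negatives)

print(freq_sub, freq_super)    # frequency drops for the larger pattern
print(score_sub, score_super)  # score rises for the larger pattern
```

Because the larger pattern can outscore the smaller one, a frequency threshold alone cannot safely prune the search for the best-scoring pattern, which is the difficulty the abstract describes.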

Index Terms

Mining significant graph patterns by leap search
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

June 2008

1396 pages

ISBN:9781605581026

DOI:10.1145/1376616

General Chairs:
Laks V. S. Lakshmanan, University of British Columbia, Canada
Raymond T. Ng, University of British Columbia, Canada
Dennis Shasha, New York University, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '08: International Conference on Management of Data

June 9 - 12, 2008

Vancouver, Canada

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
