Mining Graph Patterns

Cheng, Hong; Yan, Xifeng; Han, Jiawei

doi:10.1007/978-3-319-07821-2_13

Abstract

Graph pattern mining has become increasingly crucial to applications in a variety of domains, including bioinformatics, cheminformatics, social network analysis, computer vision, and multimedia. In this chapter, we first examine the existing frequent subgraph mining algorithms and discuss their computational bottlenecks. We then introduce recent studies on mining various types of graph patterns, including significant, representative, and dense subgraph patterns. We also discuss mining tasks in new problem settings, such as graph streams and uncertain graph models. These new algorithms represent the state of the art in graph mining: they not only avoid mining results of exponential size, but also significantly improve the applicability of graph patterns.

Author information

Authors and Affiliations

Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Hong Cheng
Department of Computer Science, University of California at Santa Barbara, Santa Barbara, USA
Xifeng Yan
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
Jiawei Han

Corresponding author

Correspondence to Hong Cheng.

Editor information

Editors and Affiliations

IBM, Yorktown Heights, New York, USA
Charu C. Aggarwal
University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Jiawei Han

About this chapter

Cite this chapter

Cheng, H., Yan, X., Han, J. (2014). Mining Graph Patterns. In: Aggarwal, C., Han, J. (eds) Frequent Pattern Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-07821-2_13

DOI: https://doi.org/10.1007/978-3-319-07821-2_13
Published: 30 August 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07820-5
Online ISBN: 978-3-319-07821-2
eBook Packages: Computer Science; Computer Science (R0)