AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

Flores-Garrido, Marisol; Carrasco-Ochoa, Jesús-Ariel; Martínez-Trinidad, José Fco.

doi:10.1007/s10115-014-0747-x

AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

Regular Paper
Published: 06 May 2014

Volume 44, pages 385–406, (2015)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Marisol Flores-Garrido¹,
Jesús-Ariel Carrasco-Ochoa¹ &
José Fco. Martínez-Trinidad¹

614 Accesses
12 Citations
Explore all metrics

Abstract

Frequent graph mining algorithms commonly use graph isomorphism to identify occurrences of a given pattern, but in the last years, a few works have focused on the case where a pattern could differ from its occurrences, which can be important to analyze noisy data. These later algorithms allow differences in labels and structural differences in edges, but to the best of our knowledge, none of them considers structural differences in vertices. How can we identify occurrences that differ by one (or several) nodes from the pattern they represent? Our work approaches the problem of frequent graph pattern mining with two main characteristics. First, we use inexact matching, allowing structural differences in both edges and vertices. Second, we focus on the problem of mining patterns in a single graph, a problem that has been less explored than the case in which patterns are mined from a graph collection. In this paper, we introduce two similarity functions to compare graphs using inexact matching and an algorithm, AGraP, able to identify patterns that can have structural differences with respect to their occurrences. Our experimental results show that AGraP is able to find patterns that cannot be found by other state-of-the-art algorithms. Additionally, we show that the patterns mined by AGraP are useful in classification tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Graph Clustering via Inexact Patterns

Mining Frequent Graph Patterns Considering Both Different Importance and Rarity of Graph Elements

Mining Graph Patterns

References

Acosta-Mendoza N, Gago-Alonso A, Medina-Pagola JE (2012) Frequent approximate subgraphs as features for graph-based image classification. Knowl-Based Syst 27:381–392
Article Google Scholar
Aggarwal CC, Wang H (2010) Managing and mining graph data. Springer, Berlin
Book Google Scholar
Alguliev R, Aliguliyev R, Ganjaliyev F (2011) Extracting a heterogeneous social network of academic researchers on the web based on information retrieved from multiple sources. Am J Oper Res 2(1):33–38
Article Google Scholar
Anchuri P, Zaki MJ, Barkol O, Bergman R, Felder Y, Golan S, Sityon A (2012) Graph mining for discovering infrastructure patterns in configuration management databases. Knowl Inf Syst 33(3):491–522
Article Google Scholar
Borgelt C, Berthold MR (2002) Mining molecular fragments: finding relevant substructures of molecules. In: Proceedings of the 2002 IEEE international conference on data mining. Maebashi City, Japan, pp 51–58
Bringmann B, Nijssen S (2008) What is frequent in a single graph?. In: Washio T, Suzuki E, Ting K, Inokuchi A (eds) Advances in knowledge discovery and data mining, lecture notes in computer science, Springer, Berlin 5012:858–863
Bunke H, Shearer K (1998) A graph distance metric based on the maximal common subgraph. Pattern Recognit Lett 19(3–4):255–259
Article Google Scholar
Bunke H, Riesen K (2011) Recent advances in graph-based pattern recognition with applications in document analysis. Pattern Recognit 44:1057–1067
Article Google Scholar
Chen C, Yan X, Zhu F, Han J (2007) gApprox: mining frequent approximate patterns from a massive network. In: ICDM, IEEE computer society, pp 445–450
Cook D, Holder L (2007) Mining graph data. Wiley-Interscience, New York
Google Scholar
Dehmer M, Emmert-Streib F (2007) Comparing large graphs efficiently by margins of feature vectors. Appl Math Comput 188(2):1699–1710
Article Google Scholar
Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17(8):1036–1050
Article Google Scholar
Fiedler M, Borgelt C (2007) Support computation for mining frequent subgraphs in a single graph. In: 5th International workshop on mining and learning with graphs, pp 25–30
Gago-Alonso A, Medina-Pagola J, Carrasco-Ochoa J, Martnez-Trinidad J (2008) Mining frequent connected subgraphs reducing the number of candidates. In: Daelemans W, Goethals B, Morik K (eds) Machine learning and knowledge discovery in databases, lecture notes in computer science, Springer, Berlin, 5211:365–376
Gao X, Xiao B, Tao D, Li X (2010) A survey of graph edit distance. Pattern Anal Appl 13(1):113–129
Article Google Scholar
Gärtner T (2002) Exponential an geometric kernels for graphs. In: NIPS*02 workshop on unreal data, principles of modeling nonvectorial data
Gärtner T, Flach P, Wrobel, S (2003) On graph kernels: hardness results and efficient alternatives. In: Conference on learning theory, pp 129–143
Hellal A, Romdhane LB (2013) Nodar: mining globally distributed substructures from a single labeled graph. J Intell Inf Syst 40(1):1–15
Article Google Scholar
Hidovic D, Pelillo M (2004) Metrics for attributed graphs based on the maximal similarity common subgraph. In: IJPRAI, pp 299–313
Holder LB (1988) Substructure discovery in subdue. Technical Report UILU-ENG-88-2220, Department of Computer Science, University of Illinois, Urbana
Holder L, Cook D, Djoko S (1994) Substructure discovery in the SUBDUE system. In: Proceedings of the AAAI workshop on knowledge discovery in databases, pp 169–180
Huan J, Wang W, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A (2004), Mining spatial motifs from protein structure graphs. In: Proceedings of the 8th annual international conference on research in computational, molecular biology (RECOMB04), pp 308–315
Jia Y, Zhang J, Huan J (2011) An efficient graph-mining method for complicated and noisy data with real-world applications. Knowl Inf Syst 28(2):423–447
Article Google Scholar
Kondor R, Lafferty J (2002) Diffusion kernels on graphs and other discrete input spaces. In: International conference on machine learning (ICML)
Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Min Knowl Discov 11(3):243–271
Article Google Scholar
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In Proceedings of the international conference data mining (ICDM01), pp 313–320
Kuramochi M, Karypis G (2004) GREW—a scalable frequent subgraph discovery algorithm. In: Proceedings of the fourth IEEE international conference on data mining, pp 439–442
Leskovec J (2009) Stanford large network dataset collection. http://snap.stanford.edu/data/
Lin ZJ, Lyu MR, King I (2012) Matchsim: a novel similarity measure based on maximum neighborhood matching. Knowl Inf Syst 32(1):141–166
Article Google Scholar
López-Presa JL (2009) Efficient algorithms for graph isomorphism testing. PhD Thesis at the Universidad Rey Juan Carlos, Madrid, España
Lu W, Janssen J, Milios E, Japkowicz N, Zhang Y (2007) Node similarity in the citation graph. Knowl Inf Syst 11(1):105–129
Article Google Scholar
Moody J (2001) Peer influence groups: identifying dense clusters in large networks. Soc Netw 23:261–283
Article Google Scholar
National Center for Biotechnology Information. PubChem Database (2004). http://pubchem.ncbi.nlm.nih.gov (Retrieved Nov. 7, 2013)
Nijssen S, Kok JN (2004) A quickstart in frequent structure mining can make a difference. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 647–652
Ramon J, Gärtner T (2003) Expressivity versus efficiency of graph kernels. In: Proceedings of the first international workshop on mining graphs, trees and sequences, pp 65–74
Ranu S, Singh AK (2009) GraphSig: a scalable approach to mining significant subgraphs in large graph databases. In: IEEE 25th international conference on data, engineering, pp 844–855
Riesen K, Bunke, H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: Proceedings of the international workshop on structural syntatctic and statistical pattern recognition, lecture notes in computer science pp 287–297
Saeedy ME, Kalnis P (2011) GraMi: generalized frequent pattern mining in a single large graph. Technical Report at the Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology
Sanfeliu A, Fu KS (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern 13(3):353–362
Article Google Scholar
Shervashidze N, Vishwanathan SVN, Petri T, Mehlhorn K, Borgwardt K (2009) Efficient graphlet kernels for large graph comparison. In: Proceedings of international conference on artificial intelligence and statistics
Smola AJ, Kondor IR (2003) Kernels and regularization on graphs. In: Schölkopf B, Warmuth MK (eds) Proceedings of the annual conference computational learning theory, pp 144–158
Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson International Edition, Pearson Addison Wesley
Vishwanathan SVN, Borgwardt K, Kondor I, Schraudolph N (2008) Graph kernels. J Mach Learn Res 9:1–41
Google Scholar
Wale N, Watson IA, Karypis G (2008) Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl Inf Syst 14(3):347–375
Article Google Scholar
Xiao Y, Dong H, Wu W, Xiong M, Wang W, Shi B (2008) Structure-based graph distance measures of high degree of precision. Pattern Recognit 41(12):3547–3561
Article Google Scholar
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM ’02)
Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceedings of the SIGMOD conference, pp 335–346
Zhang S, Yang J, Cheedella V (2007) Monkey: approximate graph mining based on spanning trees. In: Proceedings of the IEEE 23rd international conference on data engineering (ICDE 2007), pp 1247–1249
Zou Z, Li J, Gao H, Zhang S (2009) Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM ’09), pp 583–592

Download references

Acknowledgments

This work was partly supported by the National Council of Science and Technology of Mexico (CONACyT) through the project grants CB2008-106443 and CB2008-106366; and the scholarship grant 256879.

Author information

Authors and Affiliations

Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro No.1, Sta. Maria Tonantzintla, ZIP 72840, Puebla, México
Marisol Flores-Garrido, Jesús-Ariel Carrasco-Ochoa & José Fco. Martínez-Trinidad

Authors

Marisol Flores-Garrido
View author publications
You can also search for this author in PubMed Google Scholar
Jesús-Ariel Carrasco-Ochoa
View author publications
You can also search for this author in PubMed Google Scholar
José Fco. Martínez-Trinidad
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Marisol Flores-Garrido.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Flores-Garrido, M., Carrasco-Ochoa, JA. & Martínez-Trinidad, J.F. AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching. Knowl Inf Syst 44, 385–406 (2015). https://doi.org/10.1007/s10115-014-0747-x

Download citation

Received: 13 June 2013
Accepted: 12 April 2014
Published: 06 May 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s10115-014-0747-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

Abstract

Access this article

Similar content being viewed by others

Graph Clustering via Inexact Patterns

Mining Frequent Graph Patterns Considering Both Different Importance and Rarity of Graph Elements

Mining Graph Patterns

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

Abstract

Access this article

Similar content being viewed by others

Graph Clustering via Inexact Patterns

Mining Frequent Graph Patterns Considering Both Different Importance and Rarity of Graph Elements

Mining Graph Patterns

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation