Abstract
Mining subgraph patterns is an active area of research due to its wide-ranging applications. Examples include frequent subgraph mining, discriminative subgraph mining, statistically significant subgraphs. Existing research has primarily focused on mining all subgraph patterns in the database. However, due to the exponential subgraph search space, the number of patterns mined, typically, is too large for any human-mediated analysis. Consequently, deriving insights from the mined patterns is hard for domain scientists. In addition, subgraph pattern mining is posed in multiple forms: the function that models if a subgraph is a pattern varies based on the application and the database could be over multiple graphs or a single, large graph. In this paper, we ask the following question: Given a subgraph importance function and a budget k, which are the k subgraph patterns that best represent all other patterns of interest? We show that the problem is NP-hard, and propose a generic framework called Resling that adapts to arbitrary subgraph importance functions and generalizable to both transactional graph databases as well as single, large graphs. Resling derives its power by structuring the search space in the form of an edit map, where each subgraph is a node, and two subgraphs are connected if they have an edit distance of one. We rank nodes in the edit map through two random walk based algorithms: vertex-reinforced random walks ( Resling -VR) and negative-reinforced random walks( Resling -NR). Experiments show that Resling-VR is up to 20 times more representative of the pattern space and two orders of magnitude faster than the state-of-the-art techniques. Resling-NR further improves the running time while maintaining comparable or better performance in representative power.
Similar content being viewed by others
Notes
We slightly abuse the term “undirected graph” here. Although the edges are undirected, the sign of the edge weight depends on the direction.
References
Ranu S, Singh AK (2012) Indexing and mining topological patterns for drug discovery. In: EDBT, pp 562–565
Ranu S, Hoang M, Singh A (2013) Mining discriminative subgraphs from global-state networks. In: KDD, pp 509–517
Chaoji V, Ranu S, Rastogi R, Bhatt R (2012) Recommendations to boost content spread in social networks. In: WWW, pp 529–538
Banerjee P, Ranu S, Raghavan S (2014) Inferring uncertain trajectories from partial observations. In: ICDM, pp 30–39
Banerjee P, Yawalkar P, Ranu S (2016) Mantra: a scalable approach to mining temporally anomalous sub-trajectories. In: KDD, pp 1415–1424
Yan X, Han J (2002) Gspan: graph-based substructure pattern mining. In: ICDM, p 721. ISBN: 0-7695-1754-4
Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph*. Data Min Knowl Discov 11(3):243–271
Elseidy M, Abdelhamid E, Skiadopoulos S, Kalnis P (2014) Grami: frequent subgraph and pattern mining in a single large graph. PVLDB 7(7):517–528
Gurukar S, Ranu S, Ravindran B (2015) Commit: A scalable approach to mining communication motifs from dynamic networks. In: SIGMOD, pp 475–489
Thoma M, Cheng H, Gretton A, Han J, Kriegel H-P, Smola A, Song L, Yu PS, Yan X, Borgwardt K (2009) Near-optimal supervised feature selection among frequent subgraphs. In: SDM 2009, pp 1076–1087
Hasan MA, Zaki MJ (2009) Output space sampling for graph patterns. PVLDB 2(1):730–741
Ranu S, Singh AK (2009) Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: ICDE
Ranu S, Calhoun BT, Singh AK, Swamidass SJ (2011) Probabilistic substructure mining from small-molecule screens. Mol Inf 30(9):809–815
Ranu S, Singh AK (2009) Mining statistically significant molecular substructures for efficient molecular classification. J Chem Inf Model 49:2537–2550
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: ICDM
Nijssen S, Kok JN (2004) The Gaston tool for frequent subgraph mining. In: Proceedings of the international workshop on graph-based tools
Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by scalable leap search. In: SIGMOD
Jin N, Young C, Wang W (2010) Gaia: graph classification using evolutionary computation. In: SIGMOD
Cheng H, Lo D, Zhou Y, Wang X, Yan X (2009) Identifying bug signatures using discriminative graph mining, In: Proceedings of the eighteenth international symposium on software testing and analysis, pp 141–152
Dutkowski J, Ideker T (2011) Protein networks as logic functions in development and cancer. PLoS Comput Biol 7:e1002180
Hasan MA, Chaoji V, Salem S, Besson J, Zaki MJ (2007) Origami: mining representative orthogonal graph patterns. In: ICDM, pp 153–162
Yan X, Han J (2003) Closegraph: mining closed frequent graph patterns. In: KDD, pp 286–295
Zeng Z, Tung AKH, Wang J, Feng J, Zhou L (2009) Comparing stars: on approximating graph edit distance. PVLDB 2(1):25–36
Zhang S, Yang J, Li S (2009) Ring: an integrated method for frequent representative subgraph mining, In: ICDM, pp 1082–1087
Natarajan D, Ranu S (2016) A scalable and generic framework to mine top-k representative subgraph patterns. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 370–379
Metwally A, Agrawal D, El Abbadi A (2005) Efficient computation of frequent and top-k elements in data streams. In: ICDT, pp 398–412
Ranu S, Hoang M, Singh A (2014) Answering top-k representative queries on graph databases. In: SIGMOD, pp 1163–1174
Drosou M, Pitoura E (2012) Disc diversity: result diversification based on dissimilarity and coverage. PVLDB 6(1):13–24
Cornuejols G, Fisher ML, Nemhauser GL (1977) Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms. Manag Sci 23(8):789–810
He H, Singh AK (2006) Closure-tree: an index structure for graph queries. In: ICDE
Page L, Brin S, Motwani R, Winograd T (1998) The pagerank citation ranking: bringing order to the web. In: WWW, pp 161–172
Pemantle R (1992) Vertex-reinforced random walk. Probab Theory Relat Fields 92(1):117–136
Badrinath R, Madhavan CEV (2012) Diversity in ranking using negative reinforcement. In: Proceedings of the ACM SIGKDD workshop on mining data semantics, vol 11, no 1–11, p 6
Mei Q, Guo J, Radev D (2010) Divrank: the interplay of prestige and diversity in information networks. In: KDD
Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: KDD, pp 581–586
Thomas L, Valluri S, Karlapalem K (2006) Margin: maximal frequent subgraph mining. In: ICDM, pp 1097–1101
Krishnan A, Padmanabhan D, Ranu S, Mehta S (2016) Select, link and rank: diversified query expansion and entity ranking using wikipedia. In: International conference on web information systems engineering, pp 157–173
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Natarajan, D., Ranu, S. Resling: a scalable and generic framework to mine top-k representative subgraph patterns. Knowl Inf Syst 54, 123–149 (2018). https://doi.org/10.1007/s10115-017-1129-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-017-1129-y