Resling: a scalable and generic framework to mine top-k representative subgraph patterns

Natarajan, Dheepikaa; Ranu, Sayan

doi:10.1007/s10115-017-1129-y

Regular Paper
Published: 08 November 2017

Volume 54, pages 123–149 (2018)

Knowledge and Information Systems

Abstract

Mining subgraph patterns is an active area of research due to its wide-ranging applications. Examples include frequent subgraph mining, discriminative subgraph mining, and statistically significant subgraph mining. Existing research has primarily focused on mining all subgraph patterns in the database. However, because the subgraph search space is exponential, the number of patterns mined is typically too large for any human-mediated analysis. Consequently, deriving insights from the mined patterns is hard for domain scientists. In addition, subgraph pattern mining is posed in multiple forms: the function that models whether a subgraph is a pattern varies with the application, and the database may consist of multiple graphs or a single, large graph. In this paper, we ask the following question: Given a subgraph importance function and a budget k, which are the k subgraph patterns that best represent all other patterns of interest? We show that the problem is NP-hard and propose a generic framework called Resling that adapts to arbitrary subgraph importance functions and generalizes to both transactional graph databases and single, large graphs. Resling derives its power by structuring the search space in the form of an edit map, where each subgraph is a node, and two subgraphs are connected if they have an edit distance of one. We rank nodes in the edit map through two random-walk-based algorithms: vertex-reinforced random walks (Resling-VR) and negative-reinforced random walks (Resling-NR). Experiments show that Resling-VR is up to 20 times more representative of the pattern space and two orders of magnitude faster than the state-of-the-art techniques. Resling-NR further improves the running time while maintaining comparable or better performance in representative power.
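
The edit map and the vertex-reinforced walk described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (build_edit_map, vertex_reinforced_walk, top_k) are hypothetical, patterns are simplified to sets of edge labels so that "edit distance of one" means one edge added or removed (graph isomorphism is ignored), and the walk is a plain vertex-reinforced random walk in the style of Pemantle, with no subgraph importance function and no negative reinforcement.

```python
import random
from collections import Counter
from itertools import combinations


def build_edit_map(patterns):
    """Connect two patterns when they differ by exactly one edge.

    Assumption: each pattern is a frozenset of edge labels, so the
    symmetric difference stands in for graph edit distance.
    """
    adjacency = {p: [] for p in patterns}
    for a, b in combinations(patterns, 2):
        if len(a ^ b) == 1:
            adjacency[a].append(b)
            adjacency[b].append(a)
    return adjacency


def vertex_reinforced_walk(adjacency, start, steps, seed=0):
    """Simulate a vertex-reinforced random walk and return visit counts.

    From the current node, a neighbor is chosen with probability
    proportional to 1 + (number of earlier visits to that neighbor),
    so frequently visited nodes attract even more visits.
    """
    rng = random.Random(seed)
    visits = Counter({start: 1})
    current = start
    for _ in range(steps):
        neighbors = adjacency[current]
        if not neighbors:  # isolated node: the walk cannot continue
            break
        weights = [1 + visits[n] for n in neighbors]
        current = rng.choices(neighbors, weights=weights, k=1)[0]
        visits[current] += 1
    return visits


def top_k(visits, k):
    """Return the k most visited patterns as a list."""
    return [pattern for pattern, _ in visits.most_common(k)]


if __name__ == "__main__":
    # Toy pattern space (hypothetical data, not from the paper).
    patterns = [
        frozenset({"ab"}),
        frozenset({"bc"}),
        frozenset({"ab", "bc"}),
        frozenset({"ab", "cd"}),
        frozenset({"ab", "bc", "cd"}),
    ]
    edit_map = build_edit_map(patterns)
    counts = vertex_reinforced_walk(edit_map, patterns[0], steps=1000)
    for pattern in top_k(counts, k=2):
        print(sorted(pattern), counts[pattern])
```

Because reinforcement makes visits concentrate on a few nodes, the most visited nodes act as candidate representatives of their neighborhoods in the edit map; how Resling incorporates the importance function and the budget k is described in the full paper, not in this sketch.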

Notes

We slightly abuse the term “undirected graph” here. Although the edges are undirected, the sign of the edge weight depends on the direction.

Author information

Authors and Affiliations

Machine Learning Team, Amazon, Seattle, USA
Dheepikaa Natarajan
Department of CSE, IIT Delhi, New Delhi, India
Sayan Ranu

Corresponding author

Correspondence to Sayan Ranu.

About this article

Cite this article

Natarajan, D., Ranu, S. Resling: a scalable and generic framework to mine top-k representative subgraph patterns. Knowl Inf Syst 54, 123–149 (2018). https://doi.org/10.1007/s10115-017-1129-y

Received: 15 March 2017
Revised: 11 July 2017
Accepted: 23 October 2017
Published: 08 November 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s10115-017-1129-y
