Abstract
Graph pattern matching is widely used in a broad spectrum of real-world applications and has been the subject of several investigations, mainly because of its importance and use. In this context, different models, along with their corresponding algorithms, have been proposed. However, in addition to their excessive processing costs, most existing models suffer from the failing query problem because of their limited ability to find meaningful matches. Moreover, in some scenarios the number of matches may be enormous, making inspection a daunting task. In this work, we introduce a new model for graph pattern matching, called relaxed graph simulation (RGS), which allows queries to be relaxed in order to identify more significant matches and to avoid the empty-set answer problem. We then formalize and study the top-k matching problem based on two classes of functions, relevance and diversity, for ranking the matches with respect to the proposed model. We also formalize and investigate the diversified top-k matching problem, and we propose a diversification function to balance relevance and diversity. Furthermore, we provide efficient algorithms based on optimization strategies to compute the top-k and the diversified top-k matches according to the RGS model. Our experimental results on four real datasets demonstrate both the effectiveness and the efficiency of the proposed approaches.
Cite this article
Habi, A., Effantin, B. & Kheddouci, H. Diversified top-k search with relaxed graph simulation. Soc. Netw. Anal. Min. 9, 55 (2019). https://doi.org/10.1007/s13278-019-0599-1