Diversified top-k search with relaxed graph simulation

  • Original Article
Social Network Analysis and Mining

Abstract

Graph pattern matching is widely used in a broad spectrum of real-world applications, and it has been the subject of several investigations, mainly because of its importance and use. In this context, different models, along with their appropriate algorithms, have been proposed. However, in addition to their excessive processing costs, most of the existing models suffer from the failing query problem due to their limitations in finding meaningful matches. Also, in some scenarios, the number of matches may be enormous, making their inspection a daunting task. In this work, we introduce a new model for graph pattern matching, called relaxed graph simulation (RGS), which allows the relaxation of queries to identify more significant matches and to avoid the empty-set answer problem. We then formalize and study the top-k matching problem based on two function classes, relevance and diversity, for ranking the matches with respect to the proposed model. We also formalize and investigate the diversified top-k matching problem, and we propose a diversification function to balance relevance and diversity. Furthermore, we provide efficient algorithms based on optimization strategies to compute the top-k and the diversified top-k matches according to the RGS model. Our experimental results, on four real datasets, demonstrate both the effectiveness and the efficiency of the proposed approaches.
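To make the failing query problem concrete, the following is a minimal Python sketch of plain (unrelaxed) graph simulation, the classical model that relaxation starts from. It is not the RGS model or any algorithm from the article, whose details are not given in this abstract. The function name `graph_simulation`, the toy labels (`PM`, `DB`, `PRG`), and the node names are hypothetical and chosen only for illustration. The sketch uses a naive fixpoint refinement: start from all label-compatible candidates, then repeatedly discard a data node that has no successor matching a required child of its query node.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Naive fixpoint computation of the maximum graph simulation.

    q_labels / g_labels: dict mapping node -> label (query / data graph).
    q_edges / g_edges: iterable of directed (source, target) pairs.
    Returns a dict query node -> set of matching data nodes, or {} when
    some query node has no match (the "failing query" case).
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in g_labels}
    for a, b in g_edges:
        succ[a].add(b)

    # Start from every data node whose label equals the query node's label.
    sim = {
        u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
        for u in q_labels
    }

    # Refine until stable: v stays a match of u only if, for every query
    # edge (u, u2), v has at least one successor that still matches u2.
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    # Simulation requires every query node to be matched.
    if any(not candidates for candidates in sim.values()):
        return {}
    return sim


# Hypothetical toy query: a project manager linked to a database expert
# and to a programmer.
q_labels = {"pm": "PM", "db": "DB", "prg": "PRG"}
q_edges = [("pm", "db"), ("pm", "prg")]

# Hypothetical toy data graph.
g_labels = {"pm1": "PM", "pm2": "PM", "db1": "DB", "db2": "DB", "prg1": "PRG"}
g_edges = [("pm1", "db1"), ("pm1", "prg1"), ("pm2", "db2")]

matches = graph_simulation(q_labels, q_edges, g_labels, g_edges)

# Dropping a single edge makes the strict query fail entirely.
failing = graph_simulation(q_labels, q_edges, g_labels,
                           [("pm1", "db1"), ("pm2", "db2")])
```

In the first call `pm2` is pruned because it has no programmer successor. In the second call no project manager satisfies both edges, so the strict model returns an empty answer even though near matches exist. This is the situation that a relaxed model is meant to avoid.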

Notes

  1. http://snap.stanford.edu/


Author information

Corresponding author

Correspondence to Abdelmalek Habi.

Cite this article

Habi, A., Effantin, B. & Kheddouci, H. Diversified top-k search with relaxed graph simulation. Soc. Netw. Anal. Min. 9, 55 (2019). https://doi.org/10.1007/s13278-019-0599-1
