Skip to main content
Log in

Most similar maximal clique query on large graphs

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

This paper studies the most similar maximal clique query (MSMCQ). Given a graph G and a set of nodes Q, MSMCQ is to find the maximal clique of G having the largest similarity with Q. MSMCQ has many real applications including advertising industry, public security, task crowdsourcing and social network, etc. MSMCQ can be studied as a special case of the general set similarity query (SSQ). However, the MCs of G has several specialties from the general sets. Based on the specialties of MCs, we propose a novel index, namely MCIndex. MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time. Specifically, we first construct an inverted index I for all the MCs of G. Since the MCs in a posting list often have a lot of overlaps, MCIndex selects some pivots to cluster the MCs with a small radius. Given a query Q, we compute the distance from the pivots to Q. The clusters of the pivots assured not answer can be pruned by our distance based pruning rule. Since it is NP-hard to construct a minimum MCIndex, we propose to construct a minimal MCIndex on I(v) with an approximation ratio 1 + ln |I(v)|. Since the MCs have properties that are inherent of graph structure, we further propose a SIndex within each cluster of a MCIndex and a structure based pruning rule. SIndex can significantly reduce the number of candidates. Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation, we also propose a binary representation of MCs to improve the efficiency of the intersection size computation. Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Hamann M, Röhrs E, Wagner D. Local community detection based on small cliques. Algorithms, 2017, 10(3): 1–22

    Article  MathSciNet  Google Scholar 

  2. Cui W, Xiao Y, Wang H, Wang W. Local search of communities in large graphs. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 991–1002

  3. Uno T. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 2010, 56(1): 3–16

    Article  MathSciNet  Google Scholar 

  4. Wu Y, Jin R, Li J, Zhang X. Robust local community detection: on free rider effect and its elimination. Proceedings of the VLDB Endowment, 2015, 8(7): 798–809

    Article  Google Scholar 

  5. Wang M, Wang C, Yu J X, Zhang J. Community detection in social networks: an in-depth benchmarking study with a procedure-oriented framework. Proceedings of the VLDB Endowment, 2015, 8(10): 998–1009

    Article  Google Scholar 

  6. Cai H, Zheng V W, Zhu F, Chang K C C, Huang Z. From community detection to community profiling. Proceedings of the VLDB Endowment, 2017, 10(7): 817–828

    Article  Google Scholar 

  7. Palsetia D, Patwary M M A, Hendrix W, Agrawal A, Choudhary A. Clique guided community detection. In: Proceedings of the 2014 IEEE International Conference on Big Data. 2014, 500–509

  8. Boginski V, Butenko S, Pardalos P M. Mining market data: a network approach. Computers & Operations Research, 2006, 33(11): 3171–3184

    Article  Google Scholar 

  9. Berry N, Ko T, Moy T, Smrcka J, Turnley J, Wu B. Emergent clique formation in terrorist recruitment. In: Proceedings of the Workshop on Agent Organizations: Theory and Practice of AAAI’ 04. 2004, 1–8

  10. Kose F, Weckwerth W, Linke T, Fiehn O. Visualizing plant metabolomic correlation network using clique-metabolite matrices. Bioinformatics, 2001, 17: 1198–1208

    Article  Google Scholar 

  11. Cheng J, Zhu L, Ke Y, Chu S. Fast algorithms for maximal clique enumeration with limited memory. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 1240–1248

  12. Makino K, Uno T. New algorithms for enumerating all maximal cliques. In: Proceedings of Scandinavian Workshop on Algorithm Theory. 2004, 260–272

    Chapter  Google Scholar 

  13. Östergård P R. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 2002, 120(1–3): 197–207

    Article  MathSciNet  Google Scholar 

  14. Liang X, Lu R, Lin X, Shen X. Security and Privacy in Mobile Social Networks. Springer-Verlag New York, 2013

    Book  Google Scholar 

  15. Sarvari H, Abozinadah E, Mbaziira A, Mccoy D. Constructing and analyzing criminal networks. In: Proceedings of the 2014 IEEE Security and Privacy Workshops. 2014, 84–91

  16. Schall D. Service-Oriented Crowdsourcing. Springer-Verlag New York, 2012

    Book  Google Scholar 

  17. Bacon K, Dewan P. Mixed-initiative friend-list creation. In: Proceedings of the 12th European Conference on Computer Supported Cooperative Work. 2011, 293–312

    Google Scholar 

  18. Cui W, Xiao Y, Wang H, Lu Y, Wang W. Online search of overlapping communities. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013, 277–288

  19. Matsunaga T, Yonemori C, Tomita E, Muramatsu M. Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009, 10(1): 205

    Article  Google Scholar 

  20. Sarawagi S, Kirpal A. Efficient set joins on similarity predicates. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 2004, 743–754

  21. Hadjieleftheriou M, Chandel A, Koudas N, Srivastava D. Fast indexes and algorithms for set similarity selection queries. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 267–276

  22. Hadjieleftheriou M, Srivastava D. Weighted set-based string similarity. IEEE Data Engineering Bulletin, 2010, 33(1): 25–36

    Google Scholar 

  23. Culpepper J S, Moffat A. Efficient set intersection for inverted indexing. ACM Transactions on Information System, 2010, 29(1): 1–24

    Article  Google Scholar 

  24. Wu H, Li G, Zhou L. Ginix: generalized inverted index for keyword search. Tsinghua Science and Technology, 2013, 18(1): 77–87

    Article  Google Scholar 

  25. Deng D, Li G, Wen H, Feng J. An efficient partition based method for exact set similarity joins. Proceedings of the VLDB Endowment, 2015, 9(4): 360–371

    Article  Google Scholar 

  26. Yuan L, Qin L, Lin X, Chang L, Zhang W. Diversified top-k clique search. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 387–398

  27. Wang J, Cheng J, Fu A W C. Redundancy-aware maximal cliques. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 122–130

  28. Li C, Lu J, Lu Y. Efficient merging and filtering algorithms for approximate string searches. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 257–266

  29. Bayardo R J, Ma Y, Srikant R. Scaling up all pairs similarity search. In: Proceedings of the 16th International World Wide Web Conference. 2007, 131–140

  30. Xiao C, Wang W, Lin X, Yu J X. Efficient similarity joins for near duplicate detection. In: Proceedings of the 17th International World Wide Web Conference. 2008, 131–140

  31. Wang J, Li G, Feng J. Can we beat the prefix filtering?: An adaptive framework for similarity join and search. In: Proceedings of the 2012 ACM International Conference on Management of Data. 2012, 85–96

  32. Xiao C, Wang W, Lin X, Shang H. Top-k set similarity joins. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 916–927

  33. Deng D, Li G, Feng J. A pivotal prefix based filtering algorithm for string similarity search. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 673–684

  34. Ao N, Zhang F, Wu D, Stones D S, Wang G, Liu X, Liu J, Lin S. Efficient parallel lists intersection and index compression algorithms using graphics processing units. Proceedings of the VLDB Endowment, 2011, 4(8): 470–481

    Article  Google Scholar 

  35. Inoue H, Ohara M, Taura K. Faster set intersection with simd instructions by reducing branch mispredictions. Proceedings of the VLDB Endowment, 2014, 8(3): 293–304

    Article  Google Scholar 

  36. Vernica R, Carey M J, Li C. Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 495–506

  37. Bolin Ding A C K. Fast set intersection in memory. In: Proceedings of the 37th International Conference on Very Large Databases. 2011, 255–266

    Article  Google Scholar 

  38. Fan Z, Peng Y, Choi B, Xu J, Bhowmick S S. Towards efficient authenticated subgraph query service in outsourced graph databases. IEEE Transactions on Services Computing, 2014, 7(4): 696–713

    Article  Google Scholar 

  39. Chvatal V. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 1979, 4(3): 233–235

    Article  MathSciNet  Google Scholar 

  40. Eppstein D, Löffler M, Strash D. Listing all maximal cliques in sparse graphs in near-optimal time. Algorithms and Computation, 2010, 403–414

Download references

Acknowledgements

Thanks to the support of NSF of China (61502258, 61806105), Major Technology Innovation Project of Shandong (2018CXGC0703), NSF of Shandong Province (ZR2014FQ007), the Project of Shandong Finance Society (2018SDJR31), Soft Science Fund of Shandong Province (2018RKB01373, 2016RKB01043). Also thanks to the Scientific Research Start-up Fund and Outstanding Young Scholars Program of Social Science at Qilu University of Technology.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yun Peng.

Additional information

Yun Peng is an associate professor of the Research Center of Big Data Application, Qilu University of Technology, China. He received the PhD degree from the Hong Kong Baptist University, China in 2013 and received the BSci and MPhil degrees in computer science from Shandong University and the Harbin Institute of Technology (HIT), China in 2006 and 2008, respectively. His research interests include graph-structured data processing, data mining, and machine learning.

Yitong Xu is a data scientist at Amazon. He received his BA in math and statistics from the University of Minnesota, Twin Cities, USA and MS in data science from the Data Science Institute, Columbia University, USA. His research interests include information retrieval, graph data mining and recommendation system.

Huawei Zhao is a professor of the Qilu University of Technology, China. He received his MS in School of Computer Science, and PhD degrees in Institute of Information & Network Security from Shandong University, China in 2002 and 2006 respectively. His research interests include unstructured data processing, information security, and digital economy.

Zhizheng Zhou is an associate professor of Qilu University of Technology, China. His research fields include financial data analysis and FinTech.

Huimin Han is now a PhD student at the College of Mechanics and Materials, Hohai University, China. Her research interests include Big Data Analysis and Structural Health Monitoring.

Electronic Supplementary Material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Peng, Y., Xu, Y., Zhao, H. et al. Most similar maximal clique query on large graphs. Front. Comput. Sci. 14, 143601 (2020). https://doi.org/10.1007/s11704-019-7235-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-019-7235-0

Keywords

Navigation