Abstract
This paper studies the most similar maximal clique query (MSMCQ). Given a graph G and a set of nodes Q, MSMCQ is to find the maximal clique of G having the largest similarity with Q. MSMCQ has many real applications including advertising industry, public security, task crowdsourcing and social network, etc. MSMCQ can be studied as a special case of the general set similarity query (SSQ). However, the MCs of G has several specialties from the general sets. Based on the specialties of MCs, we propose a novel index, namely MCIndex. MCIndex outperforms the state-of-the-art SSQ method significantly in terms of the number of candidates and the query time. Specifically, we first construct an inverted index I for all the MCs of G. Since the MCs in a posting list often have a lot of overlaps, MCIndex selects some pivots to cluster the MCs with a small radius. Given a query Q, we compute the distance from the pivots to Q. The clusters of the pivots assured not answer can be pruned by our distance based pruning rule. Since it is NP-hard to construct a minimum MCIndex, we propose to construct a minimal MCIndex on I(v) with an approximation ratio 1 + ln |I(v)|. Since the MCs have properties that are inherent of graph structure, we further propose a SIndex within each cluster of a MCIndex and a structure based pruning rule. SIndex can significantly reduce the number of candidates. Since the sizes of intersections between Q and many MCs need to be computed during the query evaluation, we also propose a binary representation of MCs to improve the efficiency of the intersection size computation. Our extensive experiments confirm the effectiveness and efficiency of our proposed techniques on several real-world datasets.
Similar content being viewed by others
References
Hamann M, Röhrs E, Wagner D. Local community detection based on small cliques. Algorithms, 2017, 10(3): 1–22
Cui W, Xiao Y, Wang H, Wang W. Local search of communities in large graphs. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 991–1002
Uno T. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 2010, 56(1): 3–16
Wu Y, Jin R, Li J, Zhang X. Robust local community detection: on free rider effect and its elimination. Proceedings of the VLDB Endowment, 2015, 8(7): 798–809
Wang M, Wang C, Yu J X, Zhang J. Community detection in social networks: an in-depth benchmarking study with a procedure-oriented framework. Proceedings of the VLDB Endowment, 2015, 8(10): 998–1009
Cai H, Zheng V W, Zhu F, Chang K C C, Huang Z. From community detection to community profiling. Proceedings of the VLDB Endowment, 2017, 10(7): 817–828
Palsetia D, Patwary M M A, Hendrix W, Agrawal A, Choudhary A. Clique guided community detection. In: Proceedings of the 2014 IEEE International Conference on Big Data. 2014, 500–509
Boginski V, Butenko S, Pardalos P M. Mining market data: a network approach. Computers & Operations Research, 2006, 33(11): 3171–3184
Berry N, Ko T, Moy T, Smrcka J, Turnley J, Wu B. Emergent clique formation in terrorist recruitment. In: Proceedings of the Workshop on Agent Organizations: Theory and Practice of AAAI’ 04. 2004, 1–8
Kose F, Weckwerth W, Linke T, Fiehn O. Visualizing plant metabolomic correlation network using clique-metabolite matrices. Bioinformatics, 2001, 17: 1198–1208
Cheng J, Zhu L, Ke Y, Chu S. Fast algorithms for maximal clique enumeration with limited memory. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 1240–1248
Makino K, Uno T. New algorithms for enumerating all maximal cliques. In: Proceedings of Scandinavian Workshop on Algorithm Theory. 2004, 260–272
Östergård P R. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 2002, 120(1–3): 197–207
Liang X, Lu R, Lin X, Shen X. Security and Privacy in Mobile Social Networks. Springer-Verlag New York, 2013
Sarvari H, Abozinadah E, Mbaziira A, Mccoy D. Constructing and analyzing criminal networks. In: Proceedings of the 2014 IEEE Security and Privacy Workshops. 2014, 84–91
Schall D. Service-Oriented Crowdsourcing. Springer-Verlag New York, 2012
Bacon K, Dewan P. Mixed-initiative friend-list creation. In: Proceedings of the 12th European Conference on Computer Supported Cooperative Work. 2011, 293–312
Cui W, Xiao Y, Wang H, Lu Y, Wang W. Online search of overlapping communities. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013, 277–288
Matsunaga T, Yonemori C, Tomita E, Muramatsu M. Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009, 10(1): 205
Sarawagi S, Kirpal A. Efficient set joins on similarity predicates. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 2004, 743–754
Hadjieleftheriou M, Chandel A, Koudas N, Srivastava D. Fast indexes and algorithms for set similarity selection queries. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 267–276
Hadjieleftheriou M, Srivastava D. Weighted set-based string similarity. IEEE Data Engineering Bulletin, 2010, 33(1): 25–36
Culpepper J S, Moffat A. Efficient set intersection for inverted indexing. ACM Transactions on Information System, 2010, 29(1): 1–24
Wu H, Li G, Zhou L. Ginix: generalized inverted index for keyword search. Tsinghua Science and Technology, 2013, 18(1): 77–87
Deng D, Li G, Wen H, Feng J. An efficient partition based method for exact set similarity joins. Proceedings of the VLDB Endowment, 2015, 9(4): 360–371
Yuan L, Qin L, Lin X, Chang L, Zhang W. Diversified top-k clique search. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 387–398
Wang J, Cheng J, Fu A W C. Redundancy-aware maximal cliques. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2013, 122–130
Li C, Lu J, Lu Y. Efficient merging and filtering algorithms for approximate string searches. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 257–266
Bayardo R J, Ma Y, Srikant R. Scaling up all pairs similarity search. In: Proceedings of the 16th International World Wide Web Conference. 2007, 131–140
Xiao C, Wang W, Lin X, Yu J X. Efficient similarity joins for near duplicate detection. In: Proceedings of the 17th International World Wide Web Conference. 2008, 131–140
Wang J, Li G, Feng J. Can we beat the prefix filtering?: An adaptive framework for similarity join and search. In: Proceedings of the 2012 ACM International Conference on Management of Data. 2012, 85–96
Xiao C, Wang W, Lin X, Shang H. Top-k set similarity joins. In: Proceedings of the 25th IEEE International Conference on Data Engineering. 2009, 916–927
Deng D, Li G, Feng J. A pivotal prefix based filtering algorithm for string similarity search. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 2014, 673–684
Ao N, Zhang F, Wu D, Stones D S, Wang G, Liu X, Liu J, Lin S. Efficient parallel lists intersection and index compression algorithms using graphics processing units. Proceedings of the VLDB Endowment, 2011, 4(8): 470–481
Inoue H, Ohara M, Taura K. Faster set intersection with simd instructions by reducing branch mispredictions. Proceedings of the VLDB Endowment, 2014, 8(3): 293–304
Vernica R, Carey M J, Li C. Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 495–506
Bolin Ding A C K. Fast set intersection in memory. In: Proceedings of the 37th International Conference on Very Large Databases. 2011, 255–266
Fan Z, Peng Y, Choi B, Xu J, Bhowmick S S. Towards efficient authenticated subgraph query service in outsourced graph databases. IEEE Transactions on Services Computing, 2014, 7(4): 696–713
Chvatal V. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 1979, 4(3): 233–235
Eppstein D, Löffler M, Strash D. Listing all maximal cliques in sparse graphs in near-optimal time. Algorithms and Computation, 2010, 403–414
Acknowledgements
Thanks to the support of NSF of China (61502258, 61806105), Major Technology Innovation Project of Shandong (2018CXGC0703), NSF of Shandong Province (ZR2014FQ007), the Project of Shandong Finance Society (2018SDJR31), Soft Science Fund of Shandong Province (2018RKB01373, 2016RKB01043). Also thanks to the Scientific Research Start-up Fund and Outstanding Young Scholars Program of Social Science at Qilu University of Technology.
Author information
Authors and Affiliations
Corresponding author
Additional information
Yun Peng is an associate professor of the Research Center of Big Data Application, Qilu University of Technology, China. He received the PhD degree from the Hong Kong Baptist University, China in 2013 and received the BSci and MPhil degrees in computer science from Shandong University and the Harbin Institute of Technology (HIT), China in 2006 and 2008, respectively. His research interests include graph-structured data processing, data mining, and machine learning.
Yitong Xu is a data scientist at Amazon. He received his BA in math and statistics from the University of Minnesota, Twin Cities, USA and MS in data science from the Data Science Institute, Columbia University, USA. His research interests include information retrieval, graph data mining and recommendation system.
Huawei Zhao is a professor of the Qilu University of Technology, China. He received his MS in School of Computer Science, and PhD degrees in Institute of Information & Network Security from Shandong University, China in 2002 and 2006 respectively. His research interests include unstructured data processing, information security, and digital economy.
Zhizheng Zhou is an associate professor of Qilu University of Technology, China. His research fields include financial data analysis and FinTech.
Huimin Han is now a PhD student at the College of Mechanics and Materials, Hohai University, China. Her research interests include Big Data Analysis and Structural Health Monitoring.
Electronic Supplementary Material
Rights and permissions
About this article
Cite this article
Peng, Y., Xu, Y., Zhao, H. et al. Most similar maximal clique query on large graphs. Front. Comput. Sci. 14, 143601 (2020). https://doi.org/10.1007/s11704-019-7235-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11704-019-7235-0