research-article

HERO: A Hierarchical Set Partitioning and Join Framework for Speeding up the Set Intersection Over Graphs

Published: 26 March 2024

Abstract

Set intersection is one of the most primitive operators in graph algorithms such as triangle counting, maximal clique enumeration, and subgraph listing: it returns the vertices common to two given sets of vertices in a data graph. Accelerating set intersection is therefore important, since it benefits the many tasks that use it as a building block. Existing work on set intersection usually follows the merge intersection or galloping-search framework, and most optimization research has focused on leveraging SIMD hardware instructions. In this paper, we propose a novel multi-level set intersection framework, hierarchical set partitioning and join (HERO), built on our set intersection bitmap tree (SIB-tree) index. This design is independent of SIMD instructions and completely orthogonal to the merge intersection framework. We recursively decompose the set intersection task into small subtasks and solve each subtask using bitmaps and Boolean AND operations. To fully realize the acceleration offered by our intersection approach, we formulate a graph reordering problem, prove its NP-hardness, and develop a heuristic algorithm to tackle it. Extensive experiments on real-world graphs confirm the efficiency and effectiveness of HERO. The speedup over classic merge intersection reaches up to 188x for triangle counting and up to 176x for maximal clique enumeration.
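
The idea of partitioning sets and intersecting with bitmaps and AND operations can be illustrated with a minimal two-level sketch. This is not the SIB-tree from the paper, whose structure the abstract does not specify. The block width `BLOCK_BITS` and the helpers `iter_set_bits`, `build_index`, and `intersect` are hypothetical names chosen for illustration. Each set of non-negative vertex IDs is partitioned into fixed-width blocks, a summary bitmap records which blocks are non-empty, and intersection first ANDs the summaries and then ANDs only the leaf bitmaps of blocks present in both sets.

```python
BLOCK_BITS = 64  # assumed partition width; any positive value works


def iter_set_bits(bits):
    """Yield the positions of the set bits of a non-negative int, ascending."""
    while bits:
        low = bits & -bits           # isolate the lowest set bit
        yield low.bit_length() - 1   # position of that bit
        bits ^= low                  # clear it and continue


def build_index(vertices):
    """Build a two-level bitmap index for a set of non-negative vertex IDs.

    Returns (summary, leaves): bit i of summary is set when block i is
    non-empty, and leaves[i] is the bitmap of offsets inside block i.
    """
    summary = 0
    leaves = {}
    for v in vertices:
        block, offset = divmod(v, BLOCK_BITS)
        summary |= 1 << block
        leaves[block] = leaves.get(block, 0) | (1 << offset)
    return summary, leaves


def intersect(index_a, index_b):
    """Intersect two indexed sets and return the common IDs in ascending order."""
    summary_a, leaves_a = index_a
    summary_b, leaves_b = index_b
    result = []
    # Top level: one AND finds the blocks that are non-empty in both sets.
    for block in iter_set_bits(summary_a & summary_b):
        # Leaf level: one AND per shared block finds the common vertices.
        common = leaves_a[block] & leaves_b[block]
        base = block * BLOCK_BITS
        result.extend(base + offset for offset in iter_set_bits(common))
    return result
```

For example, `intersect(build_index([1, 5, 64, 130, 1000]), build_index([5, 64, 65, 1000, 2000]))` returns `[5, 64, 1000]`. Blocks that appear in only one set are never visited, which is the part of the work a partition-then-join scheme is meant to skip.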


    • Published in

      Proceedings of the ACM on Management of Data  Volume 2, Issue 1
      PACMMOD
      February 2024
      1874 pages
      EISSN:2836-6573
      DOI:10.1145/3654807
      Copyright © 2024 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States
