Clique-TF-IDF: A New Partitioning Framework Based on Dense Substructures

  • Conference paper
AIxIA 2023 – Advances in Artificial Intelligence (AIxIA 2023)

Abstract

Natural Language Processing (NLP) techniques are powerful tools for analyzing, understanding, and processing human language, with a wide range of applications. In this paper, we exploit NLP techniques, combined with Machine Learning clustering algorithms, to find good solutions to a traditional combinatorial problem, namely, computing a partition of a graph with high modularity. We introduce a novel framework, dubbed Clique-TF-IDF, for computing a graph partition. The framework leverages dense subgraphs of the input graph, modeled as maximal cliques, and characterizes each node in terms of the cliques it belongs to, similarly to a term-document matrix. Our experimental results show that the quality of the partitions produced by the Clique-TF-IDF algorithm is comparable to that of the most effective algorithms in the literature. While our focus is on maximal cliques and partitioning algorithms, we believe that this strategy can be generalized to devise AI solutions for a variety of intractable combinatorial problems where some substructures can be efficiently enumerated and exploited.
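The idea of describing each node by the maximal cliques that contain it can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes that nodes play the role of documents and maximal cliques the role of terms, that the term frequency is binary, and that the inverse document frequency is log(n / |clique|). The function names (`maximal_cliques`, `clique_tfidf`, `modularity`) and the toy graph are hypothetical. The rows of the resulting matrix could then be passed to any clustering algorithm, and a candidate partition can be scored with the standard modularity formula.

```python
import math

def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques, key=lambda c: sorted(c))

def clique_tfidf(adj):
    """Build a node-by-clique matrix (assumed weighting, see lead-in).

    Entry (v, c) is log(n / |c|) if node v belongs to clique c, else 0.
    """
    cliques = maximal_cliques(adj)
    n = len(adj)
    rows = {
        v: [math.log(n / len(c)) if v in c else 0.0 for c in cliques]
        for v in sorted(adj)
    }
    return cliques, rows

def modularity(adj, communities):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = sum(len(nb) for nb in adj.values()) / 2
    q = 0.0
    for comm in communities:
        internal = sum(len(adj[u] & comm) for u in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
cliques, rows = clique_tfidf(adj)
score = modularity(adj, [{0, 1, 2}, {3, 4, 5}])

for v, row in rows.items():
    print(v, [round(w, 3) for w in row])
print("modularity:", round(score, 4))
```

Under this assumed weighting, small cliques such as the bridging edge receive the largest weights, so a practical variant would likely need to filter or reweight cliques before clustering; the abstract does not specify how the paper handles this.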

This research was supported in part by MUR PRIN Projects no. 2022TS4Y3N (EXPAND) and no. 2022ME9Z78 (NextGRAAL).


Author information

Corresponding author

Correspondence to Marco D’Elia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

D’Elia, M., Finocchi, I., Patrignani, M. (2023). Clique-TF-IDF: A New Partitioning Framework Based on Dense Substructures. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_27

  • DOI: https://doi.org/10.1007/978-3-031-47546-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47545-0

  • Online ISBN: 978-3-031-47546-7

  • eBook Packages: Computer Science, Computer Science (R0)
