Finding Coherent Node Groups in Directed Graphs

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2023)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 2135))

Abstract

Summarizing a large graph by grouping its nodes into clusters is a standard technique for studying a network. Traditionally, the order of the discovered groups does not matter. However, there are applications where, given a directed graph, we would like to find coherent groups while minimizing the backward cross edges. More formally, in this paper we study a problem where we are given a directed network and are asked to partition the graph into a sequence of coherent groups while attempting to conform to the cross edges. We assume that nodes in the network have features, and we measure group coherence by comparing these features. Furthermore, we incorporate the cross edges by penalizing forward cross edges and backward cross edges with different weights. If the weights are set to 0, then the problem is equivalent to clustering. However, if we penalize the backward edges significantly more, then the order of the discovered groups matters, and we can view our problem as a generalization of a classic segmentation problem. To solve the problem, we consider a common iterative approach where we find the groups given the centroids, and then find the centroids given the groups. We show that, unlike in clustering, the first subproblem is NP-hard. However, we show that if the underlying graph is a tree, we can solve the subproblem with dynamic programming. In addition, if the number of groups is 2, we can solve the subproblem with a minimum cut. For the more general case, we propose a heuristic where we optimize each pair of groups separately while keeping the remaining groups intact. We also propose a greedy search where nodes are moved between the groups while optimizing the overall loss. We demonstrate with our experiments that the algorithms are practical and yield interpretable results.
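The two-group case mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not state the exact objective, so the code assumes one: squared Euclidean distance from each node's features to its group centroid, plus a penalty `lam_fwd` for every edge going from group 0 to group 1 and `lam_bwd` for every edge going from group 1 to group 0. The function names, the parameter names, and the plain Edmonds-Karp max-flow routine are all hypothetical choices made for illustration.

```python
from collections import deque

def sq_dist(x, c):
    """Squared Euclidean distance (an assumed coherence measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def loss(features, edges, assign, centroids, lam_fwd, lam_bwd):
    """Assumed objective: distances to centroids plus weighted cross edges."""
    total = sum(sq_dist(x, centroids[assign[v]]) for v, x in features.items())
    for u, v in edges:
        if assign[u] < assign[v]:
            total += lam_fwd   # forward cross edge
        elif assign[u] > assign[v]:
            total += lam_bwd   # backward cross edge
    return total

def two_group_assignment(features, edges, centroids, lam_fwd, lam_bwd):
    """Assign each node to group 0 or 1 with an s-t minimum cut."""
    source, sink = object(), object()   # sentinels that cannot clash with node names
    cap = {}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)       # residual arc

    for v, x in features.items():
        add(source, v, sq_dist(x, centroids[1]))  # cut if v is placed in group 1
        add(v, sink, sq_dist(x, centroids[0]))    # cut if v is placed in group 0
    for u, v in edges:
        add(u, v, lam_fwd)  # cut if u is in group 0 and v in group 1
        add(v, u, lam_bwd)  # cut if v is in group 0 and u in group 1

    def reachable():
        """BFS over arcs with positive residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w, c in cap[u].items():
                if c > 1e-12 and w not in parent:
                    parent[w] = u
                    queue.append(w)
        return parent

    while True:
        parent = reachable()
        if sink not in parent:
            break                       # no augmenting path left
        path, w = [], sink
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] += flow

    # Nodes still reachable from the source form group 0, the rest group 1.
    return {v: 0 if v in parent else 1 for v in features}

features = {"a": (0.0,), "b": (0.1,), "c": (0.9,), "d": (1.0,)}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
centroids = [(0.0,), (1.0,)]
assignment = two_group_assignment(features, edges, centroids, 0.1, 1.0)
```

With both penalties set to 0, the cut reduces to assigning each node to its nearest centroid, which matches the abstract's remark that the problem then becomes ordinary clustering. In the iterative scheme, such an assignment step would alternate with recomputing the centroids from the current groups.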

Notes

  1. https://version.helsinki.fi/dacs/coherent-groups-network.

  2. https://www.aminer.org/citation.

  3. https://www.reddit.com/r/politics/comments/jptq5n/megathread_joe_biden_projected_to_defeat/.

Author information

Correspondence to Iiro Kumpulainen.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kumpulainen, I., Tatti, N. (2025). Finding Coherent Node Groups in Directed Graphs. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2135. Springer, Cham. https://doi.org/10.1007/978-3-031-74633-8_36

  • DOI: https://doi.org/10.1007/978-3-031-74633-8_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-74632-1

  • Online ISBN: 978-3-031-74633-8

  • eBook Packages: Artificial Intelligence (R0)
