Abstract
Summarizing a large graph by grouping its nodes into clusters is a standard technique for studying a network. Traditionally, the order of the discovered groups does not matter. However, there are applications where, for example, given a directed graph, we would like to find coherent groups while minimizing the backward cross edges. More formally, in this paper we study a problem where we are given a directed network and are asked to partition it into a sequence of coherent groups while attempting to conform to the cross edges. We assume that the nodes in the network have features, and we measure group coherence by comparing these features. Furthermore, we incorporate the cross edges by penalizing forward cross edges and backward cross edges with different weights. If both weights are set to 0, the problem is equivalent to clustering. However, if we penalize the backward edges significantly more, then the order of the discovered groups matters, and we can view our problem as a generalization of a classic segmentation problem. To solve the problem, we consider a common iterative approach in which we find the groups given the centroids, and then find the centroids given the groups. We show that, unlike in clustering, the first subproblem is NP-hard. However, we show that if the underlying graph is a tree, we can solve the subproblem with dynamic programming. In addition, if the number of groups is 2, we can solve the subproblem with a minimum cut. For the more general case, we propose a heuristic in which we optimize each pair of groups separately while keeping the remaining groups intact. We also propose a greedy search in which nodes are moved between the groups while optimizing the overall loss. Our experiments demonstrate that the algorithms are practical and yield interpretable results.
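To make the objective concrete, the following is a minimal sketch of the kind of loss the abstract describes, together with a simple greedy search that moves single nodes between groups for fixed centroids. It is an illustration under stated assumptions, not the authors' implementation: the squared Euclidean distance as the coherence measure, the unit cost per cross edge, and the names `loss`, `greedy_assign`, `w_fwd`, and `w_bwd` are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Edge = Tuple[int, int]  # directed edge (u, v), nodes are indexed from 0


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed coherence measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loss(features: List[Vector], edges: List[Edge], groups: List[int],
         centroids: List[Vector], w_fwd: float, w_bwd: float) -> float:
    """Coherence cost plus weighted cross-edge penalties.

    groups[v] is the position of v's group in the sequence. An edge (u, v)
    is a forward cross edge if groups[u] < groups[v] and a backward cross
    edge if groups[u] > groups[v]; edges inside a group cost nothing.
    """
    coherence = sum(sq_dist(features[v], centroids[groups[v]])
                    for v in range(len(features)))
    penalty = 0.0
    for u, v in edges:
        if groups[u] < groups[v]:
            penalty += w_fwd
        elif groups[u] > groups[v]:
            penalty += w_bwd
    return coherence + penalty


def greedy_assign(features: List[Vector], edges: List[Edge],
                  groups: List[int], centroids: List[Vector],
                  w_fwd: float, w_bwd: float) -> Tuple[List[int], float]:
    """Move one node at a time while the loss strictly decreases.

    Centroids stay fixed; this only sketches the assignment step and
    recomputes the full loss for each candidate move for simplicity.
    """
    groups = list(groups)  # do not modify the caller's list
    best = loss(features, edges, groups, centroids, w_fwd, w_bwd)
    improved = True
    while improved:
        improved = False
        for v in range(len(features)):
            for g in range(len(centroids)):
                if g == groups[v]:
                    continue
                old = groups[v]
                groups[v] = g
                cand = loss(features, edges, groups, centroids, w_fwd, w_bwd)
                if cand < best - 1e-12:
                    best = cand  # keep the move
                    improved = True
                else:
                    groups[v] = old  # undo the move
    return groups, best


if __name__ == "__main__":
    # Toy instance: four nodes with 1-D features and two ordered groups.
    features = [[0.0], [0.0], [1.0], [1.0]]
    centroids = [[0.0], [1.0]]
    edges = [(0, 2), (3, 1)]
    start = [0, 0, 1, 1]
    print(loss(features, edges, start, centroids, 1.0, 10.0))
    print(greedy_assign(features, edges, start, centroids, 1.0, 10.0))
```

With both weights set to 0 the penalty term vanishes and the loss reduces to an ordinary clustering cost, which matches the special case mentioned in the abstract. Setting `w_bwd` much larger than `w_fwd` makes the order of the groups matter, since a node may be moved away from its nearest centroid to avoid a backward edge.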
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kumpulainen, I., Tatti, N. (2025). Finding Coherent Node Groups in Directed Graphs. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2135. Springer, Cham. https://doi.org/10.1007/978-3-031-74633-8_36
DOI: https://doi.org/10.1007/978-3-031-74633-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74632-1
Online ISBN: 978-3-031-74633-8
eBook Packages: Artificial Intelligence (R0)