Spectral embedding of directed networks

  • Original Article
Social Network Analysis and Mining

Abstract

Most relationships in a social network are asymmetric: The strength of A’s relationship to B is not the same as the strength of B’s relationship to A. Such relationships can reflect asymmetric emotional bonds, influence or power. It is natural to model such social networks by directed graphs, with a node for each participant, and a weighted directed edge for each relationship. Spectral embeddings for directed graphs are known, but they have significant weaknesses. We design a new directed graph embedding, prove that it has desirable mathematical properties, and demonstrate its application to both synthetic and real-world networks. The advantages of the new technique are that it avoids the weaknesses of existing techniques, it models the net flow across each node (the extent to which its upstream community is distinct from its downstream community), and it enables directed edge prediction (which so-far-unobserved edges are most likely to exist and which direction and intensity they will have).

Author information

Corresponding author

Correspondence to D. B. Skillicorn.

Appendix

Proofs that clustering keeps both versions in the same cluster: Figure 9 shows a graph cut partition, using our graph construction, in which a pair of nodes \(x_\mathrm{in}\) and \(x_\mathrm{out}\) is placed in two different groups, with \(x_\mathrm{in}\) in A and \(x_\mathrm{out}\) in \(\bar{A}\). The cost of the cut is \(cut(A,\bar{A})\). Let p be the sum of edge weights from \(x_\mathrm{in}\) to all the nodes in \(\bar{A}\) except \(x_\mathrm{out}\). Let q be the sum of edge weights from \(x_\mathrm{out}\) to all the nodes in A except \(x_\mathrm{in}\). Under both RatioCut and NCut, this cut is always worse than a cut that puts \(x_\mathrm{in}\) and \(x_\mathrm{out}\) in the same group.

Fig. 9 A cut with the in and out copies of a node x in two different clusters
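The three quantities that the proofs rely on (the cut cost, RatioCut, and NCut) can be illustrated with a minimal Python sketch. The function names and the four-node example graph are hypothetical and chosen only for illustration; the graph is a generic undirected weighted graph, not the paper's construction.

```python
def cut(W, A):
    """Sum of weights of edges with one endpoint in A and the other outside A."""
    A = set(A)
    return sum(W[i][j] for i in A for j in range(len(W)) if j not in A)

def vol(W, A):
    """Degree sum (sum of incident edge weights) of the nodes in A."""
    return sum(sum(W[i]) for i in A)

def ratio_cut(W, A):
    """cut/|A| + cut/|complement| for the two-way partition (A, complement)."""
    A = set(A)
    c = cut(W, A)
    return c / len(A) + c / (len(W) - len(A))

def ncut(W, A):
    """cut/vol(A) + cut/vol(complement) for the two-way partition (A, complement)."""
    A = set(A)
    rest = set(range(len(W))) - A
    c = cut(W, A)
    return c / vol(W, A) + c / vol(W, rest)

# Hypothetical example: a path 0 - 1 - 2 - 3 with edge weights 3, 1, 3.
W = [
    [0, 3, 0, 0],
    [3, 0, 1, 0],
    [0, 1, 0, 3],
    [0, 0, 3, 0],
]
A = {0, 1}

c = cut(W, A)        # only the weight-1 edge (1, 2) crosses the partition
rc = ratio_cut(W, A)
nc = ncut(W, A)

# The closed form used in the proofs: cut * N / (|A| * |complement|).
rc_closed_form = c * len(W) / (len(A) * (len(W) - len(A)))

print(c, rc, nc, rc_closed_form)
```

For this partition the cut is 1, RatioCut is 1.0 (matching the closed form), and NCut is 2/7, since both sides have degree sum 7.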

  1. RatioCut Consistency: Assume there is a minimum RatioCut which separates at least one pair of nodes \(x_\mathrm{in}\) and \(x_\mathrm{out}\) into two different groups A and \(\bar{A}\) as shown in Fig. 9. Then

    $$\begin{aligned} \min RatioCut=RatioCut(A, \bar{A}) &= \frac{cut(A, \bar{A})}{|A|}+\frac{cut(A, \bar{A})}{|\bar{A}|}\\ &= \frac{cut(A, \bar{A})\cdot 2n}{|A||\bar{A}|}, \end{aligned}$$

    where |A| is the number of nodes in group A, and \(|A|+|\bar{A}|=2n\) because each of the n original nodes has two versions.

    By moving \(x_\mathrm{out}\) to A, we get

    $$\begin{aligned} &RatioCut(A+x_\mathrm{out},\bar{A}-x_\mathrm{out})\\ &\quad= \frac{cut(A+x_\mathrm{out},\bar{A}-x_\mathrm{out})\cdot 2n}{|A+x_\mathrm{out}||\bar{A}-x_\mathrm{out}|}\\ &\quad= \frac{cut(A+x_\mathrm{out},\bar{A}-x_\mathrm{out})\cdot 2n}{(|A|+1)(|\bar{A}|-1)}\\ &\quad= \frac{\left( cut(A,\bar{A})-(din_x+dout_x+q)+(dout_x-q)\right) \cdot 2n}{(|A|+1)(|\bar{A}|-1)}\\ &\quad= \frac{\left( cut(A,\bar{A})-din_x-2q\right) \cdot 2n}{|A||\bar{A}|-|A|+|\bar{A}|-1}. \end{aligned}$$

    Similarly, by moving \(x_\mathrm{in}\) to \(\bar{A}\), we get

    $$\begin{aligned} &RatioCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in})\\ &\quad= \frac{\left( cut(A,\bar{A})-dout_x-2p\right) \cdot 2n}{|A||\bar{A}|+|A|-|\bar{A}|-1}. \end{aligned}$$

    Since the \(RatioCut(A, \bar{A})\) is the minimum,

    $$\begin{aligned} RatioCut(A, \bar{A}) &\le RatioCut(A+x_\mathrm{out},\bar{A}-x_\mathrm{out})\\ \Rightarrow \frac{cut(A, \bar{A})}{|A||\bar{A}|} &\le \frac{cut(A,\bar{A})-din_x-2q}{|A||\bar{A}|-|A|+|\bar{A}|-1} \end{aligned}$$
    (1)
    $$\begin{aligned} \text{and}\quad RatioCut(A, \bar{A}) &\le RatioCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in})\\ \Rightarrow \frac{cut(A, \bar{A})}{|A||\bar{A}|} &\le \frac{cut(A,\bar{A})-dout_x-2p}{|A||\bar{A}|+|A|-|\bar{A}|-1}. \end{aligned}$$
    (2)

    The graph with two versions of each node must have an even number of nodes, so \(|A|-|\bar{A}|\) is even. There are three cases for the relative sizes of the two parts of the partition. If \(|A|>|\bar{A}|\), then \(|A|\ge |\bar{A}|+2\), and thus \(|A||\bar{A}|<|A||\bar{A}|+|A|-|\bar{A}|-1\). Furthermore, \(dout_x\ge 0\) and \(p\ge 0\), so the right-hand side of (2) has a numerator no larger than \(cut(A,\bar{A})\) and a strictly larger denominator. This implies that (2) is not true. Similarly, (1) is not true if \(|A|<|\bar{A}|\). If \(|A|=|\bar{A}|\), there is at least one other pair of nodes \(y_\mathrm{in}\) and \(y_\mathrm{out}\) that is separated into two different groups. By moving \(x_\mathrm{in}\) and \(x_\mathrm{out}\) into one group and \(y_\mathrm{in}\) and \(y_\mathrm{out}\) into the other, the cut cost is reduced while the number of nodes in each group stays the same. Again this implies that \(RatioCut(A, \bar{A})\) is not the minimum.

    So no minimum RatioCut separates a pair of nodes \(x_\mathrm{in}\) and \(x_\mathrm{out}\) into different groups in a two-way RatioCut. It is straightforward to prove the same result for k-way RatioCut, since the costs of the uninvolved clusters are constant. Therefore, RatioCut clustering will not separate any pair of nodes \(x_\mathrm{in}\) and \(x_\mathrm{out}\) into two different groups using our approach.

  2. NCut Consistency: Assume there is a minimum NCut which separates at least one pair of nodes \(x_\mathrm{in}\) and \(x_\mathrm{out}\) into two different groups A and \(\bar{A}\) as shown in Fig. 9. Then

    $$\begin{aligned} \min NCut=NCut(A, \bar{A}) &= \frac{cut(A, \bar{A})}{vol(A)}+\frac{cut(A, \bar{A})}{vol(\bar{A})}\\ &= \frac{cut(A, \bar{A})\cdot vol(V)}{vol(A)vol(\bar{A})}, \end{aligned}$$

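    where vol(A) is the degree sum of the nodes in cluster A, and vol(V) is the degree sum of all the nodes in the double version structure, i.e., \(vol(V)=\sum _{i=1}^n (t_{iin}+t_{iout})=3\sum _{i=1}^n (din_i+dout_i)\). Here \(t_{iin}=2din_i+dout_i\) and \(t_{iout}=din_i+2dout_i\) are the degrees of the in and out versions of node i.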

    By moving \(x_\mathrm{out}\) to A, we get

    $$\begin{aligned} &NCut(A+x_\mathrm{out},\bar{A}-x_\mathrm{out})\\ &\quad= \frac{cut(A+x_\mathrm{out},\bar{A}-x_\mathrm{out})\cdot vol(V)}{vol(A+x_\mathrm{out})vol(\bar{A}-x_\mathrm{out})}\\ &\quad= \frac{\left( cut(A,\bar{A})-(din_x+dout_x+q)+(dout_x-q)\right) \cdot vol(V)}{(vol(A)+t_{xout})(vol(\bar{A})-t_{xout})}\\ &\quad= \frac{\left( cut(A,\bar{A})-din_x-2q\right) \cdot vol(V)}{(vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x)}. \end{aligned}$$

    Similarly, by moving \(x_\mathrm{in}\) to \(\bar{A}\), we get

    $$\begin{aligned} &NCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in})\\ &\quad= \frac{\left( cut(A,\bar{A})-dout_x-2p\right) \cdot vol(V)}{(vol(A)-2din_x-dout_x)(vol(\bar{A})+2din_x+dout_x)}. \end{aligned}$$

    Since the \(NCut(A, \bar{A})\) is the minimum,

    $$\begin{aligned} &NCut(A, \bar{A})\le NCut(A+x_\mathrm{out},\bar{A}-x_\mathrm{out})\\ &\quad\Rightarrow \frac{cut(A, \bar{A})}{vol(A)vol(\bar{A})}\le \frac{cut(A,\bar{A})-din_x-2q}{(vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x)} \end{aligned}$$
    (3)
    $$\begin{aligned} &\text{and}\quad NCut(A, \bar{A})\le NCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in})\\ &\quad\Rightarrow \frac{cut(A, \bar{A})}{vol(A)vol(\bar{A})}\le \frac{cut(A,\bar{A})-dout_x-2p}{(vol(A)-2din_x-dout_x)(vol(\bar{A})+2din_x+dout_x)}. \end{aligned}$$
    (4)

    Let \(cut(B,\bar{B})\) be any cut where every pair of in and out versions is in the same group. Then

    $$\begin{aligned} cut(B,\bar{B})\le \left\{ \begin{aligned} &\sum _{i\in B} (din_i+dout_i),\\ &\sum _{i\in \bar{B}} (din_i+dout_i), \end{aligned}\right. \end{aligned}$$
    $$\begin{aligned} \text{and}\quad vol(B) &= \sum _{i\in B} (t_{iin}+t_{iout})=\sum _{i\in B} 3\cdot (din_i+dout_i),\\ vol(\bar{B}) &= \sum _{i\in \bar{B}} (t_{iin}+t_{iout})=\sum _{i\in \bar{B}} 3\cdot (din_i+dout_i). \end{aligned}$$
    $$\begin{aligned} \text{Thus,}\quad NCut(B, \bar{B})=\frac{cut(B, \bar{B})}{vol(B)}+\frac{cut(B, \bar{B})}{vol(\bar{B})}\le \frac{1}{3}+\frac{1}{3}=\frac{2}{3}. \end{aligned}$$

    Since \(NCut(A, \bar{A})\) is the minimum, it is at most \(NCut(B, \bar{B})\), so

    $$\begin{aligned} NCut(A, \bar{A})=\frac{cut(A, \bar{A})\cdot vol(V)}{vol(A)vol(\bar{A})} \le \frac{2}{3}. \end{aligned}$$
    (5)

    From the matrix and (5), we have:

    $$\begin{aligned} &vol(V)\ge 6\cdot (din_x +dout_x)\\ &\quad\Rightarrow cut(A, \bar{A})(din_x +dout_x)\le vol(A)vol(\bar{A}). \end{aligned}$$
    (6)

    Since \(q\ge 0\), (3) implies

    $$\begin{aligned} &\frac{cut(A, \bar{A})}{vol(A)vol(\bar{A})}\le \frac{cut(A,\bar{A})-din_x}{(vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x)}\\ &\quad\implies din_x vol(A)vol(\bar{A})\\ &\quad\le cut(A, \bar{A})\left( vol(A)(din_x+2dout_x)- vol(\bar{A})(din_x+2dout_x) +(din_x+2dout_x)^2\right) . \end{aligned}$$

    Similarly, (4) implies

    $$\begin{aligned} &{ dout_x vol(A)vol(\bar{A})}\\&\quad{\le cut(A, \bar{A})\left( -vol(A)(2din_x+dout_x)+vol(\bar{A})(2din_x+dout_x) +\,(2din_x+dout_x)^2\right) .} \end{aligned}$$

    Adding the two inequalities above, we get

    $$\begin{aligned} &{(din_x+dout_x) vol(A)vol(\bar{A})}\\&\quad{\le cut(A, \bar{A})\,\left( vol(A)\,(-din_x\,+\,dout_x)+vol(\bar{A})\,(din_x\,-\,dout_x) +\,5din_x^2+\,5dout_x^2+\,8din_x dout_x\right) .} \end{aligned}$$

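    We now combine this with (5) and (6). Since \(vol(V)\ge 6\cdot (din_x+dout_x)\), (5) gives \(9\,cut(A, \bar{A})(din_x+dout_x)\le vol(A)vol(\bar{A})\). Substituting this bound on the left-hand side and dividing by \(cut(A, \bar{A})>0\), we get

    $$\begin{aligned} &4din_x^2+4dout_x^2+10din_x dout_x\\ &\quad\le vol(A)(-din_x+dout_x)+vol(\bar{A})(din_x-dout_x). \end{aligned}$$
    (7)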

    In the inequality (3), the numerator on the left is greater than or equal to the numerator on the right. Thus, the denominator on the left has to be greater than or equal to the denominator on the right:

    $$\begin{aligned} vol(A)vol(\bar{A})\ge (vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x). \end{aligned}$$

    If \(vol(A)\le vol(\bar{A})\), then expanding the product and dividing by \(din_x+2dout_x\) gives

    $$\begin{aligned} vol(\bar{A})\le vol(A)+din_x+2dout_x. \end{aligned}$$

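    If \(din_x<dout_x\), the right-hand side of (7) equals \((vol(\bar{A})-vol(A))(din_x-dout_x)\le 0\) while the left-hand side is positive, so (7) already fails. Otherwise, applying the bound on \(vol(\bar{A})\) to (7), we get

    $$\begin{aligned} &4din_x^2+4dout_x^2+10din_x dout_x\\ &\quad\le vol(A)(-din_x+dout_x)+(vol(A)+din_x+2dout_x)(din_x-dout_x)\\ &\quad\Rightarrow 3din_x^2+6dout_x^2+9din_x dout_x \le 0. \end{aligned}$$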

    Since \(din_x \ge 0\), \(dout_x \ge 0\) and \(din_x +dout_x > 0\) for a connected graph, the above inequality does not hold. Similarly, we can prove that inequality (7) does not hold if \(vol(A)\ge vol(\bar{A})\) either. Thus, the assumption cannot be true. In other words, no minimum NCut separates a pair of nodes \(x_\mathrm{in}\) and \(x_\mathrm{out}\) into different groups in a two-way NCut. The proof generalizes to multiple clusters as before. Therefore, NCut clustering will not separate a pair of nodes \(x_\mathrm{in}\) and \(x_\mathrm{out}\) into two different clusters using our approach.
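The two consistency results can be checked by brute force on a toy example. The sketch below is only an illustration under a stated assumption: this appendix does not spell out the graph construction, so the code uses one inferred from the degree formulas \(t_{xin}=2din_x+dout_x\) and \(t_{xout}=din_x+2dout_x\). Under that assumption, each directed edge i → j of weight w links \(i_\mathrm{out}\) to \(j_\mathrm{in}\) with weight w, and \(x_\mathrm{in}\) is linked to \(x_\mathrm{out}\) with weight \(din_x+dout_x\). All function names are hypothetical, and the input is a directed 3-cycle with unit weights.

```python
from itertools import combinations

def double_version_graph(D):
    """Symmetric weight matrix on 2n nodes built from directed weights D[i][j] (i -> j).

    ASSUMED construction (inferred from the degree formulas, not given in the text):
    node i_in has index i, node i_out has index n + i; edge i -> j links i_out to
    j_in; i_in and i_out are linked with weight din_i + dout_i. No self-loops.
    """
    n = len(D)
    W = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            W[n + i][j] += D[i][j]
            W[j][n + i] += D[i][j]
    for i in range(n):
        d_in = sum(D[k][i] for k in range(n))
        d_out = sum(D[i])
        W[i][n + i] += d_in + d_out
        W[n + i][i] += d_in + d_out
    return W

def scores(W, A):
    """Return (RatioCut, NCut) of the two-way partition (A, complement)."""
    rest = [v for v in range(len(W)) if v not in A]
    c = sum(W[u][v] for u in A for v in rest)
    vol_A = sum(sum(W[u]) for u in A)
    vol_rest = sum(sum(W[v]) for v in rest)
    return c / len(A) + c / len(rest), c / vol_A + c / vol_rest

def best_partitions(W, which):
    """All minimizers of RatioCut (which=0) or NCut (which=1), by exhaustive search."""
    N = len(W)
    candidates = [set(A) for k in range(1, N) for A in combinations(range(N), k)]
    best = min(scores(W, A)[which] for A in candidates)
    return [A for A in candidates if abs(scores(W, A)[which] - best) < 1e-12]

def splits_a_pair(A, n):
    """True if some i_in and i_out fall on different sides of the partition."""
    return any((i in A) != (n + i in A) for i in range(n))

# Hypothetical input: the directed 3-cycle 0 -> 1 -> 2 -> 0 with unit weights.
D = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
n = len(D)
W = double_version_graph(D)

ratio_min = best_partitions(W, 0)
ncut_min = best_partitions(W, 1)
print(any(splits_a_pair(A, n) for A in ratio_min))
print(any(splits_a_pair(A, n) for A in ncut_min))
```

On this input both printed values are False: every minimizing partition keeps each in/out pair together. Exhaustive search is exponential in the number of nodes, so this is only practical for very small graphs, and a single example does not replace the proofs.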

About this article

Cite this article

Zheng, Q., Skillicorn, D.B. Spectral embedding of directed networks. Soc. Netw. Anal. Min. 6, 76 (2016). https://doi.org/10.1007/s13278-016-0387-0

  • DOI: https://doi.org/10.1007/s13278-016-0387-0
