Spectral embedding of directed networks

Zheng, Q.; Skillicorn, D. B.

doi:10.1007/s13278-016-0387-0

Original Article
Published: 10 September 2016

Volume 6, article number 76, (2016)
Social Network Analysis and Mining

Q. Zheng¹ &
D. B. Skillicorn¹

Abstract

Most relationships in a social network are asymmetric: The strength of A’s relationship to B is not the same as the strength of B’s relationship to A. Such relationships can reflect asymmetric emotional bonds, influence or power. It is natural to model such social networks by directed graphs, with a node for each participant, and a weighted directed edge for each relationship. Spectral embeddings for directed graphs are known, but they have significant weaknesses. We design a new directed graph embedding, prove that it has desirable mathematical properties, and demonstrate its application to both synthetic and real-world networks. The advantages of the new technique are that it avoids the weaknesses of existing techniques, it models the net flow across each node (the extent to which its upstream community is distinct from its downstream community), and it enables directed edge prediction (which so-far-unobserved edges are most likely to exist and which direction and intensity they will have).

Author information

Authors and Affiliations

School of Computing, Queen’s University, Kingston, Canada
Q. Zheng & D. B. Skillicorn

Corresponding author

Correspondence to D. B. Skillicorn.

Appendix

Proofs that clustering keeps both versions in the same cluster: Figure 9 is a graph cut partition using our graph construction, where a pair of nodes $x_\mathrm{in}$ and $x_{out}$ are placed in two different groups, A and $\bar{A}$. The cost of the cut is $cut(A,\bar{A})$. Throughout, $din_x$ and $dout_x$ denote the weighted in-degree and out-degree of node x in the original directed graph. Let p be the sum of edge weights from $x_\mathrm{in}$ to all the nodes in $\bar{A}$ except $x_{out}$. Let q be the sum of edge weights from $x_{out}$ to all the nodes in A except $x_\mathrm{in}$. Under both RatioCut and NCut, this cut is always worse than a cut that puts $x_\mathrm{in}$ and $x_{out}$ in the same group.
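The quantities p and q, and the way the cut changes when one version of a separated pair is moved, can be checked numerically. The following Python sketch is only an illustration: the construction in `doubled_graph` is an assumption inferred from the degree and cut formulas used in this appendix (each directed edge u → v becomes an undirected edge between $u_{out}$ and $v_\mathrm{in}$, and $x_\mathrm{in}$ is joined to $x_{out}$ with weight $din_x+dout_x$). The helper names and the example weights are hypothetical.

```python
def doubled_graph(W):
    """Assumed construction: index i is the in-version of node i,
    index n + i is its out-version. W[i][j] is the weight of edge i -> j
    (no self-loops)."""
    n = len(W)
    din = [sum(W[j][i] for j in range(n)) for i in range(n)]
    dout = [sum(W[i]) for i in range(n)]
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # directed edge i -> j becomes undirected edge i_out -- j_in
            M[n + i][j] += W[i][j]
            M[j][n + i] += W[i][j]
        # edge joining the two versions of node i
        M[i][n + i] += din[i] + dout[i]
        M[n + i][i] += din[i] + dout[i]
    return M, din, dout


def cut_weight(M, A):
    """Total weight of edges with exactly one endpoint in the set A."""
    return sum(M[u][v] for u in A for v in range(len(M)) if v not in A)


# Hypothetical 4-node weighted directed graph.
W = [[0, 2, 0, 1],
     [1, 0, 3, 0],
     [0, 0, 0, 2],
     [4, 0, 1, 0]]
M, din, dout = doubled_graph(W)
n, x = len(W), 1

# A partition that separates x_in (index x) from x_out (index n + x).
A = {0, x, 2, n + 3}
A_bar = set(range(2 * n)) - A

p = sum(M[x][v] for v in A_bar if v != n + x)   # x_in to A_bar, excluding x_out
q = sum(M[n + x][v] for v in A if v != x)       # x_out to A, excluding x_in

base = cut_weight(M, A)
moved_out = cut_weight(M, A | {n + x})          # move x_out into A
moved_in = cut_weight(M, A - {x})               # move x_in into A_bar

print(base, moved_out, base - din[x] - 2 * q)
print(base, moved_in, base - dout[x] - 2 * p)
```

Under the assumed construction, the two printed lines show that moving $x_{out}$ changes the cut by $-din_x-2q$ and moving $x_\mathrm{in}$ changes it by $-dout_x-2p$, which are the numerator changes used in the derivations that follow.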

1. RatioCut Consistency: Assume there is a minimum RatioCut which separates at least one pair of nodes $x_\mathrm{in}$ and $x_{out}$ into two different groups A and $\bar{A}$ as shown in Fig. 9. Then
$$\begin{aligned} \min RatioCut=RatioCut(A, \bar{A}) & = {} \frac{cut(A, \bar{A})}{|A|}\,+\,\frac{cut(A, \bar{A})}{|\bar{A}|}\\ & = {} \frac{cut(A, \bar{A})*2n}{|A||\bar{A}|}, \end{aligned}$$
where |A| is the number of nodes in group A, and $|A|+|\bar{A}|=2n$ because each of the n original nodes has two versions.

By moving $x_{out}$ to A, we get
$$\begin{aligned} &RatioCut(A+x_{out},\bar{A}-x_{out})\\ &\quad= {} \frac{cut(A+x_{out},\bar{A}-x_{out})*2n}{|A+x_{out}||\bar{A}-x_{out}|}\\ &\quad= {} \frac{cut(A+x_{out},\bar{A}-x_{out})*2n}{(|A|+1)(|\bar{A}|-1)}\\ &\quad= {} \frac{\left( cut(A,\bar{A})-(din_x+dout_x+q)+(dout_x-q)\right) *2n}{(|A|+1)(|\bar{A}|-1)}\\ &\quad= {}\frac{\left( cut(A,\bar{A})-din_x-2q\right) *2n}{|A||\bar{A}|-|A|+|\bar{A}|-1}. \end{aligned}$$
Similarly, by moving $x_\mathrm{in}$ to $\bar{A}$, we get
$$\begin{aligned}&RatioCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in})\\& \quad = {} \frac{\left( cut(A,\bar{A})-dout_x-2p\right) *2n}{|A||\bar{A}|+|A|-|\bar{A}|-1}. \end{aligned}$$
Since $RatioCut(A, \bar{A})$ is the minimum,
$$\begin{aligned} RatioCut(A, \bar{A})\le & {} RatioCut(A+x_{out},\bar{A}-x_{out}) \\ \Rightarrow \frac{cut(A, \bar{A})}{|A||\bar{A}|}\le & {} \frac{cut(A,\bar{A})-din_x-2q}{|A||\bar{A}|-|A|+|\bar{A}|-1} \end{aligned} \tag{1}$$

$$\begin{aligned} \text {and}\quad RatioCut(A, \bar{A})\le & {} RatioCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in}) \\ \Rightarrow \frac{cut(A, \bar{A})}{|A||\bar{A}|}\le & {} \frac{cut(A,\bar{A})-dout_x-2p}{|A||\bar{A}|+|A|-|\bar{A}|-1}. \end{aligned} \tag{2}$$
The graph with two versions of each node must have an even number of nodes. There are three cases for the relative sizes of the pieces of the partition. If $|A|>|\bar{A}|$, then $|A|\ge |\bar{A}|+2$. Thus, $|A||\bar{A}|<|A||\bar{A}|+|A|-|\bar{A}|-1$. Furthermore, $dout_x\ge 0$ and $p\ge 0$, so the right-hand side of (2) has a numerator no larger than $cut(A,\bar{A})$ and a strictly larger denominator. Because $cut(A,\bar{A})>0$ (the cut includes the edge between $x_\mathrm{in}$ and $x_{out}$), this implies that (2) is not true. Similarly, (1) is not true if $|A|<|\bar{A}|$. If $|A|=|\bar{A}|$, there is at least one other pair of nodes $y_\mathrm{in}$ and $y_{out}$ that is separated into two different groups. By moving $x_\mathrm{in}$ and $x_{out}$ into one group and $y_\mathrm{in}$ and $y_{out}$ into another, the cut cost will be reduced, although the number of nodes in each group is the same as before. Again this implies $RatioCut(A, \bar{A})$ is not the minimum.

So, there does not exist a minimum RatioCut which separates at least one pair of nodes $x_\mathrm{in}$ and $x_{out}$ into different groups in a two-way RatioCut. It is straightforward to prove the same result for k-way RatioCut, since the costs of the uninvolved clusters are constant. Therefore, RatioCut clustering will not separate any pair of nodes $x_\mathrm{in}$ and $x_{out}$ into two different groups using our approach.
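The RatioCut claim can be illustrated by exhaustive search on a very small example. The sketch below is not part of the proof. It assumes the doubled-graph construction implied by the formulas in this appendix (directed edge u → v becomes an undirected edge between $u_{out}$ and $v_\mathrm{in}$, and the two versions of x are joined with weight $din_x+dout_x$), and the function names and the three-node chain used as input are hypothetical.

```python
from itertools import product


def doubled_graph(W):
    """Assumed construction: index i is the in-version of node i,
    index n + i is its out-version."""
    n = len(W)
    din = [sum(W[j][i] for j in range(n)) for i in range(n)]
    dout = [sum(W[i]) for i in range(n)]
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            M[n + i][j] += W[i][j]          # i_out -- j_in
            M[j][n + i] += W[i][j]
        M[i][n + i] += din[i] + dout[i]     # i_in -- i_out
        M[n + i][i] += din[i] + dout[i]
    return M


def ratio_cut(M, A):
    """RatioCut(A, A_bar) = cut/|A| + cut/|A_bar| for a proper subset A."""
    m = len(M)
    cut = sum(M[u][v] for u in A for v in range(m) if v not in A)
    return cut / len(A) + cut / (m - len(A))


def min_ratio_cut(M):
    """Brute-force search over all bipartitions; node 0 is fixed in A
    so that each partition is visited once."""
    m = len(M)
    best_value, best_A = None, None
    for bits in product([0, 1], repeat=m - 1):
        A = {0} | {v + 1 for v, b in enumerate(bits) if b}
        if len(A) == m:
            continue  # the complement must be non-empty
        value = ratio_cut(M, A)
        if best_value is None or value < best_value:
            best_value, best_A = value, A
    return best_value, best_A


# Hypothetical input: a directed chain 0 -> 1 -> 2 with unit weights.
W = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
n = len(W)
M = doubled_graph(W)
best_value, best_A = min_ratio_cut(M)
pairs_together = all((i in best_A) == (n + i in best_A) for i in range(n))
print(best_value, sorted(best_A), pairs_together)
```

On this chain the doubled graph is a six-node path, and the search returns a partition in which every in-version shares a group with its out-version. Exhaustive search is exponential in the number of nodes, so it is only useful as a sanity check on toy inputs.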
2. NCut Consistency: Assume there is a minimum NCut which separates at least one pair of nodes $x_\mathrm{in}$ and $x_{out}$ into two different groups A and $\bar{A}$ as shown in Fig. 9. Thus
$$\begin{aligned} \min NCut=NCut(A, \bar{A})= & {} \frac{cut(A, \bar{A})}{vol(A)}+\frac{cut(A, \bar{A})}{vol(\bar{A})}\\= & {} \frac{cut(A, \bar{A})*vol(V)}{vol(A)vol(\bar{A})}, \end{aligned}$$
where vol(A) is the degree sum of the nodes in cluster A, and vol(V) is the degree sum of the nodes in the double version structure, i.e., $vol(V)=\sum _{i=1}^n (t_{iin}+t_{iout})=3\sum _{i=1}^n (din_i+dout_i)$.
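The volume identity can be checked on a small example. The Python sketch below assumes the doubled-graph construction implied by the formulas in this appendix (directed edge u → v becomes an undirected edge between $u_{out}$ and $v_\mathrm{in}$, and the two versions of node i are joined with weight $din_i+dout_i$). Under that assumption the in-version has degree $2din_i+dout_i$ and the out-version has degree $din_i+2dout_i$. The function name and the example weights are hypothetical.

```python
def doubled_graph(W):
    """Assumed construction: index i is the in-version of node i,
    index n + i is its out-version."""
    n = len(W)
    din = [sum(W[j][i] for j in range(n)) for i in range(n)]
    dout = [sum(W[i]) for i in range(n)]
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            M[n + i][j] += W[i][j]          # i_out -- j_in
            M[j][n + i] += W[i][j]
        M[i][n + i] += din[i] + dout[i]     # i_in -- i_out
        M[n + i][i] += din[i] + dout[i]
    return M, din, dout


# Hypothetical 4-node weighted directed graph without self-loops.
W = [[0, 2, 0, 1],
     [1, 0, 3, 0],
     [0, 0, 0, 2],
     [4, 0, 1, 0]]
M, din, dout = doubled_graph(W)
n = len(W)

t_in = [sum(M[i]) for i in range(n)]        # degrees of the in-versions
t_out = [sum(M[n + i]) for i in range(n)]   # degrees of the out-versions
vol_V = sum(t_in) + sum(t_out)

for i in range(n):
    print(i, t_in[i], 2 * din[i] + dout[i], t_out[i], din[i] + 2 * dout[i])
print(vol_V, 3 * sum(a + b for a, b in zip(din, dout)))
```

Each printed row pairs a computed degree with the closed form it should match, and the last line compares vol(V) with $3\sum_i (din_i+dout_i)$.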

By moving $x_{out}$ to A, we get
$$\begin{aligned} &NCut(A+x_{out},\bar{A}-x_{out})\\ &\quad =\, {} \frac{cut(A+x_{out},\bar{A}-x_{out})*vol(V)}{vol(A+x_{out})vol(\bar{A}-x_{out})}\\ &\quad=\, {} \frac{\left( cut(A,\bar{A})-(din_x+dout_x+q)+(dout_x-q)\right) *vol(V)}{(vol(A)+t_{xout})(vol(\bar{A})-t_{xout})}\\ &\quad=\, {} \frac{\left( cut(A,\bar{A})-din_x-2q\right) *vol(V)}{(vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x)}. \end{aligned}$$
Similarly, by moving $x_\mathrm{in}$ to $\bar{A}$, we get
$$\begin{aligned}&NCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in})\\ &\quad=\, {} \frac{\left( cut(A,\bar{A})-dout_x-2p\right) *vol(V)}{(vol(A)-2din_x-dout_x)(vol(\bar{A})+2din_x+dout_x)}. \end{aligned}$$
Since $NCut(A, \bar{A})$ is the minimum,
$$\begin{aligned} &NCut(A, \bar{A})\le NCut(A+x_{out},\bar{A}-x_{out}) \\&\quad\Rightarrow \frac{cut(A, \bar{A})}{vol(A)vol(\bar{A})}\,\le\, \frac{cut(A,\bar{A})-din_x-2q}{(vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x)} \end{aligned} \tag{3}$$

$$\begin{aligned}&\text {and}\quad NCut(A, \bar{A})\le NCut(A-x_\mathrm{in},\bar{A}+x_\mathrm{in}) \\&\quad\Rightarrow \frac{cut(A, \bar{A})}{vol(A)vol(\bar{A})}\le \frac{cut(A,\bar{A})-dout_x-2p}{(vol(A)-2din_x-dout_x)(vol(\bar{A})+2din_x+dout_x)}. \end{aligned} \tag{4}$$
Let $cut(B,\bar{B})$ be any cut where every pair of in and out versions are in the same group. Then
$$\begin{aligned} cut(B,\bar{B})\le \min \left\{ \sum _{i\in B} (din_i+dout_i),\; \sum _{i\in \bar{B}} (din_i+dout_i) \right\} \end{aligned}$$

$$\begin{aligned} \text{and}\quad vol(B)=\sum _{i\in B} (t_{iin}+t_{iout})=\sum _{i\in B} 3*(din_i+dout_i), \end{aligned}$$

$$\begin{aligned} \quad \quad vol(\bar{B})=\sum _{i\in \bar{B}} (t_{iin}+t_{iout})=\sum _{i\in \bar{B}} 3*(din_i+dout_i). \end{aligned}$$

$$\begin{aligned} \text {Thus,}\quad NCut(B, \bar{B})=\frac{cut(B, \bar{B})}{vol(B)}+\frac{cut(B, \bar{B})}{vol(\bar{B})}\le \dfrac{2}{3}. \end{aligned}$$
Since $NCut(A, \bar{A})$ is the minimum,
$$\begin{aligned} NCut(A, \bar{A})=\frac{cut(A, \bar{A})*vol(V)}{vol(A)vol(\bar{A})} \le \frac{2}{3}. \end{aligned} \tag{5}$$
From the matrix we have:
$$\begin{aligned}&vol(V)\ge 6*(din_x +dout_x)\\&\quad\Rightarrow cut(A, \bar{A})(din_x +dout_x)\le vol(A)vol(\bar{A}). \end{aligned} \tag{6}$$
Since $q\ge 0$, (3) implies
$$\begin{aligned}&\frac{cut(A, \bar{A})}{vol(A)vol(\bar{A})}\le \frac{cut(A,\bar{A})-din_x}{(vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x)}\\&\implies din_x\, vol(A)vol(\bar{A})\\&\quad\le cut(A, \bar{A})\left( vol(A)(din_x+2dout_x)- vol(\bar{A})(din_x+2dout_x) +(din_x+2dout_x)^2\right) . \end{aligned}$$
Similarly, (4) implies
$$\begin{aligned} &{ dout_x vol(A)vol(\bar{A})}\\&\quad{\le cut(A, \bar{A})\left( -vol(A)(2din_x+dout_x)+vol(\bar{A})(2din_x+dout_x) +\,(2din_x+dout_x)^2\right) .} \end{aligned}$$
Adding the two inequalities above, we get
$$\begin{aligned} &{(din_x+dout_x) vol(A)vol(\bar{A})}\\&\quad{\le cut(A, \bar{A})\,\left( vol(A)\,(-din_x\,+\,dout_x)+vol(\bar{A})\,(din_x\,-\,dout_x) +\,5din_x^2+\,5dout_x^2+\,8din_x dout_x\right) .} \end{aligned}$$
Combining with (6), we get
$$\begin{aligned} &4din_x^2+4dout_x^2+10din_x dout_x \\ &\quad\le vol(A)(-din_x+dout_x)+vol(\bar{A})(din_x-dout_x). \end{aligned} \tag{7}$$
In the inequality (3), the numerator on the left is greater than or equal to the numerator on the right. Thus, the denominator on the left has to be greater than or equal to the denominator on the right:
$$\begin{aligned} vol(A)vol(\bar{A})\ge (vol(A)+din_x+2dout_x)(vol(\bar{A})-din_x-2dout_x). \end{aligned}$$
If $vol(A)\le vol(\bar{A})$, then
$$\begin{aligned} vol(\bar{A})\le vol(A)+din_x+2dout_x. \end{aligned}$$
By applying this to (7), we get
$$\begin{aligned} &4din_x^2+4dout_x^2+10din_x dout_x\\&\quad\le vol(A)\,(-din_x+dout_x)+(vol(A)+din_x+2dout_x)(din_x-dout_x)\\&\quad\Rightarrow 3din_x^2+6dout_x^2+9din_x dout_x \le 0. \end{aligned}$$
Since $din_x \ge 0$, $dout_x \ge 0$ and $din_x +dout_x > 0$ for a connected graph, the above inequality does not hold. Similarly we can prove that, if $vol(A)\ge vol(\bar{A})$, the inequality (7) does not hold either. Thus, the assumption cannot be true. In other words, there does not exist a minimum NCut which separates at least one pair of nodes $x_\mathrm{in}$ and $x_{out}$ into different groups in a two-way NCut. The proof generalizes to multiple clusters as before. Therefore, NCut clustering will not separate a pair of nodes $x_\mathrm{in}$ and $x_{out}$ into two different clusters using our approach.
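Two ingredients of the NCut argument can be checked numerically on toy inputs: the bound $NCut(B,\bar{B})\le 2/3$ for cuts that keep every pair together, and the claim that a minimum NCut does not separate a pair. The Python sketch below is an illustration, not a proof. It assumes the doubled-graph construction implied by the formulas in this appendix (directed edge u → v becomes an undirected edge between $u_{out}$ and $v_\mathrm{in}$, and the two versions of x are joined with weight $din_x+dout_x$). The function names and both example graphs are hypothetical.

```python
from itertools import product


def doubled_graph(W):
    """Assumed construction: index i is the in-version of node i,
    index n + i is its out-version."""
    n = len(W)
    din = [sum(W[j][i] for j in range(n)) for i in range(n)]
    dout = [sum(W[i]) for i in range(n)]
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            M[n + i][j] += W[i][j]          # i_out -- j_in
            M[j][n + i] += W[i][j]
        M[i][n + i] += din[i] + dout[i]     # i_in -- i_out
        M[n + i][i] += din[i] + dout[i]
    return M


def ncut(M, A):
    """NCut(A, A_bar) = cut/vol(A) + cut/vol(A_bar) for a proper subset A."""
    rest = [v for v in range(len(M)) if v not in A]
    cut = sum(M[u][v] for u in A for v in rest)
    vol_A = sum(sum(M[u]) for u in A)
    vol_rest = sum(sum(M[v]) for v in rest)
    return cut / vol_A + cut / vol_rest


# (a) Bound for pair-preserving cuts on a hypothetical 4-node graph.
W4 = [[0, 2, 0, 1],
      [1, 0, 3, 0],
      [0, 0, 0, 2],
      [4, 0, 1, 0]]
n4, M4 = len(W4), doubled_graph(W4)
pair_values = []
for bits in product([0, 1], repeat=n4):
    B = {i for i, b in enumerate(bits) if b}
    if 0 < len(B) < n4:
        # keep both versions of every node in B on the same side
        pair_values.append(ncut(M4, B | {n4 + i for i in B}))
worst_pair_value = max(pair_values)

# (b) Brute-force minimum NCut on a directed chain 0 -> 1 -> 2.
W3 = [[0, 1, 0],
      [0, 0, 1],
      [0, 0, 0]]
n3, M3 = len(W3), doubled_graph(W3)
best_value, best_A = None, None
for bits in product([0, 1], repeat=2 * n3 - 1):
    A = {0} | {v + 1 for v, b in enumerate(bits) if b}
    if len(A) == 2 * n3:
        continue  # the complement must be non-empty
    value = ncut(M3, A)
    if best_value is None or value < best_value:
        best_value, best_A = value, A
pairs_together = all((i in best_A) == (n3 + i in best_A) for i in range(n3))

print(worst_pair_value, best_value, sorted(best_A), pairs_together)
```

Part (a) enumerates only pair-preserving cuts, so it exercises the bound that leads to (5). Part (b) searches all bipartitions of the six-node doubled chain and reports whether the minimizer keeps each pair in one group.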

About this article

Cite this article

Zheng, Q., Skillicorn, D.B. Spectral embedding of directed networks. Soc. Netw. Anal. Min. 6, 76 (2016). https://doi.org/10.1007/s13278-016-0387-0

Received: 18 December 2015
Accepted: 25 August 2016
Published: 10 September 2016
DOI: https://doi.org/10.1007/s13278-016-0387-0
