Weighted line graphs for overlapping community discovery

Yoshida, Tetsuya

doi:10.1007/s13278-013-0104-1

Original Article
Published: 21 March 2013

Volume 3, pages 1001–1013, (2013)
Social Network Analysis and Mining

Tetsuya Yoshida¹

Abstract

We propose weighted line graphs for overlapping community discovery where a node in a network can be assigned to more than one community. For undirected connected networks without self-loops, we propose weighted line graphs by: (1) defining weights of a line graph based on the weights in the original network, and (2) removing self-loops in weighted line graphs, while sustaining their properties. By applying some off-the-shelf node partitioning method to the transformed graph, community labels of adjacent links are assigned to each node in the original network. Experiments are conducted over both synthetic and real-world networks, and the results indicate that the proposed approach can improve the quality of discovered overlapping communities.
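The final step described above, assigning to each node the community labels of its adjacent links, can be illustrated with a minimal sketch. The function name `node_memberships`, the example link list, and the link labels are hypothetical; in practice the labels would come from whichever node partitioning method is applied to the line graph.

```python
def node_memberships(links, link_labels):
    """Map each node to the set of community labels of its incident links."""
    members = {}
    for (u, v), label in zip(links, link_labels):
        members.setdefault(u, set()).add(label)
        members.setdefault(v, set()).add(label)
    return members

# Two triangles sharing node 2 (illustrative example).
links = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
# Hypothetical partition of the line-graph nodes (one label per link).
link_labels = [0, 0, 0, 1, 1, 1]

memberships = node_memberships(links, link_labels)
# Nodes carrying more than one label belong to overlapping communities.
overlapping = sorted(n for n, c in memberships.items() if len(c) > 1)
print(memberships)
print(overlapping)
```

A node whose incident links all carry the same label belongs to one community; a node with links in several link communities is reported as overlapping.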

Notes

1. We also refer to a network as a graph, a node as a vertex, and a link as an edge.
2. A k-clique-community is defined as a union of all k-cliques that can be reached from each other through a series of adjacent k-cliques (which share k-1 nodes).
3. The ith diagonal element in ${\bf D}$ is set to the ith element of ${\bf k}$.
4. http://www-personal.umich.edu/~mejn/netdata/ (celegans was converted into an undirected network in the experiment).
5. Pascal: http://analytics.ijs.si/~blazf/pvc/data.html; IV’04: http://iv.slis.indiana.edu/ref/iv04contest/.
6. The initial degree in the BA model was set to 20.
7. The initial degree in the BA model was set to 2 (1/10 of the value 20 used in Step 1).

References

Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466:761–764
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
Bhattacharyya P, Garg A, Wu SF (2011) Analysis of user keyword similarity in online social networks. Social Netw Anal Mining 1(3):143–158
Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111
Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. MIT Press, Cambridge
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc 39(2):1–38
Diestel R (2006) Graph theory. Springer, Berlin
Evans T, Lambiotte R (2009) Line graphs, link partitions, and overlapping communities. Phys Rev E 80(1):016105
Evans T, Lambiotte R (2010) Line graphs of weighted networks for overlapping communities. Eur Phys J B 77:265–272
Gregory S (2009) Finding overlapping communities using disjoint community detection algorithms. In: Complex networks. Springer, Berlin, pp 47–61
Gregory S (2011) Fuzzy overlapping communities in networks. J Stat Mech Theor Exp P02017
Hanneman RA, Shelton CR (2011) Applying modality and equivalence concepts to pattern finding in social process-produced data. Social Netw Anal Mining 1(1):59–72
Harville DA (2008) Matrix algebra from a statistician’s perspective. Springer, Berlin
Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings of KDD’03, pp 137–146
Mika P (2007) Social networks and the semantic web. Springer, Berlin
Müller M (2007) Information retrieval for music and motion. Springer, Berlin
Nepusz T, Petróczi A, Négyessy L, Bazsó F (2008) Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E 77:016107
Newman M (2006) Finding community structure using the eigenvectors of matrices. Phys Rev E 76(3):036104
Newman M (2010) Networks: an introduction. Oxford University Press, Oxford
Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms 10(2):191–218
Raghavan U, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76:036106
Scott J (2011) Social network analysis: developments, advances, and prospects. Social Netw Anal Mining 1(1):21–26
Shen HW, Cheng XQ, Guo JF (2011) Quantifying and identifying the overlapping community structure in networks. J Stat Mech Theor Exp P07042
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Watts DJ (2003) Small worlds: the dynamics of networks between order and randomness. Princeton University Press, Princeton
Watts DJ (2004) Six degrees: the science of a connected age. W W Norton & Co Inc, New York
Whitney H (1932) Congruent graphs and the connectivity of graphs. Am J Math 54:150–168
Yoshida T (2012) Overlapping community discovery via weighted line graphs of networks. In: Proceedings of PRICAI’12 (LNAI 7458), pp 895–898
Yoshida T (2013) Toward finding hidden communities based on user profile. J Intell Inf Syst (in press)
Zhang S, Wang FS, Zhang XS (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A 388(8):483–490

Acknowledgments

We express sincere gratitude to the reviewers for their careful reading of the manuscript and for providing valuable suggestions to improve the paper. This work is partially supported by the grant-in-aid for scientific research (No. 24300049) funded by MEXT, Japan, and the Murata Science Foundation.

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, Hokkaido University, N-14 W-9, Sapporo, 060-0814, Japan
Tetsuya Yoshida

Corresponding author

Correspondence to Tetsuya Yoshida.

Appendices

Appendix 1: Proof of Theorem 2

Theorem 2 can be formalized in terms of the adjacency matrices of transformed networks based on the following properties:

$$ {\bf 1}_{m}^{\rm T}{\bf C}=({\bf k} - {\bf 1}_{n})^{\rm T}{\bf B}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf C}=(\tilde{\bf k} - {\bf 1}_{n})^{\rm T}\tilde{\bf B} $$

(25)

$$ {\bf 1}_{m}^{\rm T}{\bf E}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf E}=2 {\bf w}^{\rm T} $$

(26)

$$ {\bf 1}_{m}^{\rm T}{\bf E}_{1}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}=2 {\bf w}^{\rm T}. $$

(27)

Proof

$$ \begin{aligned} {\bf 1}_{m}^{\rm T}{\bf C} &= {\bf 1}_{m}^{\rm T}({\bf B}^{\rm T}{\bf B} - 2{\bf I}_{m})={\bf k}^{\rm T}{\bf B} - 2{\bf 1}_{m}^{\rm T}\\ &= {\bf k}^{\rm T}{\bf B} - {\bf 1}_{n}^{\rm T}{\bf B}=({\bf k} - {\bf 1}_{n})^{\rm T}{\bf B} \end{aligned} $$

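From Eq. (9) in Proposition 1, ${\bf 1}_{m}^{\rm T}{\bf B}^{\rm T}={\bf k}^{\rm T}$ holds. Furthermore, from Eq. (10), $2{\bf 1}_{m}^{\rm T}={\bf 1}_{n}^{\rm T}{\bf B}$ holds. Together these give the left-hand side of Eq. (25). Similarly, by utilizing the right-hand sides of Eqs. (9) and (10) in Proposition 1, we can prove the right-hand side of Eq. (25).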

On the other hand, based on Proposition 1, we can prove Eq. (26) and Eq. (27) as follows:

$$ \begin{aligned} {\bf 1}_{m}^{\rm T}{\bf E} &= {\bf 1}_{m}^{\rm T}{\bf B}^{\rm T}{\bf D}^{-1}{\bf B}={\bf k}^{\rm T}{\bf D}^{-1}{\bf B}={\bf 1}_{n}^{\rm T}{\bf B}=2{\bf 1}_{m}^{\rm T}\\ {\bf 1}_{m}^{\rm T}{\bf E}_{1} &= {\bf 1}_{m}^{\rm T}{\bf B}^{\rm T}{\bf D}^{-1}{\bf A}{\bf D}^{-1}{\bf B}={\bf k}^{\rm T}{\bf D}^{-1}{\bf A}{\bf D}^{-1}{\bf B}\\ &= {\bf 1}_{n}^{\rm T}{\bf A}{\bf D}^{-1}{\bf B}={\bf k}^{\rm T}{\bf D}^{-1}{\bf B}={\bf 1}_{n}^{\rm T}{\bf B}=2{\bf 1}_{m}^{\rm T} \end{aligned} $$

Similarly, by utilizing the right-hand sides in Proposition 1, we can prove the right-hand sides of Eqs. (26) and (27).

From the left-hand sides of Eqs. (26) and (27), we can see that ${\bf 1}_{n}^{\rm T}{\bf A}{\bf 1}_{n}={\bf 1}_{m}^{\rm T}{\bf E}{\bf 1}_{m}={\bf 1}_{m}^{\rm T}{\bf E}_{1}{\bf 1}_{m}=2m$. Similarly, from the right-hand sides of Eqs. (26) and (27), ${\bf 1}_{n}^{\rm T}\tilde{\bf A}{\bf 1}_{n}={\bf 1}_{m}^{\rm T}\tilde{\bf E}{\bf 1}_{m}={\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}{\bf 1}_{m}=\sum_{i,j}\tilde{\bf A}_{ij}$. Thus, by defining the adjacency matrix of the transformed network with Eq. (12) or Eq. (13), the sum of the weights in the original network is preserved in the transformed network. □
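The left-hand sides of Eqs. (25) to (27) can be checked numerically on a small unweighted example. The following is a minimal sketch, assuming that ${\bf B}$ is the $n \times m$ node-link incidence matrix, ${\bf D}={\rm diag}({\bf k})$, ${\bf C}={\bf B}^{\rm T}{\bf B}-2{\bf I}_{m}$, ${\bf E}={\bf B}^{\rm T}{\bf D}^{-1}{\bf B}$ and ${\bf E}_{1}={\bf B}^{\rm T}{\bf D}^{-1}{\bf A}{\bf D}^{-1}{\bf B}$, which are the forms used in the proof above. The example graph and the helper names `line_graph_matrices` and `col_sums` are illustrative and not part of the original article.

```python
from fractions import Fraction

def line_graph_matrices(n, links):
    """Build B, k, C, E, E1 for an unweighted simple graph (illustrative helper)."""
    m = len(links)
    B = [[0] * m for _ in range(n)]   # incidence matrix, n x m
    A = [[0] * n for _ in range(n)]   # adjacency matrix, n x n
    for a, (i, j) in enumerate(links):
        B[i][a] = B[j][a] = 1
        A[i][j] = A[j][i] = 1
    k = [sum(row) for row in B]       # degrees: k = B 1_m
    # G = D^{-1} B, kept exact with rational arithmetic.
    G = [[Fraction(B[i][a], k[i]) for a in range(m)] for i in range(n)]
    C = [[sum(B[i][a] * B[i][b] for i in range(n)) - 2 * (a == b)
          for b in range(m)] for a in range(m)]
    E = [[sum(B[i][a] * G[i][b] for i in range(n))
          for b in range(m)] for a in range(m)]
    E1 = [[sum(G[i][a] * A[i][j] * G[j][b] for i in range(n) for j in range(n))
           for b in range(m)] for a in range(m)]
    return B, k, C, E, E1

def col_sums(M):
    """Return 1^T M as a list."""
    return [sum(col) for col in zip(*M)]

# A triangle (nodes 0, 1, 2) with a pendant node 3 attached to node 2.
n, links = 4, [(0, 1), (1, 2), (0, 2), (2, 3)]
B, k, C, E, E1 = line_graph_matrices(n, links)
m = len(links)

# Right-hand expression (k - 1_n)^T B of the left part of Eq. (25).
rhs_25 = [sum((k[i] - 1) * B[i][b] for i in range(n)) for b in range(m)]
print(col_sums(C) == rhs_25)
print(col_sums(E) == [2] * m, col_sums(E1) == [2] * m)
print(sum(col_sums(E)), sum(col_sums(E1)), 2 * m)
```

Exact rational arithmetic is used so that the comparisons are not affected by floating-point rounding.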

Appendix 2: Proof of Theorem 3

The properties (1) to (3) in Theorem 3 can be formalized in terms of the corresponding matrix ${\bf N}$ in Eq. (18) as:

$$ {\rm diag}({\bf N})={\bf 0}_{\ell} $$

(28)

$$ {\bf N}^{\rm T}={\bf N} $$

(29)

$$ {\bf N}{\bf 1}_{\ell}={\bf M}{\bf 1}_{\ell} $$

(30)

We prove that the above properties hold for the matrix ${\bf N}$ constructed from a symmetric square matrix ${\bf M}$ with non-negative real values.

Proof

From Eq. (16), the diagonal elements of ${\bf M}_{wo}$ are all zeros. Since ${\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}$ is a diagonal matrix, scaling the rows and columns of ${\bf M}_{wo}$ by multiplying it with this matrix from the left and with its transpose from the right does not change its diagonal elements. Thus, since the diagonal elements of ${\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}{\bf M}_{wo}{\rm diag}({\bf m}_{wo})^{-1/2}{\bf D}_{M}^{1/2}$ are also zeros, Eq. (28) holds.

Since ${\bf M}$ is symmetric, ${\bf M}_{wo}$ in Eq. (16) is also symmetric. Multiplying it by the diagonal matrix ${\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}$ from the left and by its transpose from the right preserves symmetry. Thus, since both the first and second terms in Eq. (18) are symmetric matrices, Eq. (29) holds.

For a diagonal matrix ${\bf D} \in \mathbb{R}^{\ell \times \ell}$, ${\bf D}{\bf 1}_{\ell}={\bf d}={\bf 1}_{\ell} \odot {\bf d}$, where the vector ${\bf d}$ is the row sum of ${\bf D}$, and $\odot$ stands for the Hadamard product (element-wise product) (Harville 2008). Thus, since ${\rm diag}({\bf m}_{wo})^{-1/2}{\bf D}_{M}^{1/2}$ is a diagonal matrix, the following holds:

$$ \begin{aligned} & {\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2} {\bf M}_{wo} {\rm diag}({\bf m}_{wo})^{-1/2} {\bf D}_{M}^{1/2} {\bf 1}_{\ell} \\ &\quad = {\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2} {\bf M}_{wo} {\bf 1}_{\ell} \odot ( {\rm diag}({\bf m}_{wo})^{-1/2} {\bf D}_{M}^{1/2} ) \end{aligned} $$

(31)

$$={\bf D}_{M}^{1/2} {\bf m}_{wo}^{1/2} \odot ( {\rm diag}({\bf m}_{wo})^{-1/2} {\bf D}_{M}^{1/2} ) $$

(32)

$$={\bf D}_{M}^{1/2} {\bf 1}_{\ell} \odot {\bf D}_{M}^{1/2} $$

(33)

$$={\bf D}_{M}^{1/2} {\bf D}_{M}^{1/2} {\bf 1}_{\ell} $$

(34)

$$={\bf d}_{D_{M}} $$

(35)

where ${\bf d}_{D_{M}}$ is the row sum of ${\bf D}_{M}$ in Eq. (15).

Equation (31) follows as above. Since ${\bf m}_{wo}$ is the row sum of ${\bf M}_{wo}$ in Eq. (17), ${\rm diag}({\bf m}_{wo})^{-1/2}{\bf m}_{wo}={\bf m}_{wo}^{1/2}$ in Eq. (32), and Eq. (33) follows based on the definition of the Hadamard product. Finally, based on the above property of diagonal matrices and the Hadamard product, Eq. (30) follows.

Furthermore, for the first term in Eq. (18), ${\bf M}_{wo}{\bf 1}_{\ell}={\bf M}{\bf 1}_{\ell}-{\bf D}_{M}{\bf 1}_{\ell}={\bf M}{\bf 1}_{\ell}-{\bf d}_{D_{M}}$. Thus, by summing ${\bf M}{\bf 1}_{\ell}-{\bf d}_{D_{M}}$ and ${\bf d}_{D_{M}}$, Eq. (30) follows.

Appendix 3: Proof of Corollary 4

Corollary 4 can be formalized in terms of the following properties of the adjacency matrices:

$$ {\bf 1}_{m}^{\rm T}{\bf F}={\bf 1}_{m}^{\rm T}{\bf E}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf F}={\bf 1}_{m}^{\rm T}\tilde{\bf E}=2 {\bf w}^{\rm T} $$

(36)

$$ {\bf 1}_{m}^{\rm T}{\bf F}_{1}={\bf 1}_{m}^{\rm T}{\bf E}_{1}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf F}_{1}={\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}=2 {\bf w}^{\rm T} $$

(37)

Proof Since the adjacency matrices $\tilde{\bf E}$ and $\tilde{\bf E}_{1}$ satisfy the condition in Theorem 3, by substituting these matrices as ${\bf M}$ in Eq. (18), we can construct the corresponding matrices $\tilde{\bf F}$ and $\tilde{\bf F}_{1}$. From Eq. (30) and the symmetry in Eq. (29), ${\bf 1}_{m}^{\rm T}\tilde{\bf F}={\bf 1}_{m}^{\rm T}\tilde{\bf E}$ and ${\bf 1}_{m}^{\rm T}\tilde{\bf F}_{1}={\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}$ hold. Since ${\bf 1}_{m}^{\rm T}\tilde{\bf E}=2{\bf w}^{\rm T}$ and ${\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}=2{\bf w}^{\rm T}$ hold from the right-hand sides of Eqs. (26) and (27), the right-hand side of Eq. (36) and that of Eq. (37) hold. By a similar argument, the left-hand side of Eq. (36) and that of Eq. (37) hold.

As shown in Theorem 2, the properties in Eqs. (36) and (37) indicate that the sum of the weights in the original network is preserved in $\tilde{\bf F}$ and $\tilde{\bf F}_{1}$ (also in ${\bf F}$ and ${\bf F}_{1}$). □

Appendix 4: Complexity analysis

Suppose a simple connected network $G$ contains $n$ nodes and $m$ links, and let $\langle k \rangle$ be the average degree in $G$. The time complexity of constructing $\tilde{\bf E}$ and $\tilde{\bf E}_{1}$ from $G$ based on the weighted incidence matrix $\tilde{\bf B}$ is the same as that of ${\bf E}$ and ${\bf E}_{1}$ in Evans and Lambiotte (2009). This is because both approaches define the adjacency matrices through similar matrix calculations.

Since each row of $\tilde{\bf B}^{\rm T}$ contains two non-zero elements and $\tilde{\bf D}^{-1}$ is a diagonal matrix, multiplication of $\tilde{\bf B}^{\rm T}$ and $\tilde{\bf D}^{-1}$ can be done in O(m), and each row of $\tilde{\bf B}^{\rm T}\tilde{\bf D}^{-1}$ contains two non-zero elements as well.

Since $\tilde{\bf E}$ in Eq. (12) is based on a link–node–link random walk on $G$, it can be calculated by considering only $2 \langle k \rangle$ links for each link in $G$. Since the calculation of $(\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}) \tilde{\bf B}$ can be done in $O(m \langle k \rangle)$, the time complexity of constructing $\tilde{\bf E}$ is $O(m \langle k \rangle)$. Similarly, since $\tilde{\bf E}_{1}$ in Eq. (13) is based on a link–link–link random walk on $G$, $\tilde{\bf E}_{1}$ can be calculated by considering only $2 \langle k \rangle^{2}$ links for each link in $G$. The calculation of $(\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}){\bf A}$ can be done in $O(m \langle k \rangle)$, and that of $(\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}{\bf A}) (\tilde{\bf D}^{-1} \tilde{\bf B})$ in $O(m \langle k \rangle^{2})$. Thus, the time complexity of constructing $\tilde{\bf E}_{1}$ is $O(m \langle k \rangle^{2})$.

As for the removal of self-loops in Sect. 4.2, let $\langle k_{M} \rangle$ be the average degree of a network with an adjacency matrix ${\bf M} \in \mathbb{R}_{+}^{\ell \times \ell}$ (e.g., $\langle k_{M} \rangle$ is $O(\langle k\rangle)$ in $\tilde{\bf E}$, and $O(\langle k\rangle^{2})$ in $\tilde{\bf E}_{1}$). The calculation of ${\bf M}_{wo}$ in Eq. (16) can be done in $O(\ell)$, and that of ${\bf m}_{wo}$ in Eq. (17) in $O(\ell \langle k_{M} \rangle)$. Since both ${\bf D}_{M}^{1/2}$ and ${\rm diag}({\bf m}_{wo})^{-1/2}$ are diagonal matrices, the calculation of ${\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}$ can be done in $O(\ell)$. The scaling of ${\bf M}_{wo}$ by multiplying ${\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}$ from left and right needs to be conducted only for the $O(\ell \langle k_{M} \rangle)$ non-zero elements, and the same holds for the addition. Thus, the time complexity of constructing ${\bf N}$ in Eq. (18) is $O(\ell \langle k_{M} \rangle)$. By substituting $m$ for $\ell$, the time complexity of constructing ${\bf F}$ (and $\tilde{\bf F}$) is $O(m \langle k \rangle)$, and that of ${\bf F}_{1}$ (and $\tilde{\bf F}_{1}$) is $O(m \langle k \rangle^{2})$.

On the other hand, since it is necessary to store the adjacency matrices of line graphs in memory, the space complexity is $O(m \langle k_{M} \rangle),$ where $\langle k_{M} \rangle$ is the average degree in the constructed line graph. In our approach, allocation of adjacency matrices in memory can become a problem for large networks.
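To make the cost argument concrete, the following is a minimal sketch of a sparse construction of the unweighted matrix ${\bf E}={\bf B}^{\rm T}{\bf D}^{-1}{\bf B}$ that touches only pairs of links sharing a node. The dictionary-of-dictionaries storage and the function name `sparse_E` are assumptions made for illustration; the article does not prescribe a data structure. The loop performs $\sum_i k_i^2$ updates, which is in line with the $O(m \langle k \rangle)$ estimate when node degrees stay close to the average.

```python
from collections import defaultdict
from fractions import Fraction

def sparse_E(links):
    """Sparse E = B^T D^{-1} B, stored as E[a][b] for links a, b sharing a node."""
    incident = defaultdict(list)          # node -> indices of incident links
    for a, (i, j) in enumerate(links):
        incident[i].append(a)
        incident[j].append(a)
    E = defaultdict(dict)
    for node, inc in incident.items():
        share = Fraction(1, len(inc))     # 1 / k_i for this node
        for a in inc:                     # k_i * k_i updates per node
            for b in inc:
                E[a][b] = E[a].get(b, 0) + share
    return E

links = [(0, 1), (1, 2), (0, 2), (2, 3)]
E = sparse_E(links)
stored = sum(len(row) for row in E.values())      # number of non-zero entries
row_sums = [sum(E[a].values()) for a in range(len(links))]
print(stored, row_sums)
```

Only the stored entries need to be kept in memory, which corresponds to the space requirement discussed above.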

Appendix 5: Construction of synthetic networks

Let $|C|$ be the number of communities and $n_c$ the number of nodes in a community (a network has $n_c \times |C|$ nodes). Let $w_u$ stand for the link weight in the overall network, and $r_m > 1$ for the weight ratio of the links within communities.

A synthetic network was generated as follows:

Step 1: The overall network with $n_c \times |C|$ nodes was created with the Barabási–Albert (BA) model. The constructed overall network was rather dense (see Note 6), and all the link weights were set to a small value $w_u$.
Step 2: A network of $n_c$ nodes was created for each community with the BA model. In this case, the constructed communities were rather sparse (see Note 7), and all the link weights in the communities were set to $w_u \times r_m$.
Step 3: The communities constructed at Step 2 were embedded into the diagonal blocks of the adjacency matrix of the overall network at Step 1. Note that there was no overlap between the embedded diagonal blocks (i.e., embedded communities).
Step 4: For each node $i$ (with degree $k_i$) in the overall network, a community to which the node did not belong was randomly selected. Then, up to $k_i$ nodes were randomly selected in the selected community. Finally, the node $i$ was connected to the selected nodes with link weight $w_u \times r_m$ as in Step 2.

The overall dense network with relatively small weights is constructed at Step 1. Sparse communities with large weights at Step 2 are embedded into the overall network at Step 3. In addition, since each node is connected to other nodes in another community at Step 4, the constructed network has an overlapping community structure. In the experiments, the parameters were set as $w_u = 1$ and $r_m = 100$ so that nodes in each community were tightly connected with large weights.
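Steps 1 to 4 can be sketched as follows. This is an illustration under several assumptions that the text does not fix: the BA generator starts from a complete graph on (initial degree + 1) nodes, embedding at Step 3 overwrites the whole diagonal block, $k_i$ is the degree after Step 3, exactly $\min(k_i, n_c)$ nodes are selected at Step 4, and Step 4 links overwrite any existing weight. The names `ba_edges` and `synthetic_network` and the small parameter values are hypothetical.

```python
import random

def ba_edges(n, m0, rng):
    """BA-style graph: each new node links to m0 distinct existing nodes,
    chosen with probability proportional to their degree."""
    edges = {(u, v) for u in range(m0 + 1) for v in range(u + 1, m0 + 1)}
    # Each node appears in `targets` once per incident link.
    targets = [x for e in edges for x in e]
    for new in range(m0 + 1, n):
        chosen = set()
        while len(chosen) < m0:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.add((t, new))
            targets.extend((t, new))
    return edges

def synthetic_network(num_comm, n_c, w_u=1.0, r_m=100.0,
                      m_dense=20, m_sparse=2, seed=0):
    rng = random.Random(seed)
    n = num_comm * n_c
    label = [i // n_c for i in range(n)]
    W = {}  # W[(u, v)] with u < v holds the link weight
    # Step 1: dense overall network with small weight w_u.
    for e in ba_edges(n, m_dense, rng):
        W[e] = w_u
    # Steps 2 and 3: one sparse community per diagonal block, weight w_u * r_m.
    for c in range(num_comm):
        off = c * n_c
        for e in [e for e in W if label[e[0]] == c and label[e[1]] == c]:
            del W[e]                      # assumption: the block is overwritten
        for (u, v) in ba_edges(n_c, m_sparse, rng):
            W[(off + u, off + v)] = w_u * r_m
    # Step 4: connect each node to nodes of one randomly chosen other community.
    degree = [0] * n
    for (u, v) in W:
        degree[u] += 1
        degree[v] += 1
    for i in range(n):
        other = rng.choice([c for c in range(num_comm) if c != label[i]])
        members = range(other * n_c, (other + 1) * n_c)
        for j in rng.sample(members, min(degree[i], n_c)):
            W[(min(i, j), max(i, j))] = w_u * r_m
    return W, label

W, label = synthetic_network(num_comm=4, n_c=30)
print(len(label), len(W))
```

With $w_u = 1$ and $r_m = 100$ as in the experiments, links inside a community and the Step 4 links carry weight 100, while the remaining links of the overall network carry weight 1.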

About this article

Cite this article

Yoshida, T. Weighted line graphs for overlapping community discovery. Soc. Netw. Anal. Min. 3, 1001–1013 (2013). https://doi.org/10.1007/s13278-013-0104-1

Received: 02 August 2012
Revised: 16 December 2012
Accepted: 11 February 2013
Published: 21 March 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s13278-013-0104-1
