Effective and efficient attributed community search

Fang, Yixiang; Cheng, Reynold; Chen, Yankai; Luo, Siqiang; Hu, Jiafeng

doi:10.1007/s00778-017-0482-5

Regular Paper
Published: 21 September 2017

The VLDB Journal, Volume 26, pages 803–828 (2017)

Yixiang Fang ORCID: orcid.org/0000-0002-5047-8593¹,
Reynold Cheng¹,
Yankai Chen¹,
Siqiang Luo¹ &
Jiafeng Hu¹

Abstract

Given a graph G and a vertex $q \in G$, the community search query returns a subgraph of G that contains vertices related to q. Communities, which are prevalent in attributed graphs such as social networks and knowledge bases, can be used in emerging applications such as product advertisement and setting up of social events. In this paper, we investigate the attributed community query (or ACQ), which returns an attributed community (AC) for an attributed graph. The AC is a subgraph of G, which satisfies both structure cohesiveness (i.e., its vertices are tightly connected) and keyword cohesiveness (i.e., its vertices share common keywords). The AC enables a better understanding of how and why a community is formed (e.g., members of an AC have a common interest in music, because they all have the same keyword “music”). An AC can be “personalized”; for example, an ACQ user may specify that an AC returned should be related to some specific keywords like “research” and “sports”. To enable efficient AC search, we develop the CL-tree index structure and three algorithms based on it. We further propose efficient algorithms for maintaining the index on dynamic graphs. Moreover, we study two problems that are related to the ACQ problem. We evaluate our solutions on six large graphs. Our results show that ACQ is more effective and efficient than existing community retrieval approaches. Moreover, an AC contains more precise and personalized information than that of existing community search and detection methods.

Notes

URL of the SDSS project: http://www.sdss.org.
In practice, the query user can be alerted by the system when there is no sharing among the vertices.
We use “node” to mean “CL-tree node” in this paper.
https://www.flickr.com/.
http://dblp.uni-trier.de/xml/.
http://www.kddcup2012.org/c/kddcup2012-track1.
http://dbpedia.org/datasets.

Author information

Authors and Affiliations

Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
Yixiang Fang, Reynold Cheng, Yankai Chen, Siqiang Luo & Jiafeng Hu

Corresponding author

Correspondence to Yixiang Fang.

Appendices

A Proofs of lemmas

Lemma 1

(Anti-monotonicity) Given a graph G, a vertex $q\in G$, and a set S of keywords, if there exists a subgraph $G_k[S]$, then there exists a subgraph $G_k[S']$ for any subset $S'\subseteq S$.

Proof

By the definition of $G_k[S]$, each vertex of $G_k[S]$ contains S. Consider a keyword set $S'\subseteq S$. Each vertex of $G_k[S]$ then contains $S'$ as well. Also, note that $q\in G_k[S]$. These two properties imply that there exists a subgraph of G, namely $G_k[S]$, with core number at least k, that contains q and in which every vertex contains the keyword set $S'$. It follows that there exists such a subgraph of maximal size (i.e., $G_k[S']$).$\square $

Proposition 1

For any keyword set S, and vertex q, if $G_k[S]$ exists, then $G_k[S]\subseteq G_k[S']$ for any subset $S'\subseteq S$.

Proof

Since $G_k[S]$ contains vertex q and every vertex in $G_k[S]$ contains $S'$ (due to $S'\subseteq S$), $G_k[S]\cup G_k[S']$ also contains vertex q, and every vertex in it contains $S'$. The union is connected, because both subgraphs are connected and share q. In addition, since the core numbers of $G_k[S]$ and $G_k[S']$ are at least k, the core number of $G_k[S]\cup G_k[S']$ is at least k. By the maximality of $G_k[S']$, we have $G_k[S]\cup G_k[S']\subseteq G_k[S']$. It follows that $G_k[S]\subseteq G_k[S']$.$\square $
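The nesting in Lemma 1 and Proposition 1 can be checked on a toy graph. The following is a minimal sketch, not the paper's algorithm: it assumes that $G_k[S]$ is the largest connected subgraph containing q in which every vertex has at least k neighbours inside the subgraph and carries every keyword of S. The function name `gk`, the adjacency map `adj`, and the keyword map `kw` are hypothetical names introduced for illustration.

```python
from collections import deque

def gk(adj, kw, q, k, S):
    """Vertex set of G_k[S] under the assumed definition; empty set if none exists."""
    S = set(S)
    # Keyword constraint: keep only vertices that contain every keyword of S.
    alive = {v for v in adj if S <= kw[v]}
    if q not in alive:
        return set()
    # Degree constraint: repeatedly peel vertices with fewer than k alive neighbours.
    deg = {v: sum(1 for u in adj[v] if u in alive) for v in alive}
    queue = deque(v for v in alive if deg[v] < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue  # already peeled through an earlier queue entry
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] < k:
                    queue.append(u)
    if q not in alive:
        return set()
    # Connectivity: keep the connected component that contains q.
    comp, stack = {q}, [q]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in alive and u not in comp:
                comp.add(u)
                stack.append(u)
    return comp

# Toy data (made up): a 4-clique in which vertex d lacks keyword "y".
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
kw = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"x", "y"}, "d": {"x"}}

small = gk(adj, kw, "a", 2, {"x", "y"})  # S
large = gk(adj, kw, "a", 2, {"x"})       # S' is a subset of S
```

On this toy input, the community for the larger keyword set is contained in the community for its subset, as Proposition 1 states. For k = 3 the set {"x", "y"} has no community while {"x"} still does, which is consistent with Lemma 1 (existence propagates to subsets, not to supersets).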

Lemma 2

Given two subgraphs $G_k[S_1]$ and $G_k[S_2]$ of a graph G, for a new keyword set $S'$ generated from $S_1$ and $S_2$ (i.e., $S'=S_1\cup S_2$), if $G_k[S']$ exists, then it must appear in a k-$\widehat{core}$ with core number at least

$$\begin{aligned} \max\{core_G[G_k[S_1]], core_G[G_k[S_2]]\}. \end{aligned} \tag{5}$$

Proof

Since $S'$ is generated from $S_1$ and $S_2$, we have $S_1\subseteq S'$ and $S_2 \subseteq S'$. By Proposition 1, $G_k[S']\subseteq G_k[S_1]$. This containment implies $\min\{core_G[v] \mid v\in G_k[S_1]\}\le \min\{core_G[v] \mid v\in G_k[S']\}$. Hence, the core number of $G_k[S']$ is at least that of $G_k[S_1]$; formally, $core_G[G_k[S_1]] \le core_G[G_k[S']]$. Similarly, $core_G[G_k[S_2]]\le core_G[G_k[S']]$. The lemma follows directly.$\square $

Lemma 3

Given a connected graph G(V, E) with $n=|V|$ and $m=|E|$, if $m - n < \frac{{{k^2} - k}}{2} - 1$, there is no k-$\widehat{core}$ in G.

Proof

From Definition 1, we can easily conclude that, for any specific k, a k-$\widehat{core}$ has at least $k+1$ vertices. Since each vertex in a specific k-$\widehat{core}$ has at least k edges, the minimum number of edges in a k-$\widehat{core}$ is $\frac{{(k + 1)k}}{2}$.

Consider a connected graph that contains a k-$\widehat{core}$ and has the minimum number of edges: the k-$\widehat{core}$ contains only $k+1$ vertices, and each of the remaining $n-(k+1)$ vertices is attached to it by a single edge. The total number of edges is

$$\begin{aligned} \frac{(k + 1)k}{2} + \left[ n - (k + 1) \right] = m. \end{aligned} \tag{6}$$

Rearranging gives $m - n = \frac{(k+1)k}{2} - (k+1) = \frac{k^2 - k}{2} - 1$. Since this is the smallest value of $m-n$ for which a k-$\widehat{core}$ can exist, we conclude that, if $m - n < \frac{k^2 - k}{2} - 1$, there is no k-$\widehat{core}$ in G.$\square $
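Lemma 3 gives a constant-time test that can rule out a k-$\widehat{core}$ from the vertex and edge counts alone. The sketch below states it as a necessary condition; the function names are hypothetical, and the inequality is multiplied by 2 to stay in integer arithmetic.

```python
def min_edges_with_k_core(n, k):
    """Fewest edges of a connected n-vertex graph that contains a k-core (Eq. 6)."""
    return (k + 1) * k // 2 + (n - (k + 1))

def may_contain_k_core(n, m, k):
    """Lemma 3 as a filter: False means a connected graph with n vertices
    and m edges cannot contain a k-core; True means it is not ruled out."""
    # m - n < (k^2 - k)/2 - 1  is equivalent to  2(m - n) < k^2 - k - 2
    return 2 * (m - n) >= k * k - k - 2
```

For instance, a tree with 5 vertices and 4 edges is ruled out for k = 2, whereas a 4-clique (n = 4, m = 6) sits exactly on the bound for k = 3. A result of True does not guarantee that a k-$\widehat{core}$ exists; it only means the count-based test cannot exclude one.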

Lemma 4

Given two keyword sets $S_1$ and $S_2$, if $G_k[S_1]$ and $G_k[S_2]$ exist, we have

$$\begin{aligned} G_k[S_1\cup S_2] \subseteq G_k[S_1]\cap G_k[S_2]. \end{aligned} \tag{7}$$

Proof

By Proposition 1 and $S_1\subseteq {S_1} \cup {S_2}$, we have ${G_k}[{S_1} \cup {S_2}]\subseteq {G_k}[{S_1}]$. For the same reason, we have ${G_k}[{S_1} \cup {S_2}]\subseteq {G_k}[{S_2}]$. The lemma follows directly.$\square $

Lemma 6

After inserting an edge between two vertices, the maximum number of disconnected k-$\widehat{core}$s which need to be merged is 2.

Proof

We prove the lemma by contradiction. Consider a k-core with three disconnected k-$\widehat{core}$s $G_1$, $G_2$, and $G_3$, with $u\in G_1$, $v\in G_2$, and $w\in G_3$. Let (u, v) be the newly inserted edge that triggers the merging of $G_1$ and $G_2$. Suppose that $G_3$ is also affected by the insertion and needs to be merged with $G_1$ and $G_2$. Then there must exist a connected path of the form (w, $\cdots $, u, $\cdots $, v). Since (u, v) is the only inserted edge, for this path to be connected, w must already have been able to reach u or v along some path before (u, v) was inserted. That is, $G_3$ was connected to $G_1$ or $G_2$ before the edge insertion, which contradicts the assumption in either case. Hence, the lemma holds.$\square $
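The connectivity argument behind Lemma 6 can be illustrated with a disjoint-set structure. This is only a sketch of the connectivity aspect: it assumes all vertices already belong to the k-core and ignores any change of core numbers caused by the insertion. The class `DisjointSet` and the vertex names are hypothetical.

```python
class DisjointSet:
    """Minimal union-find over a fixed set of vertices."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        """Merge the components of x and y; return how many components vanished (0 or 1)."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return 0
        self.parent[ry] = rx
        return 1

def count_components(ds):
    return len({ds.find(x) for x in ds.parent})

ds = DisjointSet(["u", "u2", "v", "w"])
ds.union("u", "u2")            # G1 = {u, u2}, G2 = {v}, G3 = {w}
before = count_components(ds)
merged = ds.union("u", "v")    # insert edge (u, v)
after = count_components(ds)
```

An inserted edge touches exactly two endpoints, so a single `union` call can fuse at most two components; the component of w is left unchanged.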

Lemma 7

In the process of merging subtrees, the maximum number of nodes which need to be merged in each level is 2.

Proof

This can be proved in a way similar to Lemma 6.$\square $

B Basic solutions for ACQ

Algorithm 14 presents basic-g. The input of basic-g is a graph G, a query vertex q, an integer k, and a keyword set S. It first initializes a set, $\varPsi $, of candidate keyword sets, each consisting of a single keyword of S (line 2). Then, it finds the k-$\widehat{core}$, ${\mathcal C}_k$, containing q from the graph G. In the loop (lines 4–11), it first initializes an empty set $\varPhi $ (line 5) for collecting all the qualified keyword sets. Then, for each $S'\in \varPsi $, it finds $G_k[S']$ from ${\mathcal C}_k$ by considering the keyword and degree constraints, and puts $S'$ into $\varPhi $ if $G_k[S']$ exists (lines 6–8). After checking all the candidate keyword sets in $\varPsi $, if there is at least one qualified keyword set in $\varPhi $, it generates a new set $\varPsi $ of candidate keyword sets by calling geneCand($\varPhi $) and continues to check larger candidate keyword sets in the next iteration; otherwise, it stops and outputs the ACs (lines 9–11).

The other basic algorithm, basic-w, follows the same steps as basic-g, except that for each candidate keyword set $S'$, it finds $G_k[S']$ from G rather than from ${\mathcal C}_k$. We omit the pseudocode due to space limitations.
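The steps of basic-g described above can be sketched as follows. This is an illustrative reading of the prose, not the paper's Algorithm 14: the helper names `peel`, `gene_cand`, and `basic_g` are hypothetical, the Apriori-style join inside `gene_cand` is an assumption about how geneCand builds larger candidates (justified by the anti-monotonicity of Lemma 1), and returning an empty result when no single keyword qualifies is also an assumption.

```python
from itertools import combinations

def peel(adj, q, k, keep):
    """Connected subgraph containing q with minimum degree >= k, restricted to
    the vertices in `keep`; returns its vertex set, or an empty set if none."""
    alive = set(keep)
    changed = True
    while changed:
        low = {v for v in alive if sum(u in alive for u in adj[v]) < k}
        alive -= low
        changed = bool(low)
    if q not in alive:
        return set()
    comp, stack = {q}, [q]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in alive and u not in comp:
                comp.add(u)
                stack.append(u)
    return comp

def gene_cand(phi):
    """Assumed Apriori-style join: keep a set one keyword larger only if all
    of its subsets of the previous size are qualified."""
    phi = set(phi)
    out = set()
    for a, b in combinations(phi, 2):
        c = a | b
        if len(c) == len(a) + 1 and all(c - {w} in phi for w in c):
            out.add(c)
    return out

def basic_g(adj, kw, q, k, S):
    """Map each largest qualified keyword set S' to the vertex set of G_k[S']."""
    core = peel(adj, q, k, adj)                # C_k: the k-core component of q
    if not core:
        return {}
    psi = {frozenset([w]) for w in S}          # size-1 candidates
    best = {}
    while psi:
        phi = {}                               # qualified sets of this round
        for cand in psi:
            sub = peel(adj, q, k, {v for v in core if cand <= kw[v]})
            if sub:
                phi[cand] = sub
        if not phi:
            break
        best = phi
        psi = gene_cand(phi)
    return best

# Toy data (made up): a 4-clique with uneven keyword sets.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
kw = {"a": {"x", "y", "z"}, "b": {"x", "y"}, "c": {"x", "y"}, "d": {"x"}}
result = basic_g(adj, kw, "a", 2, {"x", "y", "z"})
```

Under the same assumptions, a basic-w variant would pass all vertices of G, instead of `core`, to the keyword filter inside the loop.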

C Basic algorithms for ACQ-A and ACQ-M

1. ACQ-A We show basic-g-v1 in Algorithm 15. The other algorithm, basic-w-v1, follows the same steps as basic-g-v1, except that it finds $G_k[S]$ from G rather than from ${\mathcal C}_k$.

2. ACQ-M We show basic-g-v2 in Algorithm 16. The other algorithm, basic-w-v2, follows the same steps as basic-g-v2, except that it uses basic-w in place of the call made in line 4 of basic-g-v2.

About this article

Cite this article

Fang, Y., Cheng, R., Chen, Y. et al. Effective and efficient attributed community search. The VLDB Journal 26, 803–828 (2017). https://doi.org/10.1007/s00778-017-0482-5

Received: 25 March 2017
Revised: 19 August 2017
Accepted: 07 September 2017
Published: 21 September 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s00778-017-0482-5
