ABSTRACT
Detecting local graph clusters is an important problem in large-graph analysis. Given seed nodes in a graph, local clustering aims to find subgraphs around the seed nodes that consist of nodes highly relevant to them. However, existing local clustering methods either allow only a single seed node or assume that all seed nodes come from the same cluster, which does not hold in many real applications. Moreover, assuming that all seed nodes belong to a single cluster discards crucial information about the relationships between seed nodes. In this paper, we propose a method that takes advantage of these relationships. Given prior knowledge of the community membership of the seed nodes, the method assigns the same color to seed nodes in the same community and different colors to seed nodes in different communities. To exploit this information further, we introduce a color-based random walk mechanism in which colors are propagated from the seed nodes to every node in the graph. Through the interaction of identical and distinct colors, the supervision provided by the seed nodes is incorporated into the random walk process. We also propose a heuristic strategy that speeds up the algorithm by more than two orders of magnitude. Experimental evaluations show that our clustering method outperforms state-of-the-art approaches by a large margin.
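The idea of propagating colors from labeled seed nodes can be illustrated with a deliberately simplified sketch. It runs one independent random walk with restart per color and labels each node by its highest-scoring color. This is an assumption made for illustration only: the abstract does not specify the update rule, and the sketch does not model the interaction between identical and distinct colors or the speed-up heuristic. The function name `colored_rwr` and the parameters `restart` and `iters` are hypothetical.

```python
from collections import defaultdict


def colored_rwr(edges, seeds_by_color, restart=0.15, iters=100):
    """Illustrative stand-in, NOT the paper's algorithm: run one independent
    random walk with restart per color, then give each node the color whose
    walk visits it most. Assumes an undirected, unweighted graph."""
    # Build an undirected adjacency list from the edge list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = sorted(adj)

    scores = {}
    for color, seeds in seeds_by_color.items():
        # Restart distribution: uniform over the seed nodes of this color.
        r = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
        p = dict(r)
        for _ in range(iters):
            # With probability `restart`, jump back to a seed of this color.
            nxt = {n: restart * r[n] for n in nodes}
            # Otherwise move to a uniformly chosen neighbor.
            for u in nodes:
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            p = nxt
        scores[color] = p

    # Assign each node the color with the highest visiting probability.
    labels = {n: max(scores, key=lambda c: scores[c][n]) for n in nodes}
    return labels, scores


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
labels, scores = colored_rwr(edges, {"red": {0}, "blue": {5}})
```

On this toy graph, each triangle takes the color of the seed it contains. A method in the spirit of the abstract would additionally couple the walks so that distinct colors influence one another, which independent walks cannot capture.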