ABSTRACT
Given a bi-type information network, which is an extended model of the well-known bipartite graph, how can its clusters be found efficiently? Graph clustering is now a fundamental tool for obtaining an overview of graph-structured data. The RankClus framework accurately clusters bi-type information networks using ranking-based graph clustering techniques. It integrates a graph ranking algorithm, such as PageRank or HITS, into the graph clustering procedure to improve clustering quality. However, this integration incurs a high computational cost on large bi-type information networks, since RankClus repeatedly runs the ranking algorithm over all nodes and edges until the clustering procedure converges. To overcome this runtime limitation, we present a novel RankClus algorithm that reduces the running time for large bi-type information networks. Our proposed method incorporates dynamic graph processing techniques into the ranking procedures of RankClus. By dynamically updating the ranking results, it reduces the number of nodes and edges computed during the repeated ranking procedures. Experiments on real-world datasets verify that our proposed method reduces the running time while maintaining the clustering quality of RankClus.
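The idea of updating ranking results instead of recomputing them can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it assumes a simple weighted-degree ranking in place of PageRank or HITS, and the names `full_rank`, `move_node`, `normalize`, and the toy network are hypothetical. It shows only that when one target node changes cluster, the per-cluster scores can be refreshed by touching that node's edges alone.

```python
from collections import defaultdict


def full_rank(edges, members):
    """Recompute unnormalized attribute-node scores for one cluster from scratch.

    edges:   dict mapping target node -> {attribute node: edge weight}
    members: target nodes currently assigned to the cluster
    Assumption: the score is a plain weighted degree, not PageRank or HITS.
    """
    score = defaultdict(float)
    for x in members:
        for y, w in edges[x].items():
            score[y] += w
    return score


def move_node(edges, scores, x, src, dst):
    """Update the scores of clusters src and dst when target node x moves.

    Only the edges incident to x are visited, so the cost is proportional
    to the degree of x rather than to the size of the whole network.
    """
    for y, w in edges[x].items():
        scores[src][y] -= w
        scores[dst][y] += w


def normalize(score):
    """Turn unnormalized scores into a rank distribution, dropping zero entries."""
    total = sum(score.values())
    if total <= 0:
        return {}
    return {y: s / total for y, s in score.items() if s > 0}


# Toy bi-type network (hypothetical): target nodes c1..c3, attribute nodes a, b, c.
edges = {"c1": {"a": 2, "b": 1}, "c2": {"b": 3}, "c3": {"a": 1, "c": 4}}
clusters = {0: {"c1", "c2"}, 1: {"c3"}}
scores = {k: full_rank(edges, members) for k, members in clusters.items()}

# Reassign c2 from cluster 0 to cluster 1 and update the scores incrementally.
clusters[0].remove("c2")
clusters[1].add("c2")
move_node(edges, scores, "c2", src=0, dst=1)

incremental = {k: normalize(s) for k, s in scores.items()}
recomputed = {k: normalize(full_rank(edges, m)) for k, m in clusters.items()}
```

In this toy case `incremental` and `recomputed` agree, while the incremental path visits one edge instead of all five. Iterative rankings such as PageRank need more care, because a local change propagates along paths in the graph.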
Index Terms
- Fast RankClus Algorithm via Dynamic Rank Score Tracking on Bi-type Information Networks