research-article

Fast RankClus Algorithm via Dynamic Rank Score Tracking on Bi-type Information Networks

Authors:
Kotaro Yamazaki, Graduate School of SIE, University of Tsukuba
Shohei Matsugu, Graduate School of SIE, University of Tsukuba
Hiroaki Shiokawa, Center for Computational Sciences, University of Tsukuba
Hiroyuki Kitagawa, Center for Computational Sciences, University of Tsukuba

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services, December 2019, Pages 110–117. https://doi.org/10.1145/3366030.3366051

Published: 22 February 2020

ABSTRACT

Given a bi-type information network, which is an extended model of the well-known bipartite graph, how can clusters be found efficiently? Graph clustering is now a fundamental tool for obtaining an overview of graph-structured data. The RankClus framework accurately clusters bi-type information networks using ranking-based graph clustering techniques. It integrates a graph ranking algorithm, such as PageRank or HITS, into the graph clustering procedure to improve the clustering quality. However, this integration incurs a high computational cost on large bi-type information networks, since RankClus repeatedly runs the ranking algorithm over all nodes and edges until the clustering procedure converges. To overcome this runtime limitation, herein we present a novel RankClus algorithm that reduces the running time for large bi-type information networks. Our proposed method incorporates dynamic graph processing techniques into the ranking procedures included in RankClus. By dynamically updating the ranking results, our proposal reduces the number of nodes and edges computed during the repeated ranking procedures. Experiments on real-world datasets verify that our proposed method successfully reduces the running time while maintaining the clustering quality of RankClus.
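The idea of updating ranking results instead of recomputing them can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm. It assumes a degree-based score (the weighted degree of an attribute node within a cluster, normalized by the cluster's total edge weight) as a stand-in for PageRank or HITS, and all names (`full_rank`, `move_node`, `score`) are hypothetical. When one target node changes cluster, only that node's own edges are touched, whereas a full recomputation scans every edge.

```python
from collections import defaultdict


def full_rank(edges, assign, k):
    """Recompute per-cluster attribute weight sums from scratch (scans all edges).

    edges:  dict mapping target node -> {attribute node: edge weight}
    assign: dict mapping target node -> cluster id in range(k)
    """
    sums = [defaultdict(float) for _ in range(k)]
    totals = [0.0] * k
    for x, nbrs in edges.items():
        c = assign[x]
        for y, w in nbrs.items():
            sums[c][y] += w
            totals[c] += w
    return sums, totals


def move_node(edges, assign, sums, totals, x, new_c):
    """Move target node x to cluster new_c, updating only x's own edges.

    Returns the number of edges touched (assumption: this count is the
    cost we want to keep small compared with a full scan).
    """
    old_c = assign[x]
    if old_c == new_c:
        return 0
    touched = 0
    for y, w in edges[x].items():
        sums[old_c][y] -= w
        totals[old_c] -= w
        sums[new_c][y] += w
        totals[new_c] += w
        touched += 1
    assign[x] = new_c
    return touched


def score(sums, totals, c, y):
    """Degree-based rank score of attribute node y within cluster c."""
    if totals[c] <= 0:
        return 0.0
    return sums[c].get(y, 0.0) / totals[c]
```

In this toy setting the incremental update yields the same scores as a full recomputation after the move. An iterative ranking such as PageRank would need a more careful tracking scheme, which this sketch does not attempt.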

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering
  2. World Wide Web
    1. Web searching and information discovery
      1. Content ranking
2. Theory of computation
  1. Design and analysis of algorithms
    1. Graph algorithms analysis

Published in
iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
December 2019
709 pages
ISBN:9781450371797
DOI:10.1145/3366030
Editors:
Maria Indrawan-Santiago,
Eric Pardede,
Ivan Luiz Salvadori,
Matthias Steinbauer,
Ismail Khalil,
Gabriele Anderst-Kotsis
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Clustering
Dynamic algorithm
Graph
Qualifiers
- research-article
- Research
- Refereed limited