DOI: 10.1145/3366030.3366051
research-article

Fast RankClus Algorithm via Dynamic Rank Score Tracking on Bi-type Information Networks

Published: 22 February 2020

ABSTRACT

Given a bi-type information network, an extension of the well-known bipartite graph model, how can clusters be found efficiently? Graph clustering is a fundamental tool for obtaining an overview of graph-structured data. The RankClus framework accurately clusters bi-type information networks using ranking-based graph clustering techniques. It integrates a graph ranking algorithm, such as PageRank or HITS, into the clustering procedure to improve clustering quality. However, this integration incurs a high computational cost on large bi-type information networks, because RankClus recomputes the ranking over all nodes and edges repeatedly until the clustering procedure converges. To overcome this runtime limitation, we present a novel RankClus algorithm that reduces the running time for large bi-type information networks. Our proposed method incorporates dynamic graph processing techniques into the ranking procedures of RankClus. By dynamically updating the ranking results, it reduces the number of nodes and edges computed during the repeated ranking procedures. Experiments on real-world datasets verify that our proposed method reduces the running time while maintaining the clustering quality of RankClus.
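
The idea of updating ranking results instead of recomputing them can be illustrated with a small sketch. This is not the algorithm proposed in the paper: as a simplifying assumption, a weighted-degree ranking stands in for PageRank or HITS, and all names (`full_rank`, `move_node`, `normalized`, the toy venue/author network) are hypothetical. It only shows that when one target node changes cluster, the per-cluster scores can be corrected by touching that node's incident edges rather than every edge.

```python
from collections import defaultdict

def full_rank(edges, cluster_of, k):
    """Recompute per-cluster, unnormalized scores of attribute nodes from all edges."""
    score = [defaultdict(int) for _ in range(k)]
    total = [0] * k
    for (x, y), w in edges.items():
        c = cluster_of[x]
        score[c][y] += w
        total[c] += w
    return score, total

def move_node(x, new_c, adj, cluster_of, score, total):
    """Move target node x to cluster new_c, updating only edges incident to x.

    Returns the number of edges touched.
    """
    old_c = cluster_of[x]
    if old_c == new_c:
        return 0
    touched = 0
    for y, w in adj[x].items():
        score[old_c][y] -= w
        score[new_c][y] += w
        total[old_c] -= w
        total[new_c] += w
        touched += 1
    cluster_of[x] = new_c
    return touched

def normalized(score, total, c):
    """Rank distribution of attribute nodes within cluster c (zero scores dropped)."""
    return {y: s / total[c] for y, s in score[c].items() if s > 0}

# Toy bi-type network (assumed data): target nodes v* linked to attribute nodes a*.
edges = {("v1", "a1"): 2, ("v1", "a2"): 1, ("v2", "a2"): 3,
         ("v3", "a3"): 4, ("v3", "a1"): 1}
adj = defaultdict(dict)
for (x, y), w in edges.items():
    adj[x][y] = w

cluster_of = {"v1": 0, "v2": 0, "v3": 1}
score, total = full_rank(edges, cluster_of, 2)

# Reassign v2 to cluster 1 and correct the scores incrementally.
touched = move_node("v2", 1, adj, cluster_of, score, total)

# Reference result: recompute everything from scratch with the new assignment.
ref_score, ref_total = full_rank(edges, cluster_of, 2)
```

In this toy run the incremental update touches only the edges of `v2`, while the from-scratch recomputation scans all edges, and both yield the same rank distributions. Link-based rankings such as PageRank need a more involved update rule, which this sketch does not attempt.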


Published in: iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services, December 2019, 709 pages

Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
