Abstract
A graph is an extremely versatile data structure, owing to its expressiveness and its flexibility in modeling a wide range of real-life phenomena. Networks such as social networks, sensor networks, and computer networks are represented and stored as graphs. The analysis of such graphs has long been of great importance, and it is performed from various perspectives to extract the most from these multifaceted information repositories. When the analysis aims to find groups of similar vertices in a graph, clustering is the most prominent option. Earlier graph clustering approaches focus on either the topological structure or the attribute similarity, whereas a few recent methods consider both aspects simultaneously. Because similarity estimation is computationally expensive, these methods often suffer from scalability issues. To overcome this limitation, we introduce the collaborative similarity measure (CSM) for intra-graph clustering. CSM relies on shortest paths, rather than all paths, to define the structural and semantic relevance among vertices. First, we calculate the pairwise similarity among vertices using CSM. Second, we group vertices based on the calculated similarity within a k-Medoid framework. Empirical analysis based on density and entropy demonstrates the efficacy of CSM over existing measures. Moreover, because it requires an order of magnitude fewer computations, CSM is a potential candidate for the analysis of medium-scale graphs.
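The abstract does not spell out the CSM formula, so the following Python sketch only illustrates the general idea under stated assumptions. One breadth-first search per vertex yields shortest-path hop counts (instead of enumerating all paths); a hypothetical structural term `1/hops` is blended with a Jaccard attribute overlap through a hypothetical weight `alpha`; and a clustering is scored with density (share of edges inside clusters) and a cluster-size-weighted entropy of one categorical attribute. The function names, the blending rule, and the single-attribute entropy are illustrative assumptions, not the measure defined in the paper.

```python
from collections import deque
from math import log2


def bfs_hops(adj, source):
    """Hop distance from source to every reachable vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def jaccard(a, b):
    """Attribute overlap of two vertices; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(adj, attrs, alpha=0.5):
    """Hypothetical stand-in for CSM (NOT the paper's formula): one BFS per
    vertex (shortest paths only), structural term 1/hops (0.0 if unreachable),
    semantic term Jaccard, blended by the assumed weight alpha."""
    sim = {}
    for u in adj:
        hops = bfs_hops(adj, u)
        for v in adj:
            if u == v:
                sim[u, v] = 1.0
                continue
            structural = 1.0 / hops[v] if v in hops else 0.0
            sim[u, v] = alpha * structural + (1 - alpha) * jaccard(attrs[u], attrs[v])
    return sim


def density(adj, labels):
    """Fraction of undirected edges whose two endpoints share a cluster label."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    inside = sum(1 for e in edges if len({labels[x] for x in e}) == 1)
    return inside / len(edges) if edges else 0.0


def entropy(labels, category):
    """Cluster-size-weighted entropy of one categorical attribute (lower is better)."""
    clusters = {}
    for v, c in labels.items():
        clusters.setdefault(c, []).append(category[v])
    total = len(labels)
    result = 0.0
    for members in clusters.values():
        counts = {}
        for x in members:
            counts[x] = counts.get(x, 0) + 1
        size = len(members)
        h = -sum((n / size) * log2(n / size) for n in counts.values())
        result += size / total * h
    return result
```

For two triangles joined by a single bridge edge, a partition that separates the triangles keeps 6 of the 7 edges inside clusters, giving a density of 6/7. One BFS per vertex costs O(|V|(|V|+|E|)) on an unweighted graph, which hints at why restricting relevance to shortest paths is cheaper than enumerating all paths.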

Notes
\({<}10^4\) nodes.
\({<}10^6\) nodes.
The source code is available at https://github.com/WNawaz/CSM.
The detailed conference list is as follows. DB: SIGMOD, VLDB, PODS, ICDE, EDBT; DM: KDD, ICDM, SDM, PAKDD, PKDD; IR: SIGIR, CIKM, ECIR, WWW; AI: IJCAI, AAAI, UAI, NIPS.
\(10^6-10^9\) nodes.
Acknowledgments
We thank the anonymous reviewers for their valuable comments and suggestions. This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2014-(H0301-14-1003)).
Appendix
The pseudocode of the proposed method is given in the following algorithm. Similarity estimation and clustering are the two main, independent steps. Initialization and similarity calculation are performed only once, whereas the clustering step is iterative.
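The two-phase structure described above can be sketched in Python as follows, assuming the similarity has already been computed once into a dictionary `sim[(u, v)]` (a hypothetical layout). Only the alternating assign/update loop of k-medoids repeats, until the medoid set stops changing. This uses the simple alternating (Voronoi-iteration) variant of k-medoids rather than PAM's swap search, and the function names `assign` and `k_medoids` and their parameters are illustrative, not the paper's pseudocode.

```python
import random


def assign(vertices, medoids, sim):
    """Assignment step: each vertex joins its most similar medoid.
    Medoids always keep themselves, so no cluster is ever empty."""
    clusters = {m: [] for m in medoids}
    for v in vertices:
        target = v if v in clusters else max(medoids, key=lambda m: sim[v, m])
        clusters[target].append(v)
    return clusters


def k_medoids(vertices, sim, k, max_iter=100, seed=0):
    """Alternating k-medoids over a precomputed similarity dict sim[(u, v)].
    The similarity is computed once beforehand; only this loop repeats."""
    rng = random.Random(seed)
    medoids = rng.sample(vertices, k)  # initialization, done once
    for _ in range(max_iter):
        clusters = assign(vertices, medoids, sim)
        # Update step: the new medoid maximizes total similarity to its members.
        new_medoids = [
            max(members, key=lambda c: sum(sim[c, x] for x in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # medoids are stable: clustering has converged
        medoids = new_medoids
    clusters = assign(vertices, medoids, sim)
    labels = {v: i for i, m in enumerate(medoids) for v in clusters[m]}
    return medoids, labels
```

Because the similarity dictionary is only read inside the loop, the expensive estimation step is never repeated, which mirrors the separation between the once-only and the repetitive steps described in the appendix.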

Cite this article
Nawaz, W., Khan, KU., Lee, YK. et al. Intra graph clustering using collaborative similarity measure. Distrib Parallel Databases 33, 583–603 (2015). https://doi.org/10.1007/s10619-014-7170-x
DOI: https://doi.org/10.1007/s10619-014-7170-x