Intra graph clustering using collaborative similarity measure

Nawaz, Waqas; Khan, Kifayat-Ullah; Lee, Young-Koo; Lee, Sungyoung

doi:10.1007/s10619-014-7170-x

Published: 20 January 2015

Distributed and Parallel Databases, volume 33, pages 583–603 (2015)

Abstract

A graph is an extremely versatile data structure in terms of its expressiveness and flexibility to model a range of real-life phenomena. Various networks, such as social networks, sensor networks and computer networks, are represented and stored in the form of graphs. The analysis of these kinds of graphs has been of immense importance for a long time. It is performed from various perspectives to get the most out of such multifaceted information repositories. When the analysis is targeted towards finding groups of vertices based on their similarity in a graph, clustering is the most conspicuous option. Previous graph clustering approaches focus on either the topological structure or attribute likeness; however, a few recent methods consider both aspects simultaneously. Due to the enormous computation requirements for similarity estimation, these methods often suffer from scalability issues. To overcome this limitation, we introduce the collaborative similarity measure (CSM) for intra-graph clustering. CSM is based on a shortest-path strategy, instead of all paths, to define structural and semantic relevance among vertices. First, we calculate the pairwise similarity among vertices using CSM. Second, vertices are grouped together based on the calculated similarity under the k-Medoid framework. Empirical analysis, based on density and entropy, demonstrates the efficacy of CSM over existing measures. Moreover, CSM becomes a potential candidate for medium-scale graph analysis because it requires an order of magnitude fewer computations.
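
The shortest-path idea behind the first step can be illustrated with a minimal Python sketch. It is not the paper's definition of CSM, which the abstract does not spell out: the blend of neighbourhood and attribute Jaccard overlap in `edge_similarity`, the weight `alpha`, the rule that a path's similarity is the product of its edge similarities, and the toy `GRAPH` and `ATTRS` data are all assumptions made for illustration. What the sketch shares with the abstract is that each vertex pair is scored along a single best path (found with Dijkstra's algorithm) rather than over all paths.

```python
import heapq
import math


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_similarity(graph, attrs, u, v, alpha=0.5):
    """Assumed similarity of two ADJACENT vertices (not the paper's formula).

    Blends structural overlap of closed neighbourhoods with attribute overlap.
    """
    structural = jaccard(set(graph[u]) | {u}, set(graph[v]) | {v})
    semantic = jaccard(attrs[u], attrs[v])
    return alpha * structural + (1 - alpha) * semantic


def pairwise_similarity(graph, attrs, alpha=0.5):
    """Similarity of every reachable pair along its single best path.

    Each edge costs -log(similarity), so the cheapest path found by
    Dijkstra's algorithm is the one with the largest product of edge
    similarities; exp(-cost) recovers that product.
    """
    sim = {}
    for source in graph:
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, math.inf):
                continue  # stale heap entry
            for v in graph[u]:
                w = edge_similarity(graph, attrs, u, v, alpha)
                if w <= 0.0:
                    continue  # an edge with zero similarity cannot be used
                nd = d - math.log(w)
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        sim[source] = {t: math.exp(-d) for t, d in dist.items()}
    return sim


# Hypothetical toy data: two triangles joined by the bridge 2-3.
GRAPH = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ATTRS = {0: {"db"}, 1: {"db"}, 2: {"db"}, 3: {"ai"}, 4: {"ai"}, 5: {"ai"}}

if __name__ == "__main__":
    similarity = pairwise_similarity(GRAPH, ATTRS)
    print(similarity[0])
```

In this toy graph, vertices inside the same triangle score much higher than pairs that must cross the bridge, because the bridge edge joins vertices with no shared attributes and little neighbourhood overlap.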

Notes

\({<}10^4\) nodes.
\({<}10^6\) nodes.
http://math.nist.gov/javanumerics/jama.
The source code is available at https://github.com/WNawaz/CSM.
http://www-personal.umich.edu/~mejn/netdata.
The detailed conference list is DB: SIGMOD, VLDB, PODS, ICDE, EDBT; DM: KDD, ICDM, SDM, PAKDD, PKDD; IR: SIGIR, CIKM, ECIR, WWW; AI: IJCAI, AAAI, UAI, NIPS.
\(10^6-10^9\) nodes.

Acknowledgments

We are thankful to the anonymous reviewers for valuable comments and suggestions. This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2014-(H0301-14-1003)).

Author information

Authors and Affiliations

Department of Computer Engineering, Kyung Hee University, 1 Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, 446-701, Republic of Korea
Waqas Nawaz, Kifayat-Ullah Khan, Young-Koo Lee & Sungyoung Lee

Corresponding author

Correspondence to Young-Koo Lee.

Appendix

The pseudocode of the proposed method is given in the following algorithm. Similarity estimation and clustering are the two main, independent steps. Initialization and similarity calculation are done once; the clustering step, however, is repetitive in nature.
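
The algorithm listing itself is not reproduced on this page. As a rough illustration of the iterative clustering step only, the following Python sketch runs a plain k-medoid loop over a similarity matrix that has already been computed once. It is not the authors' pseudocode: the matrix `SIM` holds made-up values, and seeding the medoids with the first `k` vertices is an assumption chosen to keep the example deterministic.

```python
def assign(sim, medoids):
    """Attach every vertex to its most similar medoid."""
    clusters = {m: [] for m in medoids}
    for v in range(len(sim)):
        # A medoid always stays in its own cluster, so no cluster is empty.
        best = v if v in clusters else max(medoids, key=lambda m: sim[v][m])
        clusters[best].append(v)
    return clusters


def k_medoids(sim, k, max_iter=50):
    """Repeat assignment and medoid update until the medoids stop changing."""
    medoids = list(range(k))  # assumed initialisation: first k vertices
    clusters = assign(sim, medoids)
    for _ in range(max_iter):
        # New medoid of a cluster: the member most similar to all its members.
        new_medoids = sorted(
            max(members, key=lambda c: sum(sim[c][x] for x in members))
            for members in clusters.values()
        )
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
        clusters = assign(sim, medoids)
    return clusters


# Hypothetical precomputed pairwise similarities for six vertices.
SIM = [
    [1.00, 0.90, 0.80, 0.15, 0.10, 0.10],
    [0.90, 1.00, 0.80, 0.15, 0.10, 0.10],
    [0.80, 0.80, 1.00, 0.20, 0.15, 0.15],
    [0.15, 0.15, 0.20, 1.00, 0.80, 0.80],
    [0.10, 0.10, 0.15, 0.80, 1.00, 0.90],
    [0.10, 0.10, 0.15, 0.80, 0.90, 1.00],
]

if __name__ == "__main__":
    print(k_medoids(SIM, k=2))
```

Because the similarities are fixed inputs, each iteration only re-reads the matrix; this mirrors the split described above, where the similarity calculation is done once and only the clustering loop repeats.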

About this article

Cite this article

Nawaz, W., Khan, KU., Lee, YK. et al. Intra graph clustering using collaborative similarity measure. Distrib Parallel Databases 33, 583–603 (2015). https://doi.org/10.1007/s10619-014-7170-x

Issue Date: December 2015