Abstract
Clustering is the subset of data mining techniques used to classify entities agnostically by looking at their attributes. Clustering algorithms specialized to deal with complex networks are known as community discovery algorithms. Notwithstanding their common objectives, community discovery makes crucial assumptions (edge sparsity and a single node type, among others) that make its mapping to clustering nontrivial. In this paper, we propose a mapping from community discovery to clustering, focusing on transactional data clustering. We represent a network as a transactional dataset, and we find communities by grouping nodes with common items (neighbors) in their baskets (neighbor lists). By comparing our results with ground truth communities and state-of-the-art community discovery methods, we show that transactional clustering algorithms are a feasible alternative to community discovery, and that a complete mapping between the two problems is possible.
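As a rough illustration of the representation described in the abstract, the following Python sketch turns an edge list into baskets (one set of neighbors per node) and then groups nodes whose baskets overlap. It is a minimal sketch under stated assumptions, not the method evaluated in the paper: the function names, the `include_self` option (adding each node to its own basket), the Jaccard threshold of 0.5, and the simple leader-style grouping are all hypothetical stand-ins chosen for brevity.

```python
from collections import defaultdict


def network_to_baskets(edges, include_self=True):
    """Turn an undirected edge list into one basket (set of neighbors) per node."""
    baskets = defaultdict(set)
    for u, v in edges:
        baskets[u].add(v)
        baskets[v].add(u)
    if include_self:
        # Assumption (not from the paper): put each node in its own basket so
        # that adjacent nodes in a dense group share more items.
        for node in baskets:
            baskets[node].add(node)
    return dict(baskets)


def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_clustering(baskets, threshold=0.5):
    """Hypothetical stand-in for a transactional clustering algorithm.

    Each node joins the cluster whose first member (the leader) has the most
    similar basket, provided the similarity reaches the threshold; otherwise
    the node starts a new cluster.
    """
    leaders = []   # basket of the first node placed in each cluster
    clusters = []  # node ids per cluster
    for node in sorted(baskets):
        sims = [jaccard(baskets[node], leader) for leader in leaders]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(node)
        else:
            leaders.append(baskets[node])
            clusters.append([node])
    return clusters


# Toy network: two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
baskets = network_to_baskets(edges)
communities = leader_clustering(baskets, threshold=0.5)
print(communities)  # [[0, 1, 2], [3, 4, 5]]
```

On this toy network the two triangles are recovered as separate groups, because nodes inside a triangle share most of their basket items while nodes across the bridge share few. Any real transactional clustering algorithm could take the place of `leader_clustering`, since the baskets are ordinary item sets.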
Notes
- 1. Note that this is far from being an unproblematic definition [11], but it will do for the scope of this paper.
- 2.
References
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. JSTAT 2008(10), P10008 (2008)
Bouguessa, M.: A practical approach for clustering transaction data. In: Perner, P. (ed.) MLDM 2011. LNCS (LNAI), vol. 6871, pp. 265–279. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23199-5_20
Coscia, M., Giannotti, F., Pedreschi, D.: A classification for community discovery methods in complex networks. SADM 4(5), 512–546 (2011)
Coscia, M., Neffke, F.M.: Network backboning with noisy data. In: ICDE, pp. 425–436. IEEE (2017)
Coscia, M., Rossetti, G., Giannotti, F., Pedreschi, D.: Uncovering hierarchical and overlapping communities with a local-first approach. TKDD 9(1), 6 (2014)
Guidotti, R., Monreale, A., Nanni, M., Giannotti, F., Pedreschi, D.: Clustering individual transactional data for masses of users. In: KDD (2017)
Huang, X., Lai, W.: Clustering graphs for visualization via node similarities. J. Vis. Lang. Comput. 17(3), 225–253 (2006)
Li, H., Nie, Z., Lee, W.-C., Giles, L., Wen, J.-R.: Scalable community discovery on textual data with relations. In: CIKM, pp. 1203–1212. ACM (2008)
Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013)
Niu, S., Wang, D., Feng, S., Yu, G.: An improved spectral clustering algorithm for community discovery. In: HIS, vol. 3, pp. 262–267. IEEE (2009)
Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)
Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)
Satuluri, V., Parthasarathy, S.: Scalable graph clustering using stochastic flows: applications to community discovery. In: KDD, pp. 737–746. ACM (2009)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Shannon, C.: A mathematical theory of communication. SIGMOBILE 5, 3–55 (2001)
Tan, P.-N., Steinbach, M., Kumar, V., et al.: Introduction to Data Mining, vol. 1. Pearson Addison Wesley, Boston (2006)
Tsuda, K., Kudo, T.: Clustering graphs by weighted substructure mining. In: ICML, pp. 953–960. ACM (2006)
Vinh, N.X., et al.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: ICML, pp. 1073–1080. ACM (2009)
Wang, F., Li, T., Wang, X., Zhu, S., Ding, C.: Community discovery using nonnegative matrix factorization. DAMI 22(3), 493–521 (2011)
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)
Yang, Y., Guan, X., You, J.: CLOPE: a fast and effective clustering algorithm for transactional data. In: KDD, pp. 682–687. ACM (2002)
Acknowledgements
This work is partially supported by the European Project SoBigData: Social Mining & Big Data Ecosystem, http://www.sobigdata.eu, GS501100001809, 654024. Michele Coscia has been partly supported by FNRS, grant #24927961.
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Guidotti, R., Coscia, M. (2018). On the Equivalence Between Community Discovery and Clustering. In: Guidi, B., Ricci, L., Calafate, C., Gaggi, O., Marquez-Barja, J. (eds) Smart Objects and Technologies for Social Good. GOODTECHS 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 233. Springer, Cham. https://doi.org/10.1007/978-3-319-76111-4_34
DOI: https://doi.org/10.1007/978-3-319-76111-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76110-7
Online ISBN: 978-3-319-76111-4
eBook Packages: Computer Science, Computer Science (R0)