On the Equivalence Between Community Discovery and Clustering

Guidotti, Riccardo; Coscia, Michele

doi:10.1007/978-3-319-76111-4_34

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 233)

Included in the following conference series:

International Conference on Smart Objects and Technologies for Social Good

Abstract

Clustering is the subset of data mining techniques used to agnostically classify entities by looking at their attributes. Clustering algorithms specialized to deal with complex networks are called community discovery algorithms. Despite their common objectives, community discovery rests on crucial assumptions (edge sparsity and a single node type, among others) that make its mapping to clustering non-trivial. In this paper, we propose a mapping from community discovery to clustering by focusing on transactional data clustering. We represent a network as a transactional dataset, and we find communities by grouping nodes with common items (neighbors) in their baskets (neighbor lists). By comparing our results with ground-truth communities and state-of-the-art community discovery methods, we show that transactional clustering algorithms are a feasible alternative to community discovery, and that a complete mapping between the two problems is possible.
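
The network-to-transactions representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the choice to add each node to its own basket, the Jaccard threshold of 0.5, and the greedy single-pass grouping are all assumptions made for illustration, standing in for a proper transactional clustering algorithm.

```python
from collections import defaultdict


def network_to_baskets(edges):
    """Turn an undirected edge list into {node: set of neighbors}.

    Each node plays the role of a transaction and its neighbor list
    plays the role of the basket of items.
    """
    baskets = defaultdict(set)
    for u, v in edges:
        baskets[u].add(v)
        baskets[v].add(u)
    return dict(baskets)


def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_basket_clustering(baskets, threshold=0.5):
    """Hypothetical stand-in for a transactional clustering algorithm.

    Nodes are scanned in sorted order; each node joins the first cluster
    whose representative basket (that of the cluster's first member) is
    at least `threshold` similar, otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative basket, member list)
    for node in sorted(baskets):
        # Assumption: include the node in its own basket so that
        # adjacent nodes share at least two items.
        items = baskets[node] | {node}
        for representative, members in clusters:
            if jaccard(items, representative) >= threshold:
                members.append(node)
                break
        else:
            clusters.append((items, [node]))
    return [members for _, members in clusters]


# Toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
baskets = network_to_baskets(edges)
communities = greedy_basket_clustering(baskets)
print(communities)
```

On this toy network the two triangles end up in separate groups, because nodes inside a triangle share most of their basket while the bridge edge contributes only one common item.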

Notes

1. Note that this is far from being an unproblematic definition [11], but it will do for the scope of this paper.
2. https://snap.stanford.edu/data/#communities

References

1. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. JSTAT 2008(10), P10008 (2008)
2. Bouguessa, M.: A practical approach for clustering transaction data. In: Perner, P. (ed.) MLDM 2011. LNCS (LNAI), vol. 6871, pp. 265–279. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23199-5_20
3. Coscia, M., Giannotti, F., Pedreschi, D.: A classification for community discovery methods in complex networks. SADM 4(5), 512–546 (2011)
4. Coscia, M., Neffke, F.M.: Network backboning with noisy data. In: ICDE, pp. 425–436. IEEE (2017)
5. Coscia, M., Rossetti, G., Giannotti, F., Pedreschi, D.: Uncovering hierarchical and overlapping communities with a local-first approach. TKDD 9(1), 6 (2014)
6. Guidotti, R., Monreale, A., Nanni, M., Giannotti, F., Pedreschi, D.: Clustering individual transactional data for masses of users. In: KDD (2017)
7. Huang, X., Lai, W.: Clustering graphs for visualization via node similarities. J. Vis. Lang. Comput. 17(3), 225–253 (2006)
8. Li, H., Nie, Z., Lee, W.-C., Giles, L., Wen, J.-R.: Scalable community discovery on textual data with relations. In: CIKM, pp. 1203–1212. ACM (2008)
9. Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013)
10. Niu, S., Wang, D., Feng, S., Yu, G.: An improved spectral clustering algorithm for community discovery. In: HIS, vol. 3, pp. 262–267. IEEE (2009)
11. Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)
12. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)
13. Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)
14. Satuluri, V., Parthasarathy, S.: Scalable graph clustering using stochastic flows: applications to community discovery. In: KDD, pp. 737–746. ACM (2009)
15. Schwarz, G., et al.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
16. Shannon, C.: A mathematical theory of communication. SIGMOBILE 5, 3–55 (2001)
17. Tan, P.-N., Steinbach, M., Kumar, V., et al.: Introduction to Data Mining, vol. 1. Pearson Addison Wesley, Boston (2006)
18. Tsuda, K., Kudo, T.: Clustering graphs by weighted substructure mining. In: ICML, pp. 953–960. ACM (2006)
19. Vinh, N.X., et al.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: ICML, pp. 1073–1080. ACM (2009)
20. Wang, F., Li, T., Wang, X., Zhu, S., Ding, C.: Community discovery using nonnegative matrix factorization. DAMI 22(3), 493–521 (2011)
21. Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)
22. Yang, Y., Guan, X., You, J.: Clope: a fast and effective clustering algorithm for transactional data. In: TKDE, pp. 682–687. ACM (2002)

Acknowledgements

This work is partially supported by the European Project SoBigData: Social Mining & Big Data Ecosystem, http://www.sobigdata.eu, GS501100001809, 654024. Michele Coscia has been partly supported by FNRS, grant #24927961.

Author information

Authors and Affiliations

KDDLab, ISTI-CNR, Pisa, Italy
Riccardo Guidotti
CID, Harvard University, Cambridge, USA
Michele Coscia
Naxys, University of Namur, Namur, Belgium
Michele Coscia

Corresponding author

Correspondence to Riccardo Guidotti.

Editor information

Editors and Affiliations

University of Pisa, Pisa, Italy
Barbara Guidi
University of Pisa, Pisa, Italy
Laura Ricci
Polytechnic University of Valencia, Valencia, Spain
Carlos Calafate
University of Padua, Padua, Italy
Ombretta Gaggi
University of Antwerp, Antwerp, Belgium
Johann Marquez-Barja

About this paper

Cite this paper

Guidotti, R., Coscia, M. (2018). On the Equivalence Between Community Discovery and Clustering. In: Guidi, B., Ricci, L., Calafate, C., Gaggi, O., Marquez-Barja, J. (eds) Smart Objects and Technologies for Social Good. GOODTECHS 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 233. Springer, Cham. https://doi.org/10.1007/978-3-319-76111-4_34

DOI: https://doi.org/10.1007/978-3-319-76111-4_34
Published: 17 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76110-7
Online ISBN: 978-3-319-76111-4
eBook Packages: Computer Science, Computer Science (R0)
