On the Equivalence Between Community Discovery and Clustering

  • Conference paper
Smart Objects and Technologies for Social Good (GOODTECHS 2017)

Abstract

Clustering is the subset of data mining techniques used to agnostically classify entities by looking at their attributes. Clustering algorithms specialized to deal with complex networks are called community discovery algorithms. Despite their common objectives, community discovery rests on crucial assumptions (edge sparsity and a single node type, among others) that make its mapping to clustering nontrivial. In this paper, we propose a mapping from community discovery to clustering by focusing on transactional data clustering. We represent a network as a transactional dataset, and we find communities by grouping nodes with common items (neighbors) in their baskets (neighbor lists). By comparing our results with ground-truth communities and state-of-the-art community discovery methods, we show that transactional clustering algorithms are a feasible alternative to community discovery, and that a complete mapping of the two problems is possible.
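The network-to-transactions representation described in the abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted edge list; the helper names `network_to_baskets` and `jaccard`, the toy graph, and the use of Jaccard similarity to compare baskets are illustrative assumptions, not the clustering procedure evaluated in the paper.

```python
from collections import defaultdict


def network_to_baskets(edges):
    """Map an undirected edge list to one basket (set of neighbors) per node.

    Hypothetical helper: each node is a transaction, each neighbor is an item.
    """
    baskets = defaultdict(set)
    for u, v in edges:
        baskets[u].add(v)
        baskets[v].add(u)
    return dict(baskets)


def jaccard(a, b):
    """Jaccard similarity of two baskets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy graph (assumption): triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
baskets = network_to_baskets(edges)

# Nodes in the same triangle share a neighbor; nodes in different triangles
# (other than the bridge endpoints) share none.
same_side = jaccard(baskets[0], baskets[1])
across = jaccard(baskets[0], baskets[4])
print(baskets)
print(same_side, across)
```

Any transactional clustering algorithm could then be run on `baskets` in place of a basket-of-goods dataset; the pairwise similarity here only shows that nodes with overlapping neighbor lists look alike in this representation.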

Notes

  1. Note that this is far from being an unproblematic definition [11], but it will do for the scope of this paper.

  2. https://snap.stanford.edu/data/#communities.

References

  1. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. JSTAT 2008(10), P10008 (2008)

  2. Bouguessa, M.: A practical approach for clustering transaction data. In: Perner, P. (ed.) MLDM 2011. LNCS (LNAI), vol. 6871, pp. 265–279. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23199-5_20

  3. Coscia, M., Giannotti, F., Pedreschi, D.: A classification for community discovery methods in complex networks. SADM 4(5), 512–546 (2011)

  4. Coscia, M., Neffke, F.M.: Network backboning with noisy data. In: ICDE, pp. 425–436. IEEE (2017)

  5. Coscia, M., Rossetti, G., Giannotti, F., Pedreschi, D.: Uncovering hierarchical and overlapping communities with a local-first approach. TKDD 9(1), 6 (2014)

  6. Guidotti, R., Monreale, A., Nanni, M., Giannotti, F., Pedreschi, D.: Clustering individual transactional data for masses of users. In: KDD (2017)

  7. Huang, X., Lai, W.: Clustering graphs for visualization via node similarities. J. Vis. Lang. Comput. 17(3), 225–253 (2006)

  8. Li, H., Nie, Z., Lee, W.-C., Giles, L., Wen, J.-R.: Scalable community discovery on textual data with relations. In: CIKM, pp. 1203–1212. ACM (2008)

  9. Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed networks: a survey. Phys. Rep. 533(4), 95–142 (2013)

  10. Niu, S., Wang, D., Feng, S., Yu, G.: An improved spectral clustering algorithm for community discovery. In: HIS, vol. 3, pp. 262–267. IEEE (2009)

  11. Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)

  12. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)

  13. Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)

  14. Satuluri, V., Parthasarathy, S.: Scalable graph clustering using stochastic flows: applications to community discovery. In: KDD, pp. 737–746. ACM (2009)

  15. Schwarz, G., et al.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

  16. Shannon, C.: A mathematical theory of communication. SIGMOBILE 5, 3–55 (2001)

  17. Tan, P.-N., Steinbach, M., Kumar, V., et al.: Introduction to Data Mining, vol. 1. Pearson Addison Wesley, Boston (2006)

  18. Tsuda, K., Kudo, T.: Clustering graphs by weighted substructure mining. In: ICML, pp. 953–960. ACM (2006)

  19. Vinh, N.X., et al.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: ICML, pp. 1073–1080. ACM (2009)

  20. Wang, F., Li, T., Wang, X., Zhu, S., Ding, C.: Community discovery using nonnegative matrix factorization. DAMI 22(3), 493–521 (2011)

  21. Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)

  22. Yang, Y., Guan, X., You, J.: Clope: a fast and effective clustering algorithm for transactional data. In: TKDE, pp. 682–687. ACM (2002)

Acknowledgements

This work is partially supported by the European Project SoBigData: Social Mining & Big Data Ecosystem, http://www.sobigdata.eu, GS501100001809, 654024. Michele Coscia has been partly supported by FNRS, grant #24927961.

Author information

Corresponding author

Correspondence to Riccardo Guidotti.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Guidotti, R., Coscia, M. (2018). On the Equivalence Between Community Discovery and Clustering. In: Guidi, B., Ricci, L., Calafate, C., Gaggi, O., Marquez-Barja, J. (eds) Smart Objects and Technologies for Social Good. GOODTECHS 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 233. Springer, Cham. https://doi.org/10.1007/978-3-319-76111-4_34

  • DOI: https://doi.org/10.1007/978-3-319-76111-4_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76110-7

  • Online ISBN: 978-3-319-76111-4

  • eBook Packages: Computer Science (R0)