Modularity-Driven Kernel k-means for Community Detection

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2017 (ICANN 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10614)

Abstract

The k-means algorithm is probably the most well-known and most popular clustering method in existence today. This work evaluates whether a new, autonomous kernel k-means approach for graph node clustering, coupled with the modularity criterion, can rival well-established methods such as the Louvain method. We test the algorithm on social network datasets of various sizes and types. The new method uses modularity to estimate the optimal kernel or distance parameters as well as the natural number of clusters in the dataset. Results indicate that this simple black-box algorithm performs on par with the Louvain method given the same input.
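
The idea summarized in the abstract, running kernel k-means over a grid of kernel parameters and cluster counts and keeping the partition with the highest modularity, can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of the regularized Laplacian kernel (I + tL)^-1, the random-restart initialization, the parameter grids, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    degrees = A.sum(axis=1)
    two_m = degrees.sum()                       # twice the number of edges
    same = labels[:, None] == labels[None, :]   # True where i and j share a cluster
    B = A - np.outer(degrees, degrees) / two_m  # modularity matrix
    return float((B * same).sum() / two_m)

def forest_kernel(A, t):
    """Regularized Laplacian kernel (I + t L)^-1 (kernel choice is an assumption)."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(len(A)) + t * L)

def kernel_kmeans(K, k, rng, n_iter=50):
    """Plain kernel k-means from a random initial assignment."""
    n = len(K)
    labels = rng.integers(k, size=n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)          # empty clusters stay at infinity
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue
            # Squared feature-space distance of every node to centroid c.
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def select_by_modularity(A, ks, ts, n_restarts=10, seed=0):
    """Grid search over kernel parameter t and cluster count k, scored by Q."""
    rng = np.random.default_rng(seed)
    best = (-np.inf, None, None, None)
    for t in ts:
        K = forest_kernel(A, t)
        for k in ks:
            for _ in range(n_restarts):
                labels = kernel_kmeans(K, k, rng)
                q = modularity(A, labels)
                if q > best[0]:
                    best = (q, labels, k, t)
    return best

# Toy graph (hypothetical data): two 4-node cliques joined by a single edge.
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1

q, labels, k, t = select_by_modularity(A, ks=range(2, 5), ts=[0.1, 1.0, 10.0])
print(q, labels, k, t)
```

On this toy graph the two cliques form the modularity-optimal partition, so the search is expected to recover them. On real networks the grids, the kernel, and the number of restarts would need tuning.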

Acknowledgements

This work was supported in part by the FNRS and UCL (FSR) through a PhD scholarship. This work was also partially supported by the Immediate and the Brufence projects funded by InnovIris (Brussels Region). We thank these institutions for giving us the opportunity to conduct both fundamental and applied research.

Author information

Corresponding author

Correspondence to Felix Sommer.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sommer, F., Fouss, F., Saerens, M. (2017). Modularity-Driven Kernel k-means for Community Detection. In: Lintas, A., Rovetta, S., Verschure, P., Villa, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science, vol 10614. Springer, Cham. https://doi.org/10.1007/978-3-319-68612-7_48

  • DOI: https://doi.org/10.1007/978-3-319-68612-7_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68611-0

  • Online ISBN: 978-3-319-68612-7

  • eBook Packages: Computer Science (R0)
