Modularity-Driven Kernel k-means for Community Detection

Sommer, Felix; Fouss, François; Saerens, Marco

doi:10.1007/978-3-319-68612-7_48

Felix Sommer, François Fouss & Marco Saerens

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10614)

Included in the following conference series:

International Conference on Artificial Neural Networks


Abstract

The k-means algorithm is probably the best-known and most popular clustering method in use today. This work evaluates whether a new, autonomous kernel k-means approach for graph node clustering, coupled with the modularity criterion, can rival well-established methods such as the Louvain method. We test the algorithm on social network datasets of various sizes and types. The new method estimates the optimal kernel or distance parameters, as well as the natural number of clusters in the dataset, based on modularity. Results indicate that this simple black-box algorithm performs on par with the Louvain method given the same input.
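The abstract describes the approach only at a high level: run kernel k-means on the nodes of a graph, then keep the kernel parameter and the number of clusters that maximize modularity. The Python sketch below illustrates that idea under assumptions that are not taken from the paper. The regularized-Laplacian ("forest") kernel K = (I + tL)^(-1) stands in for whichever kernels the authors actually used; the function names, the candidate grids for k and t, the random initialization and the restart count are all hypothetical choices. Modularity is the standard Newman-Girvan criterion Q = (1/2m) Σ_ij (A_ij - d_i d_j / 2m) δ(c_i, c_j), where d_i is the degree of node i and m the number of edges.

```python
import numpy as np


def modularity(A, labels):
    """Newman-Girvan modularity Q of a hard partition of an undirected graph."""
    degrees = A.sum(axis=1)
    two_m = degrees.sum()
    same_cluster = labels[:, None] == labels[None, :]
    B = A - np.outer(degrees, degrees) / two_m  # modularity matrix
    return float((B * same_cluster).sum() / two_m)


def forest_kernel(A, t):
    """Regularized-Laplacian ("forest") kernel K = (I + tL)^(-1); an assumed choice."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(np.eye(A.shape[0]) + t * L)


def kernel_kmeans(K, k, rng, max_iter=100):
    """Lloyd-style kernel k-means starting from a random labelling."""
    n = K.shape[0]
    labels = rng.integers(k, size=n)
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at +inf
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_jl K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged to a local optimum
        labels = new_labels
    return labels


def modularity_driven_search(A, ks, ts, n_restarts=20, seed=0):
    """Hypothetical grid search: keep the (t, k, restart) with the highest modularity."""
    rng = np.random.default_rng(seed)
    best_q, best_t, best_labels = -np.inf, None, None
    for t in ts:
        K = forest_kernel(A, t)
        for k in ks:
            for _ in range(n_restarts):
                labels = kernel_kmeans(K, k, rng)
                q = modularity(A, labels)
                if q > best_q:
                    best_q, best_t, best_labels = q, t, labels
    # Report the number of non-empty clusters actually found.
    return best_q, best_t, len(np.unique(best_labels)), best_labels


if __name__ == "__main__":
    # Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    A = np.zeros((6, 6))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    q, t, n_clusters, labels = modularity_driven_search(A, ks=[2, 3], ts=[0.5, 1.0, 2.0])
    print(f"Q={q:.3f}, t={t}, clusters={n_clusters}, labels={labels}")
```

Random restarts matter here: kernel k-means only reaches a local optimum, and some initial labellings stay stuck in poor partitions. Selecting the run with the highest modularity lets the same criterion choose the kernel parameter, the number of clusters and the restart, which mirrors the "modularity-driven" idea in the abstract without claiming to reproduce the authors' exact procedure.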



Acknowledgements

This work was supported in part by the FNRS and UCL (FSR) through a PhD scholarship. This work was also partially supported by the Immediate and the Brufence projects funded by InnovIris (Brussels Region). We thank these institutions for giving us the opportunity to conduct both fundamental and applied research.

Author information

Authors and Affiliations

LSM – LouRIM & ICTEAM, Université catholique de Louvain, Chaussée de Binche 151, 7000, Mons, Belgium
Felix Sommer, François Fouss & Marco Saerens


Corresponding author

Correspondence to Felix Sommer.

Editor information

Editors and Affiliations

University of Lausanne, Lausanne, Switzerland
Alessandra Lintas
University of Genoa, Genoa, Italy
Stefano Rovetta
Universitat Pompeu Fabra, Barcelona, Spain
Paul F.M.J. Verschure
University of Lausanne, Lausanne, Switzerland
Alessandro E.P. Villa


About this paper

Cite this paper

Sommer, F., Fouss, F., Saerens, M. (2017). Modularity-Driven Kernel k-means for Community Detection. In: Lintas, A., Rovetta, S., Verschure, P., Villa, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science, vol 10614. Springer, Cham. https://doi.org/10.1007/978-3-319-68612-7_48


DOI: https://doi.org/10.1007/978-3-319-68612-7_48
Published: 25 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68611-0
Online ISBN: 978-3-319-68612-7
eBook Packages: Computer Science, Computer Science (R0)
