Abstract
This work presents recent developments in graph node distances and tests them empirically on social network databases of various sizes and types. We compare two versions of a distance-based kernel k-means algorithm with the well-established Louvain method. The first version is a classic kernel k-means approach; the second additionally weights the nodes using the Sum-over-Forests density index. Both kernel k-means algorithms employ a variety of classic and modern distances. We compare the results of all three algorithms using statistical measures and an overall rank comparison to assess their capabilities in community detection. On the tested datasets, two recently introduced distances outperform the others.
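To make the classic (unweighted) kernel k-means variant concrete, the following is a minimal sketch rather than the implementation evaluated in the chapter. It assumes the commute-time kernel (the pseudoinverse of the graph Laplacian) as the node similarity and a deterministic farthest-point initialisation. Neither choice, nor the toy graph and the function names, is taken from the chapter, and the Sum-over-Forests node weighting of the second variant is not shown.

```python
import numpy as np

def commute_time_kernel(A):
    """Assumed kernel: pseudoinverse of the Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.pinv(L)

def farthest_point_init(D2, k):
    """Deterministic start: pick k mutually distant seed nodes,
    then label every node by its nearest seed."""
    seeds = [0]
    while len(seeds) < k:
        seeds.append(int(D2[:, seeds].min(axis=1).argmax()))
    return D2[:, seeds].argmin(axis=1)

def kernel_kmeans(K, k, n_iter=100):
    """Unweighted kernel k-means on a kernel matrix K; returns node labels."""
    n = K.shape[0]
    diag = np.diag(K)
    # Squared distances induced by the kernel: K_ii + K_jj - 2 K_ij
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    labels = farthest_point_init(D2, k)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue  # an empty cluster keeps an infinite distance
            # ||phi(i) - mu_c||^2 = K_ii - 2 mean_j K_ij + mean_jl K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels

# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = kernel_kmeans(commute_time_kernel(A), k=2)
print(labels)
```

On this toy graph the two triangles should end up in separate clusters. Any other positive semidefinite kernel derived from a graph node distance could be passed to `kernel_kmeans` in place of the commute-time kernel.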
Acknowledgements
We would like to thank our Master's students Joëlle Van Damme and Augustin Collette for their valuable assistance in realizing this work. This work is supported in part by the FNRS through a PhD scholarship. It was also partially supported by the Immediate and the Brufence projects funded by InnovIris (Brussels Region). We thank these institutions for giving us the opportunity to conduct both fundamental and applied research.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Sommer, F., Fouss, F., Saerens, M. (2016). Comparison of Graph Node Distances on Clustering Tasks. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science, vol 9886. Springer, Cham. https://doi.org/10.1007/978-3-319-44778-0_23
DOI: https://doi.org/10.1007/978-3-319-44778-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44777-3
Online ISBN: 978-3-319-44778-0
eBook Packages: Computer Science, Computer Science (R0)