Comparison of Graph Node Distances on Clustering Tasks

Artificial Neural Networks and Machine Learning – ICANN 2016 (ICANN 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9886)

Abstract

This work presents recent developments in graph node distances and tests them empirically on social network datasets of various sizes and types. We compare two versions of a distance-based kernel k-means algorithm with the well-established Louvain method. The first version is a classic kernel k-means approach; the second additionally uses node weights based on the Sum-over-Forests density index. Both kernel k-means algorithms are run with a variety of classic and modern distances. We compare the results of all three algorithms using statistical measures and an overall rank comparison to assess their capabilities in community detection. Results show that two recently introduced distances outperform the others on the datasets tested.
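The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of what a distance-based kernel k-means can look like, not the authors' implementation. It assumes the shortest-path distance as the node distance, a classical multidimensional scaling transform to turn squared distances into a kernel, and plain Lloyd-style iterations; the function names, the toy graph, and the initial labels are all hypothetical, and the Sum-over-Forests node weighting is not included.

```python
import numpy as np

def shortest_path_distances(A):
    """All-pairs shortest-path lengths (Floyd-Warshall) for an unweighted graph."""
    n = A.shape[0]
    D = np.where(A > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        # Allow paths that pass through node k.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def distance_to_kernel(D):
    """Assumed transform: K = -1/2 * H * D^(2) * H, with H the centering matrix."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * H @ (D ** 2) @ H

def kernel_kmeans(K, labels, n_clusters, max_iter=100):
    """Lloyd-style kernel k-means starting from the given integer labels."""
    labels = np.asarray(labels).copy()
    n = K.shape[0]
    for _ in range(max_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue  # an empty cluster stays at infinite distance
            # Squared distance to the cluster mean in feature space:
            # K_ii - (2/m) * sum_j K_ij + (1/m^2) * sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

K = distance_to_kernel(shortest_path_distances(A))
# Start with node 3 deliberately placed in the wrong cluster.
labels = kernel_kmeans(K, labels=[0, 0, 0, 0, 1, 1], n_clusters=2)
print(labels.tolist())
```

Any other node distance could be substituted for `shortest_path_distances` without changing the rest of the sketch, which is the sense in which the approach is distance-based. Kernel k-means is sensitive to initialization, so in practice it is usually restarted from several initial partitions.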

Acknowledgements

We would like to thank our Master's students Joëlle Van Damme and Augustin Collette for their valuable assistance in realizing this work. This work was supported in part by the FNRS through a PhD scholarship, and in part by the Immediate and Brufence projects funded by InnovIris (Brussels Region). We thank these institutions for giving us the opportunity to conduct both fundamental and applied research.

Author information

Corresponding author

Correspondence to Felix Sommer.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Sommer, F., Fouss, F., Saerens, M. (2016). Comparison of Graph Node Distances on Clustering Tasks. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science, vol 9886. Springer, Cham. https://doi.org/10.1007/978-3-319-44778-0_23

  • DOI: https://doi.org/10.1007/978-3-319-44778-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44777-3

  • Online ISBN: 978-3-319-44778-0

  • eBook Packages: Computer Science, Computer Science (R0)
