DOI: 10.1145/2566486.2567986
research-article

Random walks based modularity: application to semi-supervised learning

Published: 07 April 2014

ABSTRACT

Although criticized for some of its limitations, modularity remains a standard measure for analyzing social networks. Quantifying the statistical surprise in the arrangement of the network's edges has led to simple and powerful algorithms. However, relying solely on the distribution of edges, rather than on richer structures such as paths, limits what modularity can capture. Indeed, recent studies have exposed limitations of modularity optimization, for instance its resolution limit. We introduce a novel, formal, and well-defined modularity measure based on random walks, and show how it can be computed from the paths induced by the graph instead of from the edges traditionally used. We argue that computing modularity on paths rather than on edges allows more informative features to be extracted from the network. We test this hypothesis with a semi-supervised procedure for classifying the nodes of a network, and show that, under the same settings, features derived from the random walk modularity yield better classification than features derived from the usual modularity. Additionally, the proposed approach outperforms the classical label propagation procedure on two data sets of labeled social networks.
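
For reference, the usual edge-based modularity of Newman and Girvan for a partition of an undirected graph is Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the total edge weight and c_i the community of node i. With D the diagonal degree matrix, P = D^-1 A the random-walk transition matrix and π_i = k_i / 2m its stationary distribution, the same quantity can be written Q = Σ_ij [π_i P_ij - π_i π_j] δ(c_i, c_j). This suggests one way to move from edges to paths: replace P by the t-step matrix P^t. The Python sketch below computes both scores on a toy graph. The path-based variant is an illustrative assumption, closely related to the Markov stability of Delvenne, Yaliraki and Barahona; it is not necessarily the random walk modularity defined in this paper, and the function names, the walk length t and the toy graph are hypothetical.

```python
def edge_modularity(adj, labels):
    """Usual (Newman-Girvan) modularity of a partition of an undirected graph.

    adj:    symmetric adjacency matrix as a list of lists of non-negative weights
    labels: labels[i] is the community of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # 2m: each undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed edge weight minus its expectation under the configuration model.
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def mat_mul(a, b):
    """Plain product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_modularity(adj, labels, t):
    """Hypothetical path-based variant (illustration only, not the paper's definition).

    Compares the probability that a stationary random walk is at i and, t steps
    later, at j with the probability pi_i * pi_j expected if the two positions
    were independent. Assumes every node has at least one edge.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    pi = [k / two_m for k in degree]  # stationary distribution of the walk
    trans = [[adj[i][j] / degree[i] for j in range(n)] for i in range(n)]  # P = D^-1 A
    p_t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0 = I
    for _ in range(t):
        p_t = mat_mul(p_t, trans)  # holds P^t after the loop
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += pi[i] * p_t[i][j] - pi[i] * pi[j]
    return q


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = [0, 0, 0, 1, 1, 1]
print(edge_modularity(adj, labels))     # 5/14, about 0.357
print(walk_modularity(adj, labels, 1))  # same value up to rounding
print(walk_modularity(adj, labels, 3))
```

With t = 1 the two functions agree, since π_i P_ij = A_ij / 2m. On a connected, non-bipartite graph P^t tends, as t grows, to the matrix whose rows all equal π, so the walk-based score tends to zero; in this construction t acts as a scale parameter controlling how far beyond direct edges the score looks.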

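The baseline named in the abstract, label propagation, can be sketched in the spirit of Zhu and Ghahramani: each unlabeled node repeatedly replaces its class distribution by the average of its neighbors' distributions, weighted by A_ij / k_i (one step of the random walk P = D^-1 A), while labeled nodes are reset to their known classes after every step. The function name, the fixed iteration count used instead of a convergence test, and the toy graph are assumptions for illustration; this is not the experimental setup of the paper.

```python
def label_propagation(adj, seeds, num_classes, iterations=200):
    """Iterative label propagation on an undirected graph (illustrative sketch).

    adj:         symmetric adjacency matrix as a list of lists
    seeds:       dict mapping labeled node index -> class index
    num_classes: number of classes
    iterations:  fixed number of propagation steps (assumed; no convergence test)
    Returns the predicted class of every node.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    # Labeled nodes start one-hot; unlabeled nodes start uniform.
    dist = [one_hot(seeds[i]) if i in seeds else [1.0 / num_classes] * num_classes
            for i in range(n)]
    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            if i in seeds:
                # Clamp: labeled nodes keep their known class.
                new_dist.append(one_hot(seeds[i]))
                continue
            # One random-walk step: neighbor distributions weighted by A_ij / k_i.
            # An isolated unlabeled node keeps a zero row and defaults to class 0.
            row = [0.0] * num_classes
            for j in range(n):
                if adj[i][j]:
                    w = adj[i][j] / degree[i]
                    for c in range(num_classes):
                        row[c] += w * dist[j][c]
            new_dist.append(row)
        dist = new_dist
    # Predict the class with the largest propagated score.
    return [max(range(num_classes), key=lambda c: dist[i][c]) for i in range(n)]


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(label_propagation(adj, {0: 0, 5: 1}, 2))  # [0, 0, 0, 1, 1, 1]
```

With node 0 seeded as class 0 and node 5 as class 1, the fixed point gives the bridge nodes 2 and 3 class-0 scores of 5/7 and 2/7 respectively, so the predicted labels split the graph along the bridge.
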
Published in

ACM Other conferences
WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486

Copyright © 2014 held by the International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

WWW '14 paper acceptance rate: 84 of 645 submissions, 13%. Overall acceptance rate: 1,899 of 8,196 submissions, 23%.
