research-article

Random walks based modularity: application to semi-supervised learning

Authors:
Robin Devooght (Université Libre de Bruxelles, Bruxelles, Belgium)
Amin Mantrach (Yahoo Labs, Barcelona, Spain)
Ilkka Kivimäki (Université catholique de Louvain, Louvain-la-Neuve, Belgium)
Hugues Bersini (Université Libre de Bruxelles, Bruxelles, Belgium)
Alejandro Jaimes (Yahoo Labs, Barcelona, Spain)
Marco Saerens (Université catholique de Louvain, Louvain-la-Neuve, Belgium)

WWW '14: Proceedings of the 23rd international conference on World wide web, April 2014, Pages 213–224
https://doi.org/10.1145/2566486.2567986

Published: 07 April 2014

ABSTRACT

Although criticized for some of its limitations, modularity remains a standard measure for analyzing social networks. Quantifying the statistical surprise in the arrangement of the edges of the network has led to simple and powerful algorithms. However, relying solely on the distribution of edges, rather than on more complex structures such as paths, limits what modularity can capture. Indeed, recent studies have exposed limitations of modularity optimization, such as its resolution limit. We introduce a novel, formally defined modularity measure based on random walks. We show how this modularity can be computed from the paths induced by the graph instead of the traditionally used edges. We argue that computing modularity on paths rather than edges allows more informative features to be extracted from the network. We test this hypothesis on semi-supervised classification of the nodes of the network, where we show that, under the same settings, features from the random-walk modularity yield better classification than features from the usual modularity. Additionally, the proposed approach outperforms the classical label propagation procedure on two data sets of labeled social networks.
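
For orientation, the classical edge-based modularity that the abstract contrasts against is usually written, for an undirected graph with adjacency matrix A, node degrees k_i and m edges, as Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j). The Python sketch below computes this quantity on a toy graph. The function names, the toy graph, and the use of leading eigenvectors of the modularity matrix as node features are illustrative assumptions; the abstract does not say how the random-walk modularity or its features are computed, so the sketch makes no attempt to reproduce them.

```python
import numpy as np


def modularity_matrix(A):
    """Classical modularity matrix B = A - k k^T / (2m) for an undirected graph."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)        # node degrees
    two_m = k.sum()          # twice the number of edges
    return A - np.outer(k, k) / two_m


def modularity(A, labels):
    """Edge-based modularity Q of a partition given as one label per node."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    B = modularity_matrix(A)
    same_community = labels[:, None] == labels[None, :]
    return (B * same_community).sum() / A.sum()


def leading_eigenvector_features(A, k):
    """Hypothetical feature extraction: top-k eigenvectors of B, one row per node.

    This is an assumption made for illustration, not the method of the paper.
    """
    B = modularity_matrix(A)
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]         # columns for the k largest eigenvalues


# Toy graph (assumed): two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

q_good = modularity(A, [0, 0, 0, 1, 1, 1])   # split along the bridge: 5/14
q_bad = modularity(A, [0, 1, 0, 1, 0, 1])    # split that cuts both triangles
features = leading_eigenvector_features(A, 2)
print(q_good, q_bad, features.shape)
```

In a semi-supervised setting, feature rows such as these would be passed to an ordinary classifier trained on the labeled nodes. A path-based variant would replace the edge counts in A and the null model k kᵀ / 2m with quantities defined over paths, whose exact form is given in the paper rather than in the abstract.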

Index Terms

Mathematics of computing
  Discrete mathematics
    Graph theory
      Graph algorithms
      Paths and connectivity problems

Published in
WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014, 926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486
General Chair: Chin-Wan Chung (Korea Advanced Institute of Science and Technology, Korea)
Program Chairs: Andrei Broder (Google Inc., USA), Kyuseok Shim (Seoul National University, Korea), Torsten Suel (New York University, USA)
Copyright © 2014 is held by the International World Wide Web Conference Committee (IW3C2).
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph mining
modularity
random walk
semi-supervised learning
social networks
statistical physics
Acceptance Rates
WWW '14 Paper Acceptance Rate: 84 of 645 submissions, 13%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%