ABSTRACT
Although criticized for some of its limitations, modularity remains a standard measure for analyzing social networks. Quantifying the statistical surprise in the arrangement of the edges of the network has led to simple and powerful algorithms. However, relying solely on the distribution of edges, rather than on more complex structures such as paths, limits what modularity can capture. Indeed, recent studies have exposed shortcomings of modularity optimization, for instance its resolution limit. We introduce a novel, formally defined modularity measure based on random walks. We show how this modularity can be computed from the paths induced by the graph instead of the traditionally used edges. We argue that computing modularity on paths instead of edges allows more informative features to be extracted from the network. We verify this hypothesis on a semi-supervised node classification task, where we show that, under the same settings, features derived from the random walk modularity yield better classification than features derived from the usual modularity. Additionally, the proposed approach outperforms the classical label propagation procedure on two data sets of labeled social networks.
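For readers unfamiliar with the edge-based measure the abstract contrasts against, the following is a minimal sketch of the standard Newman-Girvan modularity, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. It is not the random walk modularity proposed in the paper, whose definition is not given in this abstract. The function name `modularity`, the toy graph `two_triangles`, and the partitions `split` and `merged` are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Edge-based (Newman-Girvan) modularity of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, k in degree.items():
        total_degree[community[node]] += k

    # Observed fraction of internal edges minus the fraction expected
    # if edges were placed at random while preserving node degrees.
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

q_split = modularity(two_triangles, split)
q_merged = modularity(two_triangles, merged)
```

On this toy graph the partition into the two triangles scores higher than placing every node in one community, which is the "statistical surprise" in the edge arrangement that the measure quantifies.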
Index Terms
- Random walks based modularity: application to semi-supervised learning