Community structure mining in big data social media networks with MapReduce

Cluster Computing

Abstract

Social media networks play an increasingly prominent role in people’s daily lives. Community structure is one of the salient features of social media networks and has been applied in practice, for example in recommendation systems and network marketing. With the rapid growth of social media and the surge in the amount of information, identifying communities in big data scenarios has become a challenge. Building on our previous work and the map equation (an information-theoretic equation for community mining), we develop a novel distributed community structure mining framework. In this framework, (1) we propose a new link information update method that tries to avoid data-writing operations and thereby speed up the process; (2) we use local information from the nodes and their neighbors, instead of PageRank, to calculate the probability distribution of the nodes; and (3) we drop the network partitioning process used in our previous work and run the map equation directly on MapReduce. Empirical results on real-world social media networks and artificial networks show that the new framework outperforms our previous work and some well-known algorithms, such as Radetal and FastGN, in accuracy, speed and scalability.
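
To make the role of the map equation and of "local information" concrete, the following is a minimal single-machine sketch, not the authors' distributed implementation. It evaluates the standard two-level map equation for an undirected, unweighted graph and assumes that a node's visit probability is its degree divided by twice the number of edges, which for such graphs needs only each node's own neighbor count. Whether this matches the estimator used in the framework is an assumption; the function names (`entropy`, `map_equation`) and the toy graph are hypothetical.

```python
from collections import defaultdict
from math import log2


def entropy(probs, total):
    """Entropy in bits of the values in probs after dividing each by total."""
    return -sum((p / total) * log2(p / total) for p in probs if p > 0)


def map_equation(edges, module_of):
    """Two-level map equation L(M) for an undirected, unweighted graph.

    Assumption: visit probabilities are degree-proportional (local information
    only), so no PageRank-style power iteration is needed.
    """
    degree = defaultdict(int)
    cut = defaultdict(int)  # number of edges leaving each module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            cut[module_of[u]] += 1
            cut[module_of[v]] += 1
    two_m = 2 * len(edges)

    visit = {node: d / two_m for node, d in degree.items()}
    members = defaultdict(list)
    for node in visit:
        members[module_of[node]].append(node)

    exit_rate = {m: cut[m] / two_m for m in members}
    q_total = sum(exit_rate.values())

    # Index codebook: cost of announcing which module the walker enters.
    length = q_total * entropy(exit_rate.values(), q_total) if q_total > 0 else 0.0
    # Module codebooks: cost of naming nodes (and the exit) inside each module.
    for m, nodes in members.items():
        rates = [exit_rate[m]] + [visit[n] for n in nodes]
        p_within = sum(rates)
        length += p_within * entropy(rates, p_within)
    return length


# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = {n: "A" for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

print(map_equation(edges, one_module), map_equation(edges, two_modules))
```

On this toy graph the partition into the two triangles yields a shorter description length than placing all nodes in one module, which is the criterion the map equation uses to prefer one community assignment over another.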

Notes

  1. http://snap.stanford.edu/data/com-LiveJournal.html.

  2. http://snap.stanford.edu/data/as-skitter.html.

  3. http://snap.stanford.edu/data/com-Orkut.html.

Acknowledgments

The authors would like to express their sincere gratitude to Professor Philip S. Yu of the University of Illinois at Chicago and Mr. Zhang Yuchao of the Beijing Institute of System Engineering for their great assistance throughout the research process. This work was supported in part by the National High-Tech Research and Development Program of China (2012AA012600) and the National Natural Science Foundation of China (61202362, 61472433).

Author information

Corresponding author

Correspondence to Songchang Jin.

About this article

Cite this article

Jin, S., Lin, W., Yin, H. et al. Community structure mining in big data social media networks with MapReduce. Cluster Comput 18, 999–1010 (2015). https://doi.org/10.1007/s10586-015-0452-x

  • DOI: https://doi.org/10.1007/s10586-015-0452-x
