
A streaming sampling algorithm for social activity networks using fixed structure learning automata

Applied Intelligence

Abstract

Social activity networks are formed from activities among users (such as wall posts, tweets, and emails), where each activity between two users adds an edge to the network graph. These networks are streaming and contain a massive volume of edges. A streaming graph is a stream of edges that evolves continuously over time. This paper proposes a sampling algorithm for social activity networks that is implemented in a streaming fashion. The proposed algorithm uses a set of fixed structure learning automata: each node of the original activity graph is equipped with a learning automaton that decides whether its corresponding node should be added to the sample set. The proposed algorithm is compared with the best streaming sampling algorithm reported so far in terms of the Kolmogorov-Smirnov (KS) test and normalized L1 and L2 distances, over real-world activity networks and synthetic networks presented as a sequence of edges. The experimental results show that the proposed algorithm outperforms that baseline.
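
The evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the code below assumes the commonly used forms: the KS statistic as the largest gap between two cumulative distributions, and the normalized L1 and L2 distances as the norm of the difference divided by the norm of the reference distribution. The degree distribution is chosen as the compared property purely for illustration, and all function names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {degree: fraction of nodes} for an undirected edge list.

    Repeated edges (repeated activities) each add to the degree.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = sum(counts.values())
    return {d: c / n for d, c in counts.items()}


def ks_statistic(p, q):
    """Largest absolute gap between the cumulative distributions of p and q."""
    cdf_p = cdf_q = gap = 0.0
    for x in sorted(set(p) | set(q)):
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        gap = max(gap, abs(cdf_p - cdf_q))
    return gap


def normalized_l1(p, q):
    """Assumed form: ||p - q||_1 / ||p||_1, with p as the reference."""
    support = set(p) | set(q)
    diff = sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)
    return diff / sum(abs(v) for v in p.values())


def normalized_l2(p, q):
    """Assumed form: ||p - q||_2 / ||p||_2, with p as the reference."""
    support = set(p) | set(q)
    diff = math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in support))
    return diff / math.sqrt(sum(v * v for v in p.values()))


if __name__ == "__main__":
    # Toy data (not from the paper): a four-edge stream and a two-edge sample.
    original = [(1, 2), (1, 3), (1, 4), (2, 3)]
    sample = [(1, 2), (1, 3)]
    p = degree_distribution(original)
    q = degree_distribution(sample)
    print(ks_statistic(p, q), normalized_l1(p, q), normalized_l2(p, q))
```

Under these assumed definitions, all three values are 0 when the sample reproduces the reference distribution exactly, and smaller values indicate a more representative sample.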

Acknowledgments

The authors acknowledge the use of high-performance computers provided by the High Performance Computing Research Center (HPCRC) at Amirkabir University of Technology in completing this work.

Author information

Corresponding author

Correspondence to Mohammad Reza Meybodi.

About this article

Cite this article

Ghavipour, M., Meybodi, M.R. A streaming sampling algorithm for social activity networks using fixed structure learning automata. Appl Intell 48, 1054–1081 (2018). https://doi.org/10.1007/s10489-017-1005-1

  • DOI: https://doi.org/10.1007/s10489-017-1005-1
