Towards finding the best-fit distribution for OSN data

The Journal of Supercomputing

Abstract

Online social networks (OSNs) are generally considered to follow a power-law distribution. This paper studies the degree distributions of multiple OSNs and finds that they differ moderately from a power law. Lognormal distributions are an alternative to power-law distributions and have been used as the best fit for many complex networks, yet the degree distributions of OSNs differ massively from a lognormal distribution. For a better fit, a composite distribution combining power-law and lognormal distributions is therefore suggested. This paper proposes an approach to find the most suitable distribution for a given degree distribution out of six candidates built from power law and lognormal, namely power law, lognormal, power law–lognormal, lognormal–power law, double power law, and double power law lognormal. The errors between the fitted composite distribution and the original degree distribution of the OSNs are observed. A composite distribution fitted using the approach described in this paper is seen to be a better fit than both the power-law and lognormal distributions in every case.
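As a rough illustration of the comparison the abstract describes, the sketch below fits only the two simple candidates (power law and lognormal) to a list of degrees and picks the one with the lower AIC. It is not the authors' procedure: the function names `fit_power_law`, `fit_lognormal`, and `compare_fits` are hypothetical, the power law is treated as continuous with `xmin` set to the smallest observed degree, AIC is swapped in for the error measure used in the paper, and the four composite candidates are not covered.

```python
import math
import random


def fit_power_law(values, xmin):
    """Maximum-likelihood fit of a continuous power law on values >= xmin.

    Returns (alpha, log_likelihood) for p(x) = (alpha-1)/xmin * (x/xmin)^-alpha.
    """
    tail = [v for v in values if v >= xmin]
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one value strictly above xmin")
    n = len(tail)
    alpha = 1.0 + n / log_sum
    log_lik = n * math.log((alpha - 1.0) / xmin) - alpha * log_sum
    return alpha, log_lik


def fit_lognormal(values):
    """Maximum-likelihood fit of a lognormal. Returns (mu, sigma, log_likelihood)."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    if sigma == 0.0:
        raise ValueError("all values are identical; sigma is zero")
    # At the MLE the squared-deviation term of the log-likelihood equals n/2.
    log_lik = -sum(logs) - n * math.log(sigma * math.sqrt(2.0 * math.pi)) - n / 2.0
    return mu, sigma, log_lik


def compare_fits(degrees):
    """Return (name_of_lower_AIC_model, {model_name: AIC}) for positive degrees."""
    data = [d for d in degrees if d > 0]  # assumption: zero-degree nodes are dropped
    xmin = min(data)                      # assumption: the power law spans all data
    _, ll_power = fit_power_law(data, xmin)
    _, _, ll_lognormal = fit_lognormal(data)
    aic = {
        "power law": 2 * 1 - 2 * ll_power,      # one free parameter (alpha)
        "lognormal": 2 * 2 - 2 * ll_lognormal,  # two free parameters (mu, sigma)
    }
    return min(aic, key=aic.get), aic


if __name__ == "__main__":
    random.seed(1)
    # Synthetic degrees for demonstration only; not data from the paper.
    sample = [random.lognormvariate(3.0, 1.0) for _ in range(2000)]
    print(compare_fits(sample))
```

A composite candidate such as power law–lognormal would additionally need a breakpoint between the two regimes, which this sketch leaves out.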

Notes

  1. In a hashtag co-occurrence graph, the nodes are hashtags and an edge indicates that two hashtags appear together in a tweet. In this study, we ignore the edge weight, which represents the number of tweets in which the two hashtags co-occur.
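The graph construction in this note can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `cooccurrence_degrees` and `degree_distribution` are hypothetical names, each tweet is assumed to be already reduced to a list of hashtag strings, and hashtags are compared exactly as given (no case folding). Storing neighbours in sets discards the edge weight, so a pair that co-occurs in many tweets still contributes one edge.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_degrees(tweets):
    """Map each hashtag to its degree in the unweighted co-occurrence graph.

    `tweets` is an iterable of hashtag lists, one list per tweet.
    """
    neighbors = defaultdict(set)
    for tags in tweets:
        unique = sorted(set(tags))  # a hashtag repeated within one tweet counts once
        for tag in unique:
            neighbors[tag]          # touch the key so isolated hashtags get degree 0
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {tag: len(nbrs) for tag, nbrs in neighbors.items()}


def degree_distribution(degrees):
    """Count how many nodes have each degree value."""
    return Counter(degrees.values())


if __name__ == "__main__":
    # Toy input for demonstration only.
    toy = [["osn", "graph", "data"], ["osn", "graph"], ["misc"]]
    degs = cooccurrence_degrees(toy)
    print(degs)
    print(degree_distribution(degs))
```

The resulting degree counts are the kind of input a distribution-fitting step would consume.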

Author information

Corresponding author

Correspondence to Sarbani Roy.

About this article

Cite this article

Bhattacharya, S., Sinha, S., Roy, S. et al. Towards finding the best-fit distribution for OSN data. J Supercomput 76, 9882–9900 (2020). https://doi.org/10.1007/s11227-020-03232-y

  • DOI: https://doi.org/10.1007/s11227-020-03232-y
