DOI: 10.1145/3485447.3511995
research-article

Is Least-Squares Inaccurate in Fitting Power-Law Distributions? The Criticism is Complete Nonsense

Published: 25 April 2022

ABSTRACT

According to the Gauss-Markov theorem, ordinary least-squares estimation is the best linear unbiased estimator. Over the last two decades, however, some researchers have argued that least-squares is substantially inaccurate in fitting power-law distributions, and this criticism has caused a strong bias in the research community. In this paper, we conduct extensive experiments to show that such criticism is complete nonsense. Specifically, we sample discrete and continuous data of different sizes from power-law models and show that the long-tailed noises, even though they are sampled from power-law models, cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for least-squares to fit power-law distributions. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of power-law data and fit the long-tailed noises. Experiments also show that the reverse cumulative distribution function is a poor choice for plotting power-law data in practice because it usually hides the true probability distribution of the data. We hope that our research can clear up the bias against using least-squares to fit power-law distributions.

Source code can be found at https://github.com/xszhong/LSavg.
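
To make the setting concrete, the following is a minimal sketch of a generic least-squares fit on log-log axes. It is not the authors' LSavg code and does not implement their average strategy. The function names (`sample_pareto`, `log_binned_density`, `fit_loglog`), the bin ratio of 2, and the choice of a continuous Pareto model with density proportional to x^(-alpha) are all assumptions made for illustration.

```python
import math
import random


def sample_pareto(n, alpha, xmin=1.0, seed=0):
    """Draw n samples with density proportional to x**(-alpha) for x >= xmin
    (inverse-transform sampling; requires alpha > 1)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def log_binned_density(samples, xmin=1.0, ratio=2.0):
    """Bin samples into logarithmic bins [xmin*ratio**k, xmin*ratio**(k+1))
    and return (geometric bin center, empirical density) pairs.
    Empty bins are skipped because log(0) is undefined."""
    counts = {}
    for x in samples:
        k = int(math.floor(math.log(x / xmin) / math.log(ratio)))
        counts[k] = counts.get(k, 0) + 1
    n = len(samples)
    points = []
    for k in sorted(counts):
        left = xmin * ratio ** k
        right = left * ratio
        center = math.sqrt(left * right)
        density = counts[k] / (n * (right - left))
        points.append((center, density))
    return points


def fit_loglog(points):
    """Ordinary least squares of log(y) on log(x).
    Returns (slope, intercept) of the fitted line."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


samples = sample_pareto(100_000, alpha=2.5)
slope, intercept = fit_loglog(log_binned_density(samples))
alpha_hat = -slope  # the exponent estimate is the negated log-log slope
print(f"estimated exponent: {alpha_hat:.3f}")
```

With logarithmic bins of constant ratio, the expected density per bin is itself a power law in the bin center, so the fitted slope estimates the negative exponent. How the sparse tail bins are handled is exactly the kind of choice the abstract discusses, and this sketch simply keeps every non-empty bin.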

References

  1. Lada A. Adamic and Bernardo A. Huberman. 2000. The Nature of Markets in the World Wide Web. Quarterly Journal of Electronic Commerce 1, 1 (2000), 5–12.
  2. Reka Albert, Hawoong Jeong, and Albert-László Barabási. 1999. Diameter of the World-Wide Web. Nature 401 (1999), 130.
  3. I. Artico, I. Smolyarenko, V. Vinciotti, and E. C. Wit. 2020. How rare are power-law networks really? In Proceedings of the Royal Society A, Vol. 476. 20190742.
  4. Eduardo M. Azevedo, Alex Deng, Jose Luis Montiel Olea, Justin Rao, and E. Glen Weyl. 2020. A/B Testing with Fat Tails. Journal of Political Economy 128, 12 (2020), 4614–000.
  5. Albert-László Barabási and Réka Albert. 1999. Emergence of Scaling in Random Networks. Science 286 (1999), 509–512.
  6. H. Bauke. 2007. Parameter estimation for power-law distributions by maximum likelihood methods. The European Physical Journal B 58 (2007), 167–173.
  7. Bernd Blasius. 2020. Power-law distribution in the number of confirmed COVID-19 cases. Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 9 (2020).
  8. Eric Bonnet, Olivier Bour, Noelle E. Odling, Philippe Davy, Ian Main, Patience Cowie, and Brian Berkowitz. 2001. Scaling of fracture systems in geological media. Reviews of geophysics 39, 3 (2001), 347–383.
  9. Patrick Erik Bradley and Martin Behnisch. 2019. Heavy-tailed distributions for building stock data. Environment and Planning B: Urban Analytics and City Science 46, 7 (2019), 1281–1296.
  10. Anna D. Broido and Aaron Clauset. 2019. Scale-free networks are rare. Nature communications 10, 1 (2019), 1–10.
  11. Robert Malcolm Clark, S. J. D. Cox, and Geoff M. Laslett. 1999. Generalizations of power-law distributions applicable to sampled fault-trace lengths: model choice, parameter estimation and caveats. Geophysical Journal International 136, 2 (1999), 357–372.
  12. Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. 2009. Power-law Distributions in Empirical Data. SIAM Rev. 51, 4 (2009), 661–703.
  13. William G. Cochran. 1952. The Chi-square Test of Goodness of Fit. The Annals of Mathematical Statistics 23, 3 (1952), 315–345.
  14. Donald Cochrane and Guy H. Orcutt. 1949. Application of least squares regression to relationships containing auto-correlated error terms. J. Amer. Statist. Assoc. 44, 245 (1949), 32–61.
  15. Brian Conrad and Michael Mitzenmacher. 2004. Power laws for monkeys typing randomly: the case of unequal probabilities. IEEE Transactions on information theory 50, 7 (2004), 1403–1414.
  16. Bernat Corominas-Murtra and Ricard V. Solé. 2010. Universality of Zipf’s Law. Physical Review E 82, 1 (2010), 011102.
  17. Alvaro Corral and Alvaro Gonzalez. 2019. Power Law Size Distributions in Geoscience Revisited. Earth and Space Science 6, 5 (2019), 673–697.
  18. Alvaro Corral, Isabel Serra, and Ramon Ferrer i Cancho. 2020. Distinct flavors of Zipf’s law and its maximum likelihood fitting: Rank-size and size-distribution representations. Physical Review E 102, 5 (2020), 052113.
  19. Frederik Michel Dekking, Cornelis Kraaikamp, Hendrik Paul Lopuhaä, and Ludolf Erwin Meester. 2005. A Modern Introduction to Probability and Statistics: Understanding why and how. Springer Science & Business Media.
  20. Anna Deluca and Alvaro Corral. 2013. Fitting and Goodness-of-Fit Test of Non-Truncated and Truncated Power-Law Distributions. Acta Geophysica 61, 6 (2013), 1351–1394.
  21. Nicole Eikmeier and David F. Gleich. 2017. Revisiting Power-law Distributions in Spectra of Real World Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 817–826.
  22. Zoltan Eisler, Imre Bartos, and Janos Kertesz. 2008. Fluctuation scaling in complex systems: Taylor’s law and beyond. Advances in Physics 57, 1 (2008), 89–142.
  23. Brian J. Enquist, Evan P. Economo, Travis E. Huxman, Andrew P. Allen, Danielle D. Ignace, and James F. Gillooly. 2003. Scaling metabolism from organisms to ecosystems. Nature 423, 6940 (2003), 639–642.
  24. Brian J. Enquist and Karl J. Niklas. 2001. Invariant scaling relations across tree-dominated communities. Nature 410, 6829 (2001), 655–660.
  25. Xavier Gabaix. 2009. Power Laws in Economics and Finance. Annual Review of Economics 1, 1 (2009), 255–294.
  26. M.L. Goldstein, S.A. Morris, and G.G. Yen. 2004. Problems with fitting to the power-law distribution. The European Physical Journal B 41 (2004), 255–258.
  27. Beno Gutenberg and Charles F. Richter. 1944. Frequency of Earthquakes in California. Bulletin of the Seismological Society of America 34, 4 (1944), 185–188.
  28. Bo-Ping Han and Milan Straskraba. 1998. Size dependence of biomass spectra and population density I. The effects of size scales and size intervals. Journal of Theoretical Biology 191, 3 (1998), 259–265.
  29. Rudolf Hanel, Bernat Corominas-Murtra, Bo Liu, and Stefan Thurner. 2017. Fitting power-laws in empirical data with estimators that work for all exponents. PLoS ONE 12, 2 (2017), 1–15.
  30. Charles R. Henderson. 1975. Best Linear Unbiased Estimation and Prediction under a Selection Model. Biometrics 31, 2 (1975), 423–447.
  31. Hawoong Jeong, Balint Tombor, Reka Albert, Zoltan N. Oltvai, and A-L. Barabasi. 2000. The Large-Scale Organization of Metabolic Networks. Nature 407, 6804 (2000), 651–654.
  32. Sonia Kefi, Max Rietkerk, Concepcion L. Alados, Yolanda Pueyo, Vasilios P. Papanastasis, Ahmed ElAich, and Peter C. De Ruiter. 2007. Spatial vegetation patterns and imminent desertification in Mediterranean arid ecosystems. Nature 449, 7159 (2007), 213–217.
  33. Wentian Li. 2002. Zipf’s Law Everywhere. Glottometrics 5 (2002), 14–21.
  34. Edward T. Lu and Russell J. Hamilton. 1991. Avalanches and the Distribution of Solar Flares. The Astrophysical Journal 380 (1991), L89–L92.
  35. R. Dean Malmgren, Daniel B. Stouffer, Adilson E. Motter, and Luis AN Amaral. 2008. A Poissonian explanation for heavy tails in e-mail communication. Proceedings of the National Academy of Sciences 105, 47 (2008), 18153–18158.
  36. Timothy D. Meehan. 2006. Energy Use and Animal Abundance in Litter and Soil Communities. Ecology 87, 7 (2006), 1650–1658.
  37. Buddhika Nettasinghe and Vikram Krishnamurthy. 2021. Maximum Likelihood Estimation of Power-law Degree Distributions via Friendship Paradox-based Sampling. ACM Transactions on Knowledge Discovery from Data 15, 6 (2021), 1–28.
  38. Mark EJ. Newman. 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary physics 46, 5 (2005), 323–351.
  39. Jan Overgoor, Austin R. Benson, and Johan Ugander. 2019. Choosing to Grow a Graph: Modeling Network Formation as Discrete Choice. In Proceedings of the 2019 World Wide Web Conference. 1409–1420.
  40. Karl Pearson. 1900. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling. Philos. Mag. 5, 50 (1900), 157–175.
  41. Steven T. Piantadosi. 2014. Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic bulletin & review 21, 5 (2014), 1112–1130.
  42. G. Pickering, J. M. Bull, and D. J. Sanderson. 1995. Sampling power-law distributions. Tectonophysics 248 (1995), 1–20.
  43. Carla M.A. Pinto, A. Mendes Lopes, and J.A. Tenreiro Machado. 2012. A review of power laws in real life phenomena. Commun Nonlinear Sci Number Simulat 17 (2012), 3558–3578.
  44. Robin L. Plackett. 1949. A Historical Note on the Method of Least Squares. Biometrika 36, 3/4 (1949), 458–460.
  45. Derek J. De Solla Price. 1965. Networks of Scientific Papers. Science 149, 3683 (1965), 510–515.
  46. Salvador Pueyo and Roger Jovani. 2006. Comment on “A Keystone Mutualism Drives Pattern in a Power Function”. Science 313, 5794 (2006), 1739–1739.
  47. John A. Rice. 2006. Mathematical Statistics and Data Analysis. Cengage Learning.
  48. Andrea Rinaldo, Amos Maritan, Kent K. Cavender-Bares, and Sallie W. Chisholm. 2002. Cross-scale ecological dynamics and microbial size spectra in marine ecosystems. In Proceedings of the Royal Society of London. Series B: Biological Sciences, Vol. 269. 2051–2059.
  49. David W. Sims, David Righton, and Jonathan W. Pitchford. 2007. Minimizing errors in identifying Levy flight behaviour of organisms. Journal of Animal Ecology 76, 2 (2007), 222–229.
  50. Nickolay Smirnov. 1948. Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 2 (1948), 279–281.
  51. Michael A Stephens. 1974. EDF statistics for goodness of fit and some comparisons. Journal of the American statistical Association 69, 347 (1974), 730–737.
  52. Alex Stivala, Garry Robins, and Alessandro Lomi. 2020. Exponential random graph model parameter estimation for very large directed networks. PLoS ONE 15, 1 (2020), e0227804.
  53. Gilbert Strang. 2016. Introduction to Linear Algebra. Wellesley-Cambridge Press.
  54. Yogesh Virkar and Aaron Clauset. 2014. Power-law distributions in binned empirical data. The Annals of Applied Statistics 8, 1 (2014), 89–119.
  55. Geoffrey B. West, James H. Brown, and Brian J. Enquist. 1997. A General Model for the Origin of Allometric Scaling Laws in Biology. Science 276, 5309 (1997), 122–126.
  56. Ethan P. White, Brian J. Enquist, and Jessica L. Green. 2008. On estimating the exponent of power-law frequency distributions. Ecology 89, 4 (2008), 905–912.
  57. J. C. Willis and G. Udny Yule. 1922. Some Statistics of Evolution and Geographical Distribution in Plants and Animals, and their Significance. Nature 109 (1922), 177–179.
  58. Chengxi Zang, Peng Cui, and Wenwu Zhu. 2018. Learning and Interpreting Complex Distributions in Empirical Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2682–2691.
  59. Xiaoshi Zhong. 2020. Time Expression and Named Entity Analysis and Recognition. Ph.D. Dissertation. Nanyang Technological University, Singapore.
  60. Tommaso Zillio and Richard Condit. 2007. The impact of neutrality, niche, differentiation and species input on diversity and abundance distributions. Oikos 116 (2007), 931–940.
  61. George Zipf. 1949. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press, Inc.

Published in

WWW '22: Proceedings of the ACM Web Conference 2022
April 2022
3764 pages
ISBN: 9781450390965
DOI: 10.1145/3485447

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

research-article, Research, Refereed limited

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
