DOI: 10.1145/3485447.3511995

Is Least-Squares Inaccurate in Fitting Power-Law Distributions? The Criticism is Complete Nonsense

Published: 25 April 2022 Publication History

Abstract

By the Gauss-Markov theorem, ordinary least-squares estimation is the best linear unbiased estimator. Over the last two decades, however, some researchers have criticized least-squares as substantially inaccurate for fitting power-law distributions, and this criticism has created a strong bias in the research community. In this paper, we conduct extensive experiments to show that such criticism is complete nonsense. Specifically, we sample discrete and continuous data of different sizes from power-law models and show that the long-tailed noises, even though they are sampled from power-law models, cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for fitting power-law distributions with least-squares. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of the power-law data and fit the long-tailed noises. Experiments also show that the reverse cumulative distribution function is a poor choice for plotting power-law data in practice because it usually hides the true probability distribution of the data. We hope that our research can clear up the bias against using least-squares to fit power-law distributions.
Source code can be found at https://github.com/xszhong/LSavg.
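
To make the kind of experiment described in the abstract concrete, the following is a minimal sketch and not the authors' LSavg implementation, which lives in the repository linked above. It samples continuous data from a power-law model, bins the samples logarithmically, and fits a straight line to the binned densities on a log-log scale with ordinary least-squares. The function names (`sample_power_law`, `log_binned_density`, `fit_loglog`), the bin ratio of 2, and the `min_count` threshold for skipping sparse tail bins are all assumptions made for illustration; the paper's own binning rule and average strategy are not reproduced here.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n continuous samples with density proportional to x**(-alpha), x >= x_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling; 1 - u lies in (0, 1], so the power is finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def log_binned_density(samples, x_min=1.0, ratio=2.0, min_count=10):
    """Bin samples into [x_min * ratio**k, x_min * ratio**(k + 1)) and return
    (geometric bin centre, empirical density) pairs for bins with >= min_count samples."""
    counts = {}
    for x in samples:
        k = max(0, int(math.log(x / x_min) / math.log(ratio)))
        counts[k] = counts.get(k, 0) + 1
    n = len(samples)
    points = []
    for k in sorted(counts):
        if counts[k] < min_count:  # assumption: skip sparse, noisy tail bins
            continue
        lo = x_min * ratio ** k
        hi = lo * ratio
        # Density = fraction of samples in the bin divided by the bin width.
        points.append((math.sqrt(lo * hi), counts[k] / (n * (hi - lo))))
    return points


def fit_loglog(points):
    """Ordinary least-squares fit of log(density) = intercept + slope * log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


samples = sample_power_law(100_000, alpha=2.5)
slope, intercept = fit_loglog(log_binned_density(samples))
alpha_hat = -slope  # the fitted slope estimates -alpha
print(f"estimated alpha = {alpha_hat:.3f}")
```

With logarithmic bins, the expected density in each bin follows the same exponent as the underlying model, so the negated slope serves as an estimate of the exponent. Lowering `min_count` lets the sparse tail bins into the fit, which is one simple way to observe how noisy tail points can pull the fitted line.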

References

[1]
Lada A. Adamic and Bernardo A. Huberman. 2000. The Nature of Markets in the World Wide Web. Quarterly Journal of Electronic Commerce 1, 1 (2000), 5–12.
[2]
Réka Albert, Hawoong Jeong, and Albert-László Barabási. 1999. Diameter of the World-Wide Web. Nature 401 (1999), 130.
[3]
I. Artico, I. Smolyarenko, V. Vinciotti, and E. C. Wit. 2020. How rare are power-law networks really? In Proceedings of the Royal Society A, Vol. 476. 20190742.
[4]
Eduardo M. Azevedo, Alex Deng, Jose Luis Montiel Olea, Justin Rao, and E. Glen Weyl. 2020. A/B Testing with Fat Tails. Journal of Political Economy 128, 12 (2020), 4614–000.
[5]
Albert-László Barabási and Réka Albert. 1999. Emergence of Scaling in Random Networks. Science 286 (1999), 509–512.
[6]
H. Bauke. 2007. Parameter estimation for power-law distributions by maximum likelihood methods. The European Physical Journal B 58 (2007), 167–173.
[7]
Bernd Blasius. 2020. Power-law distribution in the number of confirmed COVID-19 cases. Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 9 (2020).
[8]
Eric Bonnet, Olivier Bour, Noelle E. Odling, Philippe Davy, Ian Main, Patience Cowie, and Brian Berkowitz. 2001. Scaling of fracture systems in geological media. Reviews of geophysics 39, 3 (2001), 347–383.
[9]
Patrick Erik Bradley and Martin Behnisch. 2019. Heavy-tailed distributions for building stock data. Environment and Planning B: Urban Analytics and City Science 46, 7 (2019), 1281–1296.
[10]
Anna D. Broido and Aaron Clauset. 2019. Scale-free networks are rare. Nature communications 10, 1 (2019), 1–10.
[11]
Robert Malcolm Clark, S. J. D. Cox, and Geoff M. Laslett. 1999. Generalizations of power-law distributions applicable to sampled fault-trace lengths: model choice, parameter estimation and caveats. Geophysical Journal International 136, 2 (1999), 357–372.
[12]
Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. 2009. Power-law Distributions in Empirical Data. SIAM Rev. 51, 4 (2009), 661–703.
[13]
William G. Cochran. 1952. The Chi-square Test of Goodness of Fit. The Annals of Mathematical Statistics 23, 3 (1952), 315–345.
[14]
Donald Cochrane and Guy H. Orcutt. 1949. Application of least squares regression to relationships containing auto-correlated error terms. J. Amer. Statist. Assoc. 44, 245 (1949), 32–61.
[15]
Brian Conrad and Michael Mitzenmacher. 2004. Power laws for monkeys typing randomly: the case of unequal probabilities. IEEE Transactions on information theory 50, 7 (2004), 1403–1414.
[16]
Bernat Corominas-Murtra and Ricard V. Solé. 2010. Universality of Zipf’s Law. Physical Review E 82, 1 (2010), 011102.
[17]
Alvaro Corral and Alvaro Gonzalez. 2019. Power Law Size Distributions in Geoscience Revisited. Earth and Space Science 6, 5 (2019), 673–697.
[18]
Alvaro Corral, Isabel Serra, and Ramon Ferrer i Cancho. 2020. Distinct flavors of Zipf’s law and its maximum likelihood fitting: Rank-size and size-distribution representations. Physical Review E 102, 5 (2020), 052113.
[19]
Frederik Michel Dekking, Cornelis Kraaikamp, Hendrik Paul Lopuhaä, and Ludolf Erwin Meester. 2005. A Modern Introduction to Probability and Statistics: Understanding why and how. Springer Science & Business Media.
[20]
Anna Deluca and Alvaro Corral. 2013. Fitting and Goodness-of-Fit Test of Non-Truncated and Truncated Power-Law Distributions. Acta Geophysica 61, 6 (2013), 1351–1394.
[21]
Nicole Eikmeier and David F. Gleich. 2017. Revisiting Power-law Distributions in Spectra of Real World Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 817–826.
[22]
Zoltan Eisler, Imre Bartos, and Janos Kertesz. 2008. Fluctuation scaling in complex systems: Taylor’s law and beyond. Advances in Physics 57, 1 (2008), 89–142.
[23]
Brian J. Enquist, Evan P. Economo, Travis E. Huxman, Andrew P. Allen, Danielle D. Ignace, and James F. Gillooly. 2003. Scaling metabolism from organisms to ecosystems. Nature 423, 6940 (2003), 639–642.
[24]
Brian J. Enquist and Karl J. Niklas. 2001. Invariant scaling relations across tree-dominated communities. Nature 410, 6829 (2001), 655–660.
[25]
Xavier Gabaix. 2009. Power Laws in Economics and Finance. Annual Review of Economics 1, 1 (2009), 255–294.
[26]
M.L. Goldstein, S.A. Morris, and G.G. Yen. 2004. Problems with fitting to the power-law distribution. The European Physical Journal B 41 (2004), 255–258.
[27]
Beno Gutenberg and Charles F. Richter. 1944. Frequency of Earthquakes in California. Bulletin of the Seismological Society of America 34, 4 (1944), 185–188.
[28]
Bo-Ping Han and Milan Straskraba. 1998. Size dependence of biomass spectra and population density I. The effects of size scales and size intervals. Journal of Theoretical Biology 191, 3 (1998), 259–265.
[29]
Rudolf Hanel, Bernat Corominas-Murtra, Bo Liu, and Stefan Thurner. 2017. Fitting power-laws in empirical data with estimators that work for all exponents. PLoS ONE 12, 2 (2017), 1–15.
[30]
Charles R. Henderson. 1975. Best Linear Unbiased Estimation and Prediction under a Selection Model. Biometrics 31, 2 (1975), 423–447.
[31]
Hawoong Jeong, Balint Tombor, Reka Albert, Zoltan N. Oltvai, and A-L. Barabasi. 2000. The Large-Scale Organization of Metabolic Networks. Nature 407, 6804 (2000), 651–654.
[32]
Sonia Kefi, Max Rietkerk, Concepcion L. Alados, Yolanda Pueyo, Vasilios P. Papanastasis, Ahmed ElAich, and Peter C. De Ruiter. 2007. Spatial vegetation patterns and imminent desertification in Mediterranean arid ecosystems. Nature 449, 7159 (2007), 213–217.
[33]
Wentian Li. 2002. Zipf’s Law Everywhere. Glottometrics 5 (2002), 14–21.
[34]
Edward T. Lu and Russell J. Hamilton. 1991. Avalanches and the Distribution of Solar Flares. The Astrophysical Journal 380 (1991), L89–L92.
[35]
R. Dean Malmgren, Daniel B. Stouffer, Adilson E. Motter, and Luis A. N. Amaral. 2008. A Poissonian explanation for heavy tails in e-mail communication. Proceedings of the National Academy of Sciences 105, 47 (2008), 18153–18158.
[36]
Timothy D. Meehan. 2006. Energy Use and Animal Abundance in Litter and Soil Communities. Ecology 87, 7 (2006), 1650–1658.
[37]
Buddhika Nettasinghe and Vikram Krishnamurthy. 2021. Maximum Likelihood Estimation of Power-law Degree Distributions via Friendship Paradox-based Sampling. ACM Transactions on Knowledge Discovery from Data 15, 6 (2021), 1–28.
[38]
Mark E. J. Newman. 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46, 5 (2005), 323–351.
[39]
Jan Overgoor, Austin R. Benson, and Johan Ugander. 2019. Choosing to Grow a Graph: Modeling Network Formation as Discrete Choice. In Proceedings of the 2019 World Wide Web Conference. 1409–1420.
[40]
Karl Pearson. 1900. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling. Philos. Mag. 5, 50 (1900), 157–175.
[41]
Steven T. Piantadosi. 2014. Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic bulletin & review 21, 5 (2014), 1112–1130.
[42]
G. Pickering, J. M. Bull, and D. J. Sanderson. 1995. Sampling power-law distributions. Tectonophysics 248 (1995), 1–20.
[43]
Carla M.A. Pinto, A. Mendes Lopes, and J.A. Tenreiro Machado. 2012. A review of power laws in real life phenomena. Commun Nonlinear Sci Numer Simulat 17 (2012), 3558–3578.
[44]
Robin L. Plackett. 1949. A Historical Note on the Method of Least Squares. Biometrika 36, 3/4 (1949), 458–460.
[45]
Derek J. De Solla Price. 1965. Networks of Scientific Papers. Science 149, 3683 (1965), 510–515.
[46]
Salvador Pueyo and Roger Jovani. 2006. Comment on “A Keystone Mutualism Drives Pattern in a Power Function”. Science 313, 5794 (2006), 1739–1739.
[47]
John A. Rice. 2006. Mathematical Statistics and Data Analysis. Cengage Learning.
[48]
Andrea Rinaldo, Amos Maritan, Kent K. Cavender-Bares, and Sallie W. Chisholm. 2002. Cross-scale ecological dynamics and microbial size spectra in marine ecosystems. In Proceedings of the Royal Society of London. Series B: Biological Sciences, Vol. 269. 2051–2059.
[49]
David W. Sims, David Righton, and Jonathan W. Pitchford. 2007. Minimizing errors in identifying Levy flight behaviour of organisms. Journal of Animal Ecology 76, 2 (2007), 222–229.
[50]
Nickolay Smirnov. 1948. Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 2 (1948), 279–281.
[51]
Michael A. Stephens. 1974. EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association 69, 347 (1974), 730–737.
[52]
Alex Stivala, Garry Robins, and Alessandro Lomi. 2020. Exponential random graph model parameter estimation for very large directed networks. PLoS ONE 15, 1 (2020), e0227804.
[53]
Gilbert Strang. 2016. Introduction to Linear Algebra. Wellesley-Cambridge Press.
[54]
Yogesh Virkar and Aaron Clauset. 2014. Power-law distributions in binned empirical data. The Annals of Applied Statistics 8, 1 (2014), 89–119.
[55]
Geoffrey B. West, James H. Brown, and Brian J. Enquist. 1997. A General Model for the Origin of Allometric Scaling Laws in Biology. Science 276, 5309 (1997), 122–126.
[56]
Ethan P. White, Brian J. Enquist, and Jessica L. Green. 2008. On estimating the exponent of power‐law frequency distributions. Ecology 89, 4 (2008), 905–912.
[57]
J. C. Willis and G. Udny Yule. 1922. Some Statistics of Evolution and Geographical Distribution in Plants and Animals, and their Significance. Nature 109 (1922), 177–179.
[58]
Chengxi Zang, Peng Cui, and Wenwu Zhu. 2018. Learning and Interpreting Complex Distributions in Empirical Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2682–2691.
[59]
Xiaoshi Zhong. 2020. Time Expression and Named Entity Analysis and Recognition. Ph.D. Dissertation. Nanyang Technological University, Singapore.
[60]
Tommaso Zillio and Richard Condit. 2007. The impact of neutrality, niche, differentiation and species input on diversity and abundance distributions. Oikos 116 (2007), 931–940.
[61]
George Zipf. 1949. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press, Inc.

          Published In

          WWW '22: Proceedings of the ACM Web Conference 2022
          April 2022
          3764 pages
          ISBN:9781450390965
          DOI:10.1145/3485447
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. Power-law distributions
          2. average strategy
          3. least-squares estimation (LSE)
          4. long-tailed noises

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          WWW '22
          WWW '22: The ACM Web Conference 2022
          April 25 - 29, 2022
          Virtual Event, Lyon, France

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

          Cited By

          • (2025) Signals of propaganda—Detecting and estimating political influences in information spread in social networks. PLOS ONE 20:1 (e0309688). DOI: 10.1371/journal.pone.0309688. Online publication date: 30-Jan-2025
          • (2024) An Empirical Study on Common Sense-Violating Bugs in Mobile Apps. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3709356. Online publication date: 21-Dec-2024
          • (2024) On the Scale-Free Property of Citation Networks: An Empirical Study. Companion Proceedings of the ACM Web Conference 2024 (541-544). DOI: 10.1145/3589335.3651541. Online publication date: 13-May-2024
          • (2024) Distribution of mean time intervals between successive neutron counts for different phenomena and power law forms. The European Physical Journal Plus 139:5. DOI: 10.1140/epjp/s13360-024-05269-x. Online publication date: 27-May-2024
