research-article

Is Least-Squares Inaccurate in Fitting Power-Law Distributions? The Criticism is Complete Nonsense

Authors:

Hongkun Zhang

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 2748 - 2758

https://doi.org/10.1145/3485447.3511995

Published: 25 April 2022

Abstract

By the Gauss-Markov theorem, ordinary least-squares estimation is the best linear unbiased estimator. Over the last two decades, however, some researchers have criticized least-squares as substantially inaccurate in fitting power-law distributions, and this criticism has caused a strong bias in the research community. In this paper, we conduct extensive experiments to show that such criticism is complete nonsense. Specifically, we sample discrete and continuous data of different sizes from power-law models and show that, even though the long-tailed noise is sampled from power-law models, it cannot be treated as power-law data. We define the correct way to bin continuous power-law data into data points and propose an average strategy for least-squares to fit power-law distributions. Experiments on both simulated and real-world data show that our proposed method fits power-law data perfectly. We uncover a fundamental flaw in the popular method proposed by Clauset et al. [12]: it tends to discard the majority of power-law data and fit the long-tailed noise. Experiments also show that plotting power-law data with the reverse cumulative distribution function is a bad idea in practice, because it usually hides the true probability distribution of the data. We hope that our research can dispel the bias against using least-squares to fit power-law distributions.
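
To make the setting concrete, here is a minimal Python sketch of least-squares fitting in log-log space. A power law p(x) = C x^(-alpha) becomes the straight line log p = log C - alpha log x, so an ordinary least-squares line through the points (log x, log frequency) yields an estimate of alpha. This is not the averaging strategy or the binning procedure proposed in the paper, and it is not taken from the linked repository. The function names, the sample size, the support 1..1000, and the `min_count` threshold that separates the well-populated head from the sparsely sampled tail are all illustrative assumptions.

```python
import numpy as np


def fit_power_law_ols(x, y):
    """Fit y ~ C * x**(-alpha) by ordinary least squares on (log x, log y).

    Returns (alpha, C). Points with non-positive x or y are dropped,
    because their logarithm is undefined.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)
    slope, intercept = np.polyfit(np.log(x[keep]), np.log(y[keep]), 1)
    return float(-slope), float(np.exp(intercept))


def empirical_frequencies(samples):
    """Return the distinct values, their counts, and their relative frequencies."""
    values, counts = np.unique(samples, return_counts=True)
    return values, counts, counts / counts.sum()


# Synthetic discrete power-law sample (every setting here is an assumption).
rng = np.random.default_rng(0)
alpha_true = 2.5
support = np.arange(1, 1001, dtype=float)
pmf = support ** (-alpha_true)
pmf /= pmf.sum()
samples = rng.choice(support, size=100_000, p=pmf)

values, counts, freqs = empirical_frequencies(samples)

# Fit 1: every observed value, including tail values seen only a few times.
alpha_all, _ = fit_power_law_ols(values, freqs)

# Fit 2: only values observed at least min_count times (hypothetical threshold).
min_count = 10
head = counts >= min_count
alpha_head, _ = fit_power_law_ols(values[head], freqs[head])

print(f"true alpha = {alpha_true}")
print(f"OLS on all observed values: {alpha_all:.3f}")
print(f"OLS on head (count >= {min_count}): {alpha_head:.3f}")
```

On exact data such as y = 3 x^(-2), the function recovers alpha = 2 and C = 3 up to floating-point error. On sampled data, comparing the two printed estimates is one simple way to see how much the sparsely sampled tail influences the fitted slope.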

Source code can be found at https://github.com/xszhong/LSavg.

References

[1]

Lada A. Adamic and Bernardo A. Huberman. 2000. The Nature of Markets in the World Wide Web. Quarterly Journal of Electronic Commerce 1, 1 (2000), 5–12.

[2]

Réka Albert, Hawoong Jeong, and Albert-László Barabási. 1999. Diameter of the World-Wide Web. Nature 401 (1999), 130.

[3]

I. Artico, I. Smolyarenko, V. Vinciotti, and E. C. Wit. 2020. How rare are power-law networks really? In Proceedings of the Royal Society A, Vol. 476. 20190742.

[4]

Eduardo M. Azevedo, Alex Deng, Jose Luis Montiel Olea, Justin Rao, and E. Glen Weyl. 2020. A/B Testing with Fat Tails. Journal of Political Economy 128, 12 (2020), 4614–000.

[5]

Albert-László Barabási and Réka Albert. 1999. Emergence of Scaling in Random Networks. Science 286 (1999), 509–512.

[6]

H. Bauke. 2007. Parameter estimation for power-law distributions by maximum likelihood methods. The European Physical Journal B 58 (2007), 167–173.

[7]

Bernd Blasius. 2020. Power-law distribution in the number of confirmed COVID-19 cases. Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 9 (2020).

[8]

Eric Bonnet, Olivier Bour, Noelle E. Odling, Philippe Davy, Ian Main, Patience Cowie, and Brian Berkowitz. 2001. Scaling of fracture systems in geological media. Reviews of Geophysics 39, 3 (2001), 347–383.

[9]

Patrick Erik Bradley and Martin Behnisch. 2019. Heavy-tailed distributions for building stock data. Environment and Planning B: Urban Analytics and City Science 46, 7 (2019), 1281–1296.

[10]

Anna D. Broido and Aaron Clauset. 2019. Scale-free networks are rare. Nature Communications 10, 1 (2019), 1–10.

[11]

Robert Malcolm Clark, S. J. D. Cox, and Geoff M. Laslett. 1999. Generalizations of power-law distributions applicable to sampled fault-trace lengths: model choice, parameter estimation and caveats. Geophysical Journal International 136, 2 (1999), 357–372.

[12]

Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. 2009. Power-law Distributions in Empirical Data. SIAM Rev. 51, 4 (2009), 661–703.

[13]

William G. Cochran. 1952. The Chi-square Test of Goodness of Fit. The Annals of Mathematical Statistics 23, 3 (1952), 315–345.

[14]

Donald Cochrane and Guy H. Orcutt. 1949. Application of least squares regression to relationships containing auto-correlated error terms. J. Amer. Statist. Assoc. 44, 245 (1949), 32–61.

[15]

Brian Conrad and Michael Mitzenmacher. 2004. Power laws for monkeys typing randomly: the case of unequal probabilities. IEEE Transactions on Information Theory 50, 7 (2004), 1403–1414.

[16]

Bernat Corominas-Murtra and Ricard V. Solé. 2010. Universality of Zipf’s Law. Physical Review E 82, 1 (2010), 011102.

[17]

Alvaro Corral and Alvaro Gonzalez. 2019. Power Law Size Distributions in Geoscience Revisited. Earth and Space Science 6, 5 (2019), 673–697.

[18]

Alvaro Corral, Isabel Serra, and Ramon Ferrer i Cancho. 2020. Distinct flavors of Zipf’s law and its maximum likelihood fitting: Rank-size and size-distribution representations. Physical Review E 102, 5 (2020), 052113.

[19]

Frederik Michel Dekking, Cornelis Kraaikamp, Hendrik Paul Lopuhaä, and Ludolf Erwin Meester. 2005. A Modern Introduction to Probability and Statistics: Understanding why and how. Springer Science & Business Media.

[20]

Anna Deluca and Alvaro Corral. 2013. Fitting and Goodness-of-Fit Test of Non-Truncated and Truncated Power-Law Distributions. Acta Geophysica 61, 6 (2013), 1351–1394.

[21]

Nicole Eikmeier and David F. Gleich. 2017. Revisiting Power-law Distributions in Spectra of Real World Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 817–826.

[22]

Zoltan Eisler, Imre Bartos, and Janos Kertesz. 2008. Fluctuation scaling in complex systems: Taylor’s law and beyond. Advances in Physics 57, 1 (2008), 89–142.

[23]

Brian J. Enquist, Evan P. Economo, Travis E. Huxman, Andrew P. Allen, Danielle D. Ignace, and James F. Gillooly. 2003. Scaling metabolism from organisms to ecosystems. Nature 423, 6940 (2003), 639–642.

[24]

Brian J. Enquist and Karl J. Niklas. 2001. Invariant scaling relations across tree-dominated communities. Nature 410, 6829 (2001), 655–660.

[25]

Xavier Gabaix. 2009. Power Laws in Economics and Finance. Annual Review of Economics 1, 1 (2009), 255–294.

[26]

M.L. Goldstein, S.A. Morris, and G.G. Yen. 2004. Problems with fitting to the power-law distribution. The European Physical Journal B 41 (2004), 255–258.

[27]

Beno Gutenberg and Charles F. Richter. 1944. Frequency of Earthquakes in California. Bulletin of the Seismological Society of America 34, 4 (1944), 185–188.

[28]

Bo-Ping Han and Milan Straskraba. 1998. Size dependence of biomass spectra and population density I. The effects of size scales and size intervals. Journal of Theoretical Biology 191, 3 (1998), 259–265.

[29]

Rudolf Hanel, Bernat Corominas-Murtra, Bo Liu, and Stefan Thurner. 2017. Fitting power-laws in empirical data with estimators that work for all exponents. PLoS ONE 12, 2 (2017), 1–15.

[30]

Charles R. Henderson. 1975. Best Linear Unbiased Estimation and Prediction under a Selection Model. Biometrics 31, 2 (1975), 423–447.

[31]

Hawoong Jeong, Balint Tombor, Réka Albert, Zoltan N. Oltvai, and A.-L. Barabási. 2000. The Large-Scale Organization of Metabolic Networks. Nature 407, 6804 (2000), 651–654.

[32]

Sonia Kefi, Max Rietkerk, Concepcion L. Alados, Yolanda Pueyo, Vasilios P. Papanastasis, Ahmed ElAich, and Peter C. De Ruiter. 2007. Spatial vegetation patterns and imminent desertification in Mediterranean arid ecosystems. Nature 449, 7159 (2007), 213–217.

[33]

Wentian Li. 2002. Zipf’s Law Everywhere. Glottometrics 5 (2002), 14–21.

[34]

Edward T. Lu and Russell J. Hamilton. 1991. Avalanches and the Distribution of Solar Flares. The Astrophysical Journal 380 (1991), L89–L92.

[35]

R. Dean Malmgren, Daniel B. Stouffer, Adilson E. Motter, and Luis A. N. Amaral. 2008. A Poissonian explanation for heavy tails in e-mail communication. Proceedings of the National Academy of Sciences 105, 47 (2008), 18153–18158.

[36]

Timothy D. Meehan. 2006. Energy Use and Animal Abundance in Litter and Soil Communities. Ecology 87, 7 (2006), 1650–1658.

[37]

Buddhika Nettasinghe and Vikram Krishnamurthy. 2021. Maximum Likelihood Estimation of Power-law Degree Distributions via Friendship Paradox-based Sampling. ACM Transactions on Knowledge Discovery from Data 15, 6 (2021), 1–28.

[38]

Mark E. J. Newman. 2005. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46, 5 (2005), 323–351.

[39]

Jan Overgoor, Austin R. Benson, and Johan Ugander. 2019. Choosing to Grow a Graph: Modeling Network Formation as Discrete Choice. In Proceedings of the 2019 World Wide Web Conference. 1409–1420.

[40]

Karl Pearson. 1900. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling. Philos. Mag. 5, 50 (1900), 157–175.

[41]

Steven T. Piantadosi. 2014. Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review 21, 5 (2014), 1112–1130.

[42]

G. Pickering, J. M. Bull, and D. J. Sanderson. 1995. Sampling power-law distributions. Tectonophysics 248 (1995), 1–20.

[43]

Carla M. A. Pinto, A. Mendes Lopes, and J. A. Tenreiro Machado. 2012. A review of power laws in real life phenomena. Commun Nonlinear Sci Numer Simulat 17 (2012), 3558–3578.

[44]

Robin L. Plackett. 1949. A Historical Note on the Method of Least Squares. Biometrika 36, 3/4 (1949), 458–460.

[45]

Derek J. De Solla Price. 1965. Networks of Scientific Papers. Science 149, 3683 (1965), 510–515.

[46]

Salvador Pueyo and Roger Jovani. 2006. Comment on “A Keystone Mutualism Drives Pattern in a Power Function”. Science 313, 5794 (2006), 1739–1739.

[47]

John A. Rice. 2006. Mathematical Statistics and Data Analysis. Cengage Learning.

[48]

Andrea Rinaldo, Amos Maritan, Kent K. Cavender-Bares, and Sallie W. Chisholm. 2002. Cross-scale ecological dynamics and microbial size spectra in marine ecosystems. In Proceedings of the Royal Society of London. Series B: Biological Sciences, Vol. 269. 2051–2059.

[49]

David W. Sims, David Righton, and Jonathan W. Pitchford. 2007. Minimizing errors in identifying Lévy flight behaviour of organisms. Journal of Animal Ecology 76, 2 (2007), 222–229.

[50]

Nickolay Smirnov. 1948. Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 2 (1948), 279–281.

[51]

Michael A. Stephens. 1974. EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association 69, 347 (1974), 730–737.

[52]

Alex Stivala, Garry Robins, and Alessandro Lomi. 2020. Exponential random graph model parameter estimation for very large directed networks. PLoS ONE 15, 1 (2020), e0227804.

[53]

Gilbert Strang. 2016. Introduction to Linear Algebra. Wellesley-Cambridge Press.

[54]

Yogesh Virkar and Aaron Clauset. 2014. Power-law distributions in binned empirical data. The Annals of Applied Statistics 8, 1 (2014), 89–119.

[55]

Geoffrey B. West, James H. Brown, and Brian J. Enquist. 1997. A General Model for the Origin of Allometric Scaling Laws in Biology. Science 276, 5309 (1997), 122–126.

[56]

Ethan P. White, Brian J. Enquist, and Jessica L. Green. 2008. On estimating the exponent of power-law frequency distributions. Ecology 89, 4 (2008), 905–912.

[57]

J. C. Willis and G. Udny Yule. 1922. Some Statistics of Evolution and Geographical Distribution in Plants and Animals, and their Significance. Nature 109 (1922), 177–179.

[58]

Chengxi Zang, Peng Cui, and Wenwu Zhu. 2018. Learning and Interpreting Complex Distributions in Empirical Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2682–2691.

[59]

Xiaoshi Zhong. 2020. Time Expression and Named Entity Analysis and Recognition. Ph.D. Dissertation. Nanyang Technological University, Singapore.

[60]

Tommaso Zillio and Richard Condit. 2007. The impact of neutrality, niche differentiation and species input on diversity and abundance distributions. Oikos 116 (2007), 931–940.

[61]

George Zipf. 1949. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press, Inc.

Cited By

Sela A, Neter O, Lohr V, Cihelka P, Wang F, Zwilling M, Phillip Sabou J, Ulman M (2025). Signals of propaganda—Detecting and estimating political influences in information spread in social networks. PLOS ONE 20:1 (e0309688). Online publication date: 30-Jan-2025. https://doi.org/10.1371/journal.pone.0309688

Fan F, Jiang Y, Chen T, Zhang H, Zhang Y, Niu N, Liu H (2024). An Empirical Study on Common Sense-Violating Bugs in Mobile Apps. ACM Transactions on Software Engineering and Methodology. Online publication date: 21-Dec-2024. https://dl.acm.org/doi/10.1145/3709356

Zhong X, Liang H, Chua T, Ngo C, Kumar R, Lauw H, Ka-Wei Lee R (2024). On the Scale-Free Property of Citation Networks: An Empirical Study. Companion Proceedings of the ACM Web Conference 2024 (541-544). Online publication date: 13-May-2024. https://dl.acm.org/doi/10.1145/3589335.3651541

Avdic S, Demirovic D, Hadzimustafic E, Cickusic Z, Kunosic S (2024). Distribution of mean time intervals between successive neutron counts for different phenomena and power law forms. The European Physical Journal Plus 139:5. Online publication date: 27-May-2024. https://doi.org/10.1140/epjp/s13360-024-05269-x

Published In

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN: 9781450390965

DOI: 10.1145/3485447

Editors:
Frédérique Laforest (INSA Lyon, France),
Raphaël Troncy (EURECOM, France),
Elena Simperl (King’s College London, UK),
Deepak Agarwal (Pinterest, USA),
Aristides Gionis (KTH Royal Institute of Technology, Sweden),
Ivan Herman (W3C / retired),
Lionel Médini (Université Lyon 1, France)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '22

Sponsor:

SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

