ABSTRACT
In this article we investigate the properties of the frequency distribution of numbers on the Web. We work with a portion of the Common Crawl dataset comprising 3.8 billion Web documents, and with a recent dump of the English-language Wikipedia. We show that, like words, numbers on the Web follow a power-law distribution and obey Benford's law of first digits. We describe and explain regularities in the distribution, and compare the regularities found in Common Crawl with those in Wikipedia. This comparison highlights which patterns in the frequency distributions stem from human thought.
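Benford's law predicts that a number's leading digit d (from 1 to 9) occurs with probability log10(1 + 1/d). About 30.1% of numbers should begin with 1 and only about 4.6% with 9. The sketch below shows how such a check could be run on raw text: it extracts numeric tokens, tallies their first nonzero digits, and lists tokens by rank for a power-law (Zipf-style) rank-frequency view. The tokenizing regular expression and all function names are hypothetical illustrations, not the authors' extraction pipeline for Common Crawl or Wikipedia.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical tokenizer: a run of digits, optionally followed by groups such as
# ",200" or ".5". The article's own number-extraction rules may differ.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def benford_expected() -> dict[int, float]:
    """Benford's law: P(d) = log10(1 + 1/d) for leading digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(token: str) -> int | None:
    """Return the first nonzero digit of a numeric token (e.g. '0.05' -> 5),
    or None if the token contains no nonzero digit."""
    for ch in token:
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_distribution(text: str) -> dict[int, float]:
    """Relative frequency of leading digits 1..9 among numbers found in text."""
    digits = [d for tok in NUMBER_RE.findall(text)
              if (d := first_digit(tok)) is not None]
    counts = Counter(digits)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


def rank_frequency(text: str) -> list[tuple[str, int]]:
    """Numeric tokens sorted by descending frequency (rank 1 = most frequent).
    Under a power law, frequency decays roughly as rank ** -alpha."""
    return Counter(NUMBER_RE.findall(text)).most_common()


if __name__ == "__main__":
    sample = "In 2019, 1,200 users posted 35 photos; 1.5% of 150 pages cited 1998 and 12 sources."
    observed = first_digit_distribution(sample)
    expected = benford_expected()
    for d in range(1, 10):
        print(f"{d}: observed {observed[d]:.3f}  Benford {expected[d]:.3f}")
```

A tiny sample like this one is far too small to confirm or refute Benford's law. The comparison only becomes meaningful over corpora of the scale the article studies, where a goodness-of-fit measure (for example a chi-squared statistic) would be computed over the nine digit frequencies.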
Index Terms
- Number frequency on the web