Abstract
On the occasion of the 40th anniversary of George Zipf's premature death, we reanalyse his data on the frequency of Chinese words. We find the best-fitting Lotka, Zipf, Bradford and Leimkuhler distributions and show that only Lotka's function is not rejected by a Kolmogorov-Smirnov test. Adding an extra term to Leimkuhler's function leads to a statistically acceptable fit. In this way we can determine a core (nucleus) of the most frequently used Chinese words.
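As a rough illustration of the kind of fit-and-test procedure the abstract describes, the sketch below fits a Lotka-type function f(n) = C / n^alpha to a frequency table and computes a Kolmogorov-Smirnov statistic. It is not the authors' procedure: the data are synthetic (not Zipf's Chinese word counts), the function names are hypothetical, the exponent is estimated by a simple grid-search maximum likelihood, and the critical value 1.36 / sqrt(N) is the usual large-sample 5% approximation, which is only approximate for discrete data with an estimated parameter.

```python
import math

def fit_lotka_exponent(counts, lo=1.01, hi=5.0, steps=2000):
    """Grid-search maximum-likelihood estimate of alpha for f(n) ∝ n**(-alpha),
    truncated at the largest observed n (hypothetical helper)."""
    n_max = max(counts)
    total = sum(counts.values())
    sum_log = sum(c * math.log(n) for n, c in counts.items())
    best_alpha, best_ll = None, -math.inf
    for i in range(steps + 1):
        alpha = lo + (hi - lo) * i / steps
        norm = sum(k ** -alpha for k in range(1, n_max + 1))
        # Log-likelihood of the truncated discrete power law.
        ll = -alpha * sum_log - total * math.log(norm)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

def ks_statistic(counts, alpha):
    """Largest absolute gap between observed and fitted cumulative distributions."""
    n_max = max(counts)
    total = sum(counts.values())
    norm = sum(k ** -alpha for k in range(1, n_max + 1))
    d = cum_obs = cum_fit = 0.0
    for k in range(1, n_max + 1):
        cum_obs += counts.get(k, 0) / total
        cum_fit += k ** -alpha / norm
        d = max(d, abs(cum_obs - cum_fit))
    return d

# Synthetic table (assumption): number of words occurring exactly n times.
counts = {n: round(10000 / n ** 2) for n in range(1, 21)}

alpha = fit_lotka_exponent(counts)
d = ks_statistic(counts, alpha)
critical = 1.36 / math.sqrt(sum(counts.values()))  # approximate 5% level

print(f"alpha = {alpha:.3f}, D = {d:.5f}, critical = {critical:.5f}")
print("not rejected" if d < critical else "rejected")
```

The same two steps (estimate parameters, then compare cumulative distributions) would apply to the Zipf, Bradford and Leimkuhler forms, each with its own fitted function in place of the power law used here.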
Cite this article
Rousseau, R., Zhang, Q. Zipf's data on the frequency of Chinese words revisited. Scientometrics 24, 201–220 (1992). https://doi.org/10.1007/BF02017909
DOI: https://doi.org/10.1007/BF02017909