Zipf's data on the frequency of Chinese words revisited

Scientometrics

Abstract

On the occasion of the 40th anniversary of George Zipf's premature death, we reanalyse his data on the frequency of Chinese words. We find the best-fitting Lotka, Zipf, Bradford and Leimkuhler distributions and show that only Lotka's function is not rejected by a Kolmogorov-Smirnov test. Adding a further term to Leimkuhler's function leads to a statistically acceptable fit. In this way we can determine a core (nucleus) of the most frequently used Chinese words.
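
As a rough illustration of the kind of test the abstract mentions, the sketch below fits a truncated Lotka function p(n) proportional to n^(-alpha) to a table of word counts and compares the fitted and observed cumulative distributions with a Kolmogorov-Smirnov statistic. It is not the authors' procedure: the grid search over alpha, the function names, and the counts are all assumptions made for illustration (the counts are invented, not Zipf's data), and the critical value 1.36/sqrt(N) is only the usual large-sample 5% approximation.

```python
import math

def lotka_cdf(alpha, n_max):
    """CDF of a truncated Lotka function, p(n) proportional to n**(-alpha), n = 1..n_max."""
    weights = [n ** (-alpha) for n in range(1, n_max + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf

def empirical_cdf(counts, n_max):
    """Observed CDF; counts[n] is the number of words occurring exactly n times."""
    total = sum(counts.values())
    cdf, running = [], 0
    for n in range(1, n_max + 1):
        running += counts.get(n, 0)
        cdf.append(running / total)
    return cdf

def ks_statistic(counts, alpha):
    """Largest absolute gap between the observed and fitted CDFs."""
    n_max = max(counts)
    observed = empirical_cdf(counts, n_max)
    fitted = lotka_cdf(alpha, n_max)
    return max(abs(o - f) for o, f in zip(observed, fitted))

def fit_alpha(counts, lo=1.0, hi=4.0, steps=301):
    """Assumed fitting rule: pick the alpha on a grid that minimises the KS statistic."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda a: ks_statistic(counts, a))

def ks_critical_value(sample_size, coefficient=1.36):
    """Large-sample approximation to the 5% critical value of the KS test."""
    return coefficient / math.sqrt(sample_size)

# Hypothetical counts, invented for this example (NOT Zipf's Chinese data).
counts = {1: 600, 2: 150, 3: 67, 4: 38, 5: 24, 6: 17, 7: 12, 8: 9, 9: 7, 10: 6}

alpha = fit_alpha(counts)
d = ks_statistic(counts, alpha)
crit = ks_critical_value(sum(counts.values()))
verdict = "not rejected" if d < crit else "rejected"
print(f"alpha = {alpha:.2f}, D = {d:.4f}, critical value = {crit:.4f}: {verdict}")
```

The same comparison could be repeated for each candidate function (Zipf, Bradford, Leimkuhler) by swapping in its cumulative form; a fit is treated as acceptable here only when D stays below the critical value.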


Cite this article

Rousseau, R., Zhang, Q. Zipf's data on the frequency of Chinese words revisited. Scientometrics 24, 201–220 (1992). https://doi.org/10.1007/BF02017909
