Study of Chinese Words in Diachronic Corpus of Newspaper

  • Conference paper
Chinese Lexical Semantics (CLSW 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13495)

Abstract

This study examines the vocabulary of Chinese newspapers using a diachronic corpus spanning 77 years, from 1872 to 1949. Word use in the corpus follows a Zipfian distribution, and the highest-frequency words vary dramatically across epochs. The numbers of word types and tokens show an inverted V-shaped trend over time, and word entropy shows a similar tendency. Words that appear only once over the history covered by the corpus represent the linguistic life of their period well. At the same time, the proportion of new words in the corpus decreases over time, reflecting a gradual stabilization of vocabulary growth.
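The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each period of the corpus is already segmented into a list of word tokens, and the function names `word_stats`, `zipf_slope`, and `new_type_ratio` are hypothetical helpers introduced here for illustration only.

```python
import math
from collections import Counter


def word_stats(tokens):
    """Token count, type count, hapax count and Shannon entropy (in bits)."""
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    # Entropy of the unigram distribution: -sum(p * log2(p)).
    entropy = -sum((c / n_tokens) * math.log2(c / n_tokens)
                   for c in counts.values())
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {"tokens": n_tokens, "types": len(counts),
            "hapaxes": hapaxes, "entropy": entropy}


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipfian pattern. Needs at least two
    distinct word types, otherwise the fit is undefined.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def new_type_ratio(periods):
    """For each period, in order, the share of its types unseen in earlier periods."""
    seen = set()
    ratios = []
    for tokens in periods:
        types = set(tokens)
        ratios.append(len(types - seen) / len(types) if types else 0.0)
        seen |= types
    return ratios
```

Applied to one token list per year or epoch, `word_stats` gives the type, token, and entropy curves, while `new_type_ratio` gives a simple proxy for the proportion of new words. How the paper itself defines a "new word" is not stated in the abstract, so the first-occurrence definition used here is an assumption.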

Acknowledgments

This paper is supported by MOE Funds of Humanity and Social Sciences “Quantitative Research on Words Usage in Newspaper since late Qing Dynasty” (20YJC740050).

Author information

Corresponding author

Correspondence to Gaoqi Rao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Z., Gao, T., Huang, G., Rao, G. (2023). Study of Chinese Words in Diachronic Corpus of Newspaper. In: Su, Q., Xu, G., Yang, X. (eds) Chinese Lexical Semantics. CLSW 2022. Lecture Notes in Computer Science, vol 13495. Springer, Cham. https://doi.org/10.1007/978-3-031-28953-8_29

  • DOI: https://doi.org/10.1007/978-3-031-28953-8_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28952-1

  • Online ISBN: 978-3-031-28953-8

  • eBook Packages: Computer Science (R0)
