Abstract
This research examines the vocabulary of Chinese newspapers using a diachronic corpus spanning 77 years, from 1872 to 1949. Word use in the corpus follows a Zipfian distribution, and the most frequent words vary dramatically across epochs. The numbers of word types and tokens follow an inverted-V trend over time, and word entropy shows a similar tendency. Words that occur only once in the historical record represent the linguistic life of their period well. At the same time, the proportion of new words in the corpus as a whole decreases over time, reflecting a gradual stabilization of vocabulary growth.
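As a rough illustration of the quantities named in the abstract, the sketch below computes token count, type count, Shannon entropy of the word distribution, and the share of previously unseen word types for each epoch. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `epoch_stats`, the epoch labels, and the toy word lists are hypothetical, word segmentation is assumed to be done upstream, and the paper may define "new words" and "word entropy" differently.

```python
import math
from collections import Counter


def epoch_stats(tokens, seen_before):
    """Basic vocabulary statistics for one epoch (hypothetical helper).

    tokens: list of already-segmented words for the epoch.
    seen_before: set of word types observed in earlier epochs.
    """
    counts = Counter(tokens)
    n_tokens = sum(counts.values())
    n_types = len(counts)
    # Shannon entropy (in bits) of the unigram word distribution.
    entropy = -sum(
        (c / n_tokens) * math.log2(c / n_tokens) for c in counts.values()
    )
    # Share of this epoch's types that never appeared in earlier epochs.
    new_types = set(counts) - seen_before
    new_type_ratio = len(new_types) / n_types if n_types else 0.0
    return {
        "tokens": n_tokens,
        "types": n_types,
        "entropy": entropy,
        "new_type_ratio": new_type_ratio,
    }


# Toy data only; the epoch boundaries and words are made up for illustration.
epochs = {
    "1872-1899": ["报", "馆", "报", "新闻"],
    "1900-1949": ["报", "电报", "新闻", "电报"],
}

seen = set()
results = {}
for name, toks in epochs.items():
    results[name] = epoch_stats(toks, seen)
    seen.update(toks)  # later epochs compare against everything seen so far

for name, stats in results.items():
    print(name, stats)
```

Plotting `types`, `tokens`, and `entropy` per epoch would show whether a corpus has the inverted-V shape the abstract describes, and a falling `new_type_ratio` would correspond to the stabilizing vocabulary growth.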
Acknowledgments
This work is supported by the MOE Funds of Humanity and Social Sciences project “Quantitative Research on Words Usage in Newspaper since late Qing Dynasty” (20YJC740050).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Z., Gao, T., Huang, G., Rao, G. (2023). Study of Chinese Words in Diachronic Corpus of Newspaper. In: Su, Q., Xu, G., Yang, X. (eds) Chinese Lexical Semantics. CLSW 2022. Lecture Notes in Computer Science, vol 13495. Springer, Cham. https://doi.org/10.1007/978-3-031-28953-8_29
DOI: https://doi.org/10.1007/978-3-031-28953-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28952-1
Online ISBN: 978-3-031-28953-8
eBook Packages: Computer Science, Computer Science (R0)