Abstract
In this paper, honorifics refer to the names of official positions and to titles of nobility or honor. They appear in written records from many periods and are of great historical significance. This paper introduces a machine learning system that recognizes honorifics in diachronic corpora. The system is trained on a tagged corpus of four classic novels written in the Ming and Qing dynasties. It is then used to automatically recognize and extract honorifics in pre-Qin classics, Tang-dynasty poems, and modern Chinese news. Experimental results show that the system performs relatively well on the pre-Qin classics and the Tang-dynasty poems. This work is an attempt to improve the automatic recognition of honorifics in diachronic corpora, and the system can serve as a helpful tool for studying how honorifics evolved throughout Chinese history.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Xiong, D., Xu, J., Lu, Q., Lo, F. (2014). Recognition and Extraction of Honorifics in Chinese Diachronic Corpora. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_31
DOI: https://doi.org/10.1007/978-3-319-14331-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14330-9
Online ISBN: 978-3-319-14331-6
eBook Packages: Computer Science, Computer Science (R0)