Abstract
To address the problem of dating ancient Chinese texts, this paper proposes to model the texts with RoBERTa (Robustly Optimized BERT Pretraining Approach), fully learning the contextual information of the text, and trains it in combination with an ancient-Chinese pre-trained model. Experiments demonstrate the effectiveness of the RoBERTa model for the chronological classification of ancient Chinese, with classification accuracy reaching 93.98%. Our work can help researchers of ancient Chinese perform automatic dating of texts.
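The task described above, assigning a text to a historical period from its distributional cues, can be illustrated with a toy character-level Naive Bayes baseline. This is only a minimal sketch of chronological text classification, not the paper's method (which fine-tunes RoBERTa); the period labels and training snippets are hypothetical examples.

```python
import math
from collections import Counter

# Toy character-level Naive Bayes baseline for dating classical Chinese.
# Illustrative only: the paper's actual model is a fine-tuned RoBERTa;
# the periods and sample texts used here are hypothetical.

def train(samples):
    """samples: list of (text, period). Returns a per-period model of
    log-priors and Laplace-smoothed per-character log-likelihoods."""
    counts = {}        # period -> Counter of characters
    priors = Counter() # period -> number of training documents
    for text, period in samples:
        counts.setdefault(period, Counter()).update(text)
        priors[period] += 1
    total_docs = sum(priors.values())
    vocab = {c for cnt in counts.values() for c in cnt}
    model = {}
    for period, cnt in counts.items():
        total = sum(cnt.values())
        model[period] = {
            "prior": math.log(priors[period] / total_docs),
            "logp": {c: math.log((cnt[c] + 1) / (total + len(vocab)))
                     for c in vocab},
            # unseen characters get the smoothed zero-count probability
            "unk": math.log(1 / (total + len(vocab))),
        }
    return model

def classify(model, text):
    """Return the period whose model assigns the text the highest score."""
    best, best_score = None, float("-inf")
    for period, m in model.items():
        score = m["prior"] + sum(m["logp"].get(c, m["unk"]) for c in text)
        if score > best_score:
            best, best_score = period, score
    return best
```

A RoBERTa-based classifier replaces the independent per-character counts with contextual representations, which is what lets it capture the period-specific usage patterns that a bag-of-characters model misses.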
Supported by Social Science Foundation Project of Beijing (Grant No. 18YYB003). The corresponding author is Prof. Wei Huangfu.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Li, M., Qin, Y., Huangfu, W. (2023). RoBERTa: An Efficient Dating Method of Ancient Chinese Texts. In: Su, Q., Xu, G., Yang, X. (eds.) Chinese Lexical Semantics. CLSW 2022. Lecture Notes in Computer Science, vol. 13496. Springer, Cham. https://doi.org/10.1007/978-3-031-28956-9_23
Print ISBN: 978-3-031-28955-2
Online ISBN: 978-3-031-28956-9