ABSTRACT
In this paper, we explore the authorship attribution of The Golden Lotus using traditional machine learning methods for text classification. There are four candidate authors: Shizhen Wang, Wei Xu, Kaixian Li and Zhideng Wang. Our data set consists of the poems in The Golden Lotus and poems by the four candidate authors. Based on the characteristics of ancient Chinese poetry, we choose Chinese characters, rhyme, genre and overlapped words as features. We apply six supervised machine learning classifiers (Logistic Regression, Random Forest, Decision Tree, Naive Bayes, SVM and KNN) to both binary and multi-class text classification. In both experiments, the writing style of Wei Xu's poems is the most similar to that of The Golden Lotus. The results indicate that, among the four candidates, Wei Xu is the most likely author of The Golden Lotus.
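The character-feature classification step can be illustrated with a minimal sketch. It assumes character-unigram counts as the only feature and a hand-written multinomial Naive Bayes with add-one smoothing; the names `char_features` and `CharNaiveBayes`, the labels `author_a` and `author_b`, and the toy strings are all hypothetical. The paper's actual features (rhyme, genre, overlapped words), data and classifier settings are not given here, so this is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def char_features(text):
    """Count each non-whitespace character in a text (character unigrams)."""
    return Counter(ch for ch in text if not ch.isspace())


class CharNaiveBayes:
    """Multinomial Naive Bayes over character counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.char_counts = defaultdict(Counter)  # label -> character counts
        self.doc_counts = Counter(labels)        # label -> number of texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = char_features(text)
            self.char_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, text):
        """Return the unnormalised log-posterior of each label for a text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.char_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)
            for ch, n in char_features(text).items():
                if ch in self.vocab:  # ignore characters never seen in training
                    score += n * math.log((counts[ch] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        """Return the label with the highest log-posterior."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy placeholder strings (assumption): NOT real poems by any candidate author.
train_texts = ["aaab aaba", "abaa aaab", "bbba bbab", "babb bbba"]
train_labels = ["author_a", "author_a", "author_b", "author_b"]

model = CharNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("aaba aaab")
```

In an attribution setting like the one described, the disputed text would be passed to `predict` (or `log_scores`, to compare candidates) after training on each candidate's poems. A binary variant would train one such model per candidate against the remaining texts.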
Index Terms
- Authorship Attribution of The Golden Lotus Based on Text Classification Methods