DOI: 10.1145/3319921.3319958
research-article

Authorship Attribution of The Golden Lotus Based on Text Classification Methods

Published: 15 March 2019

Abstract

In this paper, we explore the authorship attribution of The Golden Lotus using traditional machine learning methods for text classification. There are four candidate authors: Shizhen Wang, Wei Xu, Kaixian Li and Zhideng Wang. We use the poems in The Golden Lotus and the poems of the four candidate authors as the data set. Based on the characteristics of ancient Chinese poetry, we choose Chinese characters, rhyme, genre and overlapped words as features. We apply six supervised machine learning algorithms (Logistic Regression, Random Forests, Decision Tree, Naive Bayes, SVM and KNN) to both binary and multi-class text classification. According to the results of the two experiments, the writing style of Wei Xu's poems is the most similar to that of The Golden Lotus. We conclude that, among the four candidates, Wei Xu is the most likely author of The Golden Lotus.
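
As a rough illustration of the kind of pipeline the abstract describes, the sketch below extracts two of the named feature types and trains a multinomial Naive Bayes classifier, one of the six algorithms listed. Everything specific here is an assumption made for illustration: the toy poem strings, the labels author_A and author_B, the helper names char_counts, overlapped_words and CharNaiveBayes, the reading of "overlapped word" as an immediately repeated character, and the add-one smoothing. None of it is taken from the paper's data or implementation.

```python
import math
from collections import Counter, defaultdict


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def char_counts(poem):
    """Character feature: count each Chinese character, skipping punctuation."""
    return Counter(ch for ch in poem if is_cjk(ch))


def overlapped_words(poem):
    """Overlapped-word feature (assumed meaning): adjacent repeated characters."""
    return sum(1 for a, b in zip(poem, poem[1:]) if a == b and is_cjk(a))


class CharNaiveBayes:
    """Multinomial Naive Bayes over character counts with add-one smoothing."""

    def fit(self, poems, labels):
        self.class_docs = Counter(labels)
        self.char_totals = defaultdict(Counter)
        for poem, label in zip(poems, labels):
            self.char_totals[label].update(char_counts(poem))
        self.vocab = {ch for counts in self.char_totals.values() for ch in counts}
        return self

    def log_scores(self, poem):
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_docs.items():
            counts = self.char_totals[label]
            total = sum(counts.values())
            score = math.log(n / n_docs)  # log prior of the class
            for ch, k in char_counts(poem).items():
                if ch in self.vocab:  # characters never seen in training are ignored
                    score += k * math.log((counts[ch] + 1) / (total + vocab_size))
            scores[label] = score
        return scores

    def predict(self, poem):
        scores = self.log_scores(poem)
        return max(scores, key=scores.get)


# Toy training set: invented lines with hypothetical author labels.
train_poems = ["春风吹柳绿", "春雨润花红", "秋月照霜寒", "秋山落叶黄"]
train_labels = ["author_A", "author_A", "author_B", "author_B"]

model = CharNaiveBayes().fit(train_poems, train_labels)

disputed = "春花依依向风开"  # stands in for a poem of unknown authorship
predicted = model.predict(disputed)
print(predicted, overlapped_words(disputed))
```

The same classifier covers both settings mentioned in the abstract: with two labels it is a binary classifier, and with one label per candidate author it becomes a multi-class one. Rhyme and genre features would need annotated data and are left out of this sketch.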



    Published In

    ICIAI '19: Proceedings of the 2019 3rd International Conference on Innovation in Artificial Intelligence
    March 2019
    279 pages
    ISBN:9781450361286
    DOI:10.1145/3319921
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Xi'an Jiaotong-Liverpool University
    • University of Texas-Dallas

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Authorship attribution
    2. The Golden Lotus
    3. machine learning
    4. text classification

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICIAI 2019


    Cited By

    • (2024) Authorship Attribution for English Short Texts. Engineering, Technology & Applied Science Research 14:5 (16419-16426). DOI: 10.48084/etasr.8302. Online publication date: 9-Oct-2024
    • (2024) Automatic authorship attribution in Albanian texts. PLOS ONE 19:10 (e0310057). DOI: 10.1371/journal.pone.0310057. Online publication date: 22-Oct-2024
    • (2024) Combining Convolutional Neural Networks and Random Forest for Lotus Multi-Classification. 2024 International Conference on Automation and Computation (AUTOCOM) (20-23). DOI: 10.1109/AUTOCOM60220.2024.10486155. Online publication date: 14-Mar-2024
    • (2023) Machine Learning for Ancient Languages: A Survey. Computational Linguistics 49:3 (703-747). DOI: 10.1162/coli_a_00481. Online publication date: 1-Sep-2023
    • (2023) Albanian Authorship Attribution Model. 2023 12th Mediterranean Conference on Embedded Computing (MECO) (1-5). DOI: 10.1109/MECO58584.2023.10155046. Online publication date: 6-Jun-2023
    • (2022) Inter country poetry classification using Topic modeling. 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR) (1-6). DOI: 10.1109/ICAITPR51569.2022.9844213. Online publication date: 10-Mar-2022
    • (2021) Expert, Journal and Automatic Classification of Full Texts and Annotations of Scientific Articles (in Russian). Научно-техническая информация. Серия 2: Информационные процессы и системы (15-27). DOI: 10.36535/0548-0027-2021-08-3. Online publication date: 2021
    • (2021) Expert, Journal, and Automatic Classification of Full Texts and Annotations of Scientific Articles. Automatic Documentation and Mathematical Linguistics 55:4 (178-189). DOI: 10.3103/S0005105521040075. Online publication date: 1-Jul-2021
    • (2021) Authorship Attribution for a Resource Poor Language—Urdu. ACM Transactions on Asian and Low-Resource Language Information Processing 21:3 (1-23). DOI: 10.1145/3487061. Online publication date: 13-Dec-2021
    • (2021) Authorship Identification of Electronic Texts. IEEE Access 9 (101124-101146). DOI: 10.1109/ACCESS.2021.3098192. Online publication date: 2021
