Abstract:
This research proposes a novel approach that combines a classical method, the chi-square test, with a machine learning technique, data clustering, to reveal statistically significant differences in the word length distributions and word distributions of authorial styles. The chi-square test was evaluated for differentiating texts from different historical periods. The results show that at the 0.5, 0.15, 0.1, 0.05, 0.025, 0.01, and 0.001 significance levels, the differences between the literary texts “Robinson Crusoe” by Daniel Defoe (18th century) and “Me Before You” by Pauline Sara Jo Moyes (21st century) are statistically significant. The calculations were done in Python. The level of test validity is 95%. Author identification is based on word length distribution. Data clustering is also effective for identifying authorial styles: the results show that the authorial style is similar in the literary works “The Furnished Room” and “The Last Leaf” by O. Henry. The research may be practically applied in the judiciary.
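A chi-square comparison of word length distributions, as described in the abstract, could be sketched in Python roughly as follows. This is a minimal illustration and not the authors' code: the function names `word_length_counts` and `chi_square_homogeneity`, the regular-expression tokenizer, and the choice to merge all words of 10 or more letters into one bin are assumptions. The sketch uses only the standard library and returns the test statistic and degrees of freedom, which would then be compared against a chi-square critical value at the chosen significance level.

```python
import re
from collections import Counter


def word_length_counts(text, max_len=10):
    """Count words by length; words of max_len letters or more share the last bin.

    The tokenizer (letters and apostrophes) is an illustrative assumption.
    """
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    return [counts.get(k, 0) for k in range(1, max_len + 1)]


def chi_square_homogeneity(counts_a, counts_b):
    """Return (statistic, degrees_of_freedom) for a 2 x k table of counts.

    Tests whether two texts share the same word length distribution.
    """
    # Drop length bins that are empty in both texts.
    pairs = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    total_a = sum(a for a, _ in pairs)
    total_b = sum(b for _, b in pairs)
    if total_a == 0 or total_b == 0:
        raise ValueError("both texts must contain at least one word")
    grand = total_a + total_b

    stat = 0.0
    for a, b in pairs:
        col = a + b
        # Expected counts if both texts followed the pooled distribution.
        exp_a = total_a * col / grand
        exp_b = total_b * col / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, len(pairs) - 1
```

For real texts, the two count vectors would come from calling `word_length_counts` on each full text. If SciPy is available, `scipy.stats.chi2_contingency` applied to the same 2 x k table would also give a p-value directly.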
Published in: 2023 IEEE 18th International Conference on Computer Science and Information Technologies (CSIT)
Date of Conference: 19-21 October 2023
Date Added to IEEE Xplore: 27 November 2023