The Chi-square Test and Data Clustering Combined for Author Identification | IEEE Conference Publication | IEEE Xplore

Abstract:

The novel approach of this research combines a classical method, the chi-square test, with a machine learning technique, data clustering, to reveal statistically significant differences in the word-length distribution and word distribution of authorial styles. The chi-square test was applied to differentiate texts from different historical periods. The research has shown that the differences between the literary texts “Robinson Crusoe” by Daniel Defoe (18th century) and “Me Before You” by Pauline Sara Jo Moyes (21st century) are statistically significant at the 0.5, 0.15, 0.1, 0.05, 0.025, 0.01 and 0.001 significance levels. The calculations were done in Python. The level of test validity is 95%. Author identification is based on word-length distribution. The data clustering technique is also efficient for identifying authorial styles: the results show that the authorial style is similar in the literary works “The Furnished Room” and “The Last Leaf” by O. Henry. The research may be practically applied in the judiciary.
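The abstract does not give the test formula. A common way to compare the word-length distributions of two texts is a chi-square test of homogeneity on a 2 × k contingency table (texts × word-length bins). The Python sketch below shows one such test under stated assumptions and is not the authors' code: the tokenizer (runs of ASCII letters), the pooling of words with 10 or more letters into one bin, and all function names are hypothetical. It uses only the standard library and computes the p-value from the exact closed form of the chi-square survival function for integer degrees of freedom.

```python
import math
import re
from collections import Counter

# Significance levels listed in the abstract.
ALPHAS = (0.5, 0.15, 0.1, 0.05, 0.025, 0.01, 0.001)


def word_length_counts(text: str, max_len: int = 10) -> Counter:
    """Count words by length; words of max_len or more letters share one bin (assumed cutoff)."""
    words = re.findall(r"[A-Za-z]+", text)  # assumed tokenizer: runs of ASCII letters
    return Counter(min(len(w), max_len) for w in words)


def chi2_sf(x: float, df: int) -> float:
    """Exact upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h**i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h**(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= h / (i - 0.5)
        total += term
    return total


def chi_square_homogeneity(text_a: str, text_b: str, max_len: int = 10):
    """Chi-square test of homogeneity on the word-length distributions of two texts.

    Returns (statistic, degrees of freedom, p-value).
    """
    ca, cb = word_length_counts(text_a, max_len), word_length_counts(text_b, max_len)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("both texts must contain words")
    bins = sorted(set(ca) | set(cb))  # word lengths seen in at least one text
    if len(bins) < 2:
        raise ValueError("need at least two word-length bins")
    n = na + nb
    stat = 0.0
    for length in bins:
        column = ca[length] + cb[length]
        for observed, row_total in ((ca[length], na), (cb[length], nb)):
            expected = row_total * column / n  # expected count if both texts share one distribution
            stat += (observed - expected) ** 2 / expected
    df = len(bins) - 1  # (2 rows - 1) * (number of bins - 1)
    return stat, df, chi2_sf(stat, df)
```

For two texts, `stat, df, p = chi_square_homogeneity(text_a, text_b)` returns the statistic and p-value, and the difference counts as significant at each level `alpha` in `ALPHAS` for which `p < alpha`. Pooling long words into one bin keeps expected counts from becoming very small, which the chi-square approximation needs (a common rule of thumb is an expected count of at least 5 per cell).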
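The abstract reports that data clustering was effective for identifying authorial styles but names neither the clustering algorithm nor the features. As a minimal sketch that fills those gaps with assumptions, the Python code below represents each text by its relative word-length frequencies (ASCII-letter tokenization, words of 10 or more letters pooled into one bin) and groups the texts with average-linkage agglomerative clustering under Euclidean distance. This is a swapped-in technique, not necessarily the authors' method, and the function names are hypothetical.

```python
import math
import re


def length_profile(text: str, max_len: int = 10) -> list[float]:
    """Relative frequencies of word lengths 1..max_len; the last bin pools longer words.

    Hypothetical feature choice: the abstract does not state which features were clustered.
    """
    words = re.findall(r"[A-Za-z]+", text)  # assumed tokenizer: runs of ASCII letters
    if not words:
        raise ValueError("text contains no words")
    counts = [0] * max_len
    for w in words:
        counts[min(len(w), max_len) - 1] += 1
    return [c / len(words) for c in counts]


def cluster_texts(texts: dict[str, str], k: int) -> list[list[str]]:
    """Average-linkage agglomerative clustering of named texts into k groups."""
    profiles = {name: length_profile(body) for name, body in texts.items()}
    clusters = [[name] for name in texts]

    def linkage(c1: list[str], c2: list[str]) -> float:
        # Mean Euclidean distance over all cross-cluster pairs of texts.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        _, x, y = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[x].extend(clusters[y])
        del clusters[y]
    return sorted(sorted(c) for c in clusters)
```

In the setting of the abstract, the dictionary values would be the full texts of works such as “The Furnished Room” and “The Last Leaf”; works that land in the same cluster would be read as stylistically similar. Agglomerative clustering is used here because it is deterministic and needs no random initialization, unlike k-means.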
Date of Conference: 19-21 October 2023
Date Added to IEEE Xplore: 27 November 2023
Conference Location: Lviv, Ukraine
