27 March 2001
Data mining of text as a tool in authorship attribution
Ari J. E. Visa, Jarmo Toivonen, Sami Autio, Jarno Maekinen, Barbro Back, Hannu Vanharanta
Abstract
Text documents are commonly characterized and classified by keywords that their authors assign to them. Visa et al. have developed a new methodology based on prototype matching. The prototype is a document of interest or an extracted passage of interesting text. This prototype is matched against the document database of the monitored document flow. The new methodology is capable of extracting the meaning of a document to a certain degree. Our claim is that the new methodology is also capable of authenticating authorship. To verify this claim, two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test, three authors were selected: William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Each text was used in turn as a prototype, and the two nearest matches to the prototype were noted. The second test used the Reuters-21578 financial news database. A group of 25 short financial news reports from five different authors was examined. Our new methodology and the results from the two tests are reported in this paper. In the first test, all cases were successful for Shakespeare and for Poe. For Shaw, one text was confused with Poe. In the second test, the Reuters-21578 financial news reports were attributed to their authors relatively well. The conclusion is that our text mining methodology seems to be capable of authorship attribution.
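The evaluation protocol of the first test (each text used in turn as a prototype, with its two nearest matches noted) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify how texts are encoded or compared, so the sketch substitutes cosine similarity over word and adjacent-word-pair counts as a stand-in for "the words and the word order". The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt

def features(text):
    """Count words and adjacent word pairs (a crude proxy for word order).

    Assumption: this stands in for the paper's encoding, which the
    abstract does not describe.
    """
    words = text.lower().split()
    return Counter(words) + Counter(zip(words, words[1:]))

def cosine(a, b):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def two_nearest(prototype_id, corpus):
    """Return the ids of the two texts most similar to the prototype."""
    proto = features(corpus[prototype_id])
    scores = {
        doc_id: cosine(proto, features(text))
        for doc_id, text in corpus.items()
        if doc_id != prototype_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:2]

# Invented toy corpus keyed by (author label, text number); not real data.
corpus = {
    ("A", 1): "the raven sat upon the chamber door",
    ("A", 2): "the raven spoke upon the chamber door",
    ("A", 3): "upon the chamber door the raven sat",
    ("B", 1): "stocks rose as the bank cut rates",
    ("B", 2): "stocks fell as the bank raised rates",
    ("B", 3): "the bank cut rates and stocks rose",
}

# Use every text once as the prototype and check whether both of its
# nearest matches carry the same author label.
for doc_id in corpus:
    matches = two_nearest(doc_id, corpus)
    success = all(author == doc_id[0] for author, _ in matches)
    print(doc_id, matches, "success" if success else "confused")
```

With three texts per author, a prototype counts as successful when both of its nearest matches are the remaining two texts by the same author, which mirrors how the first test is scored in the abstract.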
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ari J. E. Visa, Jarmo Toivonen, Sami Autio, Jarno Maekinen, Barbro Back, and Hannu Vanharanta "Data mining of text as a tool in authorship attribution", Proc. SPIE 4384, Data Mining and Knowledge Discovery: Theory, Tools, and Technology III, (27 March 2001); https://doi.org/10.1117/12.421068
KEYWORDS
Prototyping, Quantization, Data mining, Databases, Mining, Computer programming, Meteorology
