Abstract
In this paper, we study the application of intrinsic stylometric methods to the task of plagiarism detection in Armenian texts. We use two task setups—style change detection and style breach detection—from PAN’s series of conferences on text forensics and stylometry. For these tasks, we generate synthetic test sets for texts of three genres (academic, literary, and news) and then use them to evaluate the effectiveness of hierarchical clustering and other relevant models presented at PAN conferences.
ACKNOWLEDGMENTS
We thank Ya.R. Nedumov, K.A. Skorniakov, and D.Yu. Turdakov for insightful feedback and discussions.
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by Yu. Kornienko
Cite this article
Yeshilbashian, Y.M., Asatryan, A.A. & Ghukasyan, T.G. Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis. Program Comput Soft 48, 435–444 (2022). https://doi.org/10.1134/S0361768822070039