Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis

Yeshilbashian, Ye. M.; Asatryan, A. A.; Ghukasyan, Ts. G.

doi:10.1134/S0361768822070039

Published: 29 November 2022

Programming and Computer Software, Volume 48, pages 435–444 (2022)

Abstract

In this paper, we study the application of intrinsic stylometric methods to the task of plagiarism detection in Armenian texts. We use two task setups—style change detection and style breach detection—from PAN’s series of conferences on text forensics and stylometry. For these tasks, we generate synthetic test sets for texts of three genres (academic, literary, and news) and then use them to evaluate the effectiveness of hierarchical clustering and other relevant models presented at PAN conferences.
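As a rough illustration of the kind of pipeline the abstract describes, the sketch below clusters the paragraphs of a single document by a few surface style statistics and reports the paragraph boundaries where the cluster label changes. Everything in it is an assumption made for illustration: the three features, the average-linkage clustering, the fixed number of clusters, and the function names `style_features`, `agglomerative`, and `style_breach_positions` are not taken from the paper, whose feature set (Table 3) and evaluated models are not reproduced on this page.

```python
import math
import re
from statistics import mean, pstdev


def style_features(paragraph):
    """Map a paragraph to a tiny style vector (illustrative features only)."""
    # Split on Latin sentence punctuation and the Armenian full stop (U+0589).
    sentences = [s for s in re.split(r"[.!?։]+", paragraph) if s.strip()]
    words = re.findall(r"\w+", paragraph)
    return [
        mean(len(w) for w in words),        # mean word length
        len(words) / len(sentences),        # mean sentence length in words
        paragraph.count(",") / len(words),  # commas per word
    ]


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per vector."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return mean(math.dist(vectors[i], vectors[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] += clusters.pop(b)

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def style_breach_positions(paragraphs, n_clusters=2):
    """Return indices i where paragraph i falls in a different cluster than i - 1."""
    raw = [style_features(p) for p in paragraphs]
    # Standardise each feature so that no single scale dominates the distance.
    columns = list(zip(*raw))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]
    vectors = [
        [(x - m) / s for x, m, s in zip(row, means, stds)] for row in raw
    ]
    labels = agglomerative(vectors, n_clusters)
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Because the number of clusters is fixed in advance, this sketch always reports at least one boundary when `n_clusters=2`. Deciding whether a document has a style change at all would need a stopping criterion, such as a distance threshold, which is left out here.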

ACKNOWLEDGMENTS

We thank Ya.R. Nedumov, K.A. Skorniakov, and D.Yu. Turdakov for insightful feedback and discussions.

Author information

Authors and Affiliations

Russian-Armenian University, ul. Ovsepa Emina 123, 119991, Yerevan, Republic of Armenia
Ye. M. Yeshilbashian, A. A. Asatryan & Ts. G. Ghukasyan

Corresponding authors

Correspondence to Ye. M. Yeshilbashian, A. A. Asatryan or Ts. G. Ghukasyan.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by Yu. Kornienko

APPENDIX

Appendix A

Table 3. List of features used in this study

About this article

Yeshilbashian, Y.M., Asatryan, A.A. & Ghukasyan, T.G. Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis. Program Comput Soft 48, 435–444 (2022). https://doi.org/10.1134/S0361768822070039

Received: 05 July 2021
Revised: 16 July 2021
Accepted: 22 July 2021
Issue Date: December 2022
