Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis

Programming and Computer Software

Abstract

In this paper, we study the application of intrinsic stylometric methods to the task of plagiarism detection in Armenian texts. We use two task setups—style change detection and style breach detection—from PAN’s series of conferences on text forensics and stylometry. For these tasks, we generate synthetic test sets for texts of three genres (academic, literary, and news) and then use them to evaluate the effectiveness of hierarchical clustering and other relevant models presented at PAN conferences.
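As an illustration of the two task setups, the following is a minimal sketch of intrinsic analysis with hierarchical clustering. It is not the authors' implementation (see Note 1 for their repository). The three features, the z-score normalization, the average-linkage criterion, the `threshold` value, and all function names are assumptions chosen for brevity. Style breaches are located only at paragraph boundaries, which simplifies the PAN formulation.

```python
import math
import re
from itertools import combinations


def features(paragraph):
    """Toy stylometric vector: mean word length, mean sentence length, punctuation rate."""
    words = re.findall(r"\w+", paragraph)
    # Split on Latin sentence enders and the Armenian full stop (U+0589).
    sentences = [s for s in re.split(r"[.!?։]+", paragraph) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,
        len(words) / max(len(sentences), 1),
        sum(ch in ",;:՝" for ch in paragraph) / n_words,
    ]


def standardize(vectors):
    """Z-score each feature column; constant columns are left at zero."""
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(vectors, threshold):
    """Average-linkage clustering; stops once the closest clusters exceed threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b, dist = min(
            (
                (a, b, linkage(clusters[a], clusters[b]))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[2],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


def cluster_labels(paragraphs, threshold=1.5):
    """Assign each paragraph the index of its style cluster."""
    vectors = standardize([features(p) for p in paragraphs])
    labels = [0] * len(paragraphs)
    for label, members in enumerate(agglomerative(vectors, threshold)):
        for i in members:
            labels[i] = label
    return labels


def has_style_change(paragraphs, threshold=1.5):
    """Style change detection: does the document contain more than one style?"""
    return len(set(cluster_labels(paragraphs, threshold))) > 1


def style_breaches(paragraphs, threshold=1.5):
    """Style breach detection: paragraph indices where the cluster label switches."""
    labels = cluster_labels(paragraphs, threshold)
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Calling `has_style_change(paragraphs)` on a list of paragraph strings answers the yes/no question of the first setup, while `style_breaches(paragraphs)` returns candidate boundaries for the second. A synthetic test document in the spirit of the abstract could be assembled by concatenating paragraphs taken from different authors and recording the true switch points.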

Notes

  1. https://github.com/ivannikov-lab/style-change-analysis

  2. http://ufal.mff.cuni.cz/udpipe/2

  3. https://stanfordnlp.github.io/stanza/

  4. http://etd.asj-oa.am/

  5. http://www.encyclopedia.am/

  6. https://lib.armedu.am/

  7. https://newshub.am/

ACKNOWLEDGMENTS

We thank Ya.R. Nedumov, K.A. Skorniakov, and D.Yu. Turdakov for insightful feedback and discussions.

Author information

Corresponding authors

Correspondence to Ye. M. Yeshilbashian, A. A. Asatryan or Ts. G. Ghukasyan.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by Yu. Kornienko

APPENDIX

Appendix A

Table 3. List of features used in this study

Cite this article

Yeshilbashian, Y.M., Asatryan, A.A. & Ghukasyan, T.G. Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis. Program Comput Soft 48, 435–444 (2022). https://doi.org/10.1134/S0361768822070039

  • DOI: https://doi.org/10.1134/S0361768822070039
