Automatic plagiarism detection in obfuscated text

  • Theoretical advances
Pattern Analysis and Applications

Abstract

Plagiarism is a serious problem in education, research, publishing and other fields. Automatic plagiarism detection systems are crucial for ensuring the integrity and genuineness of intellectual work. There are different types of plagiarism, such as copy–paste, obfuscation and translation. In particular, obfuscated text is one of the hardest types of plagiarism to detect. In this paper, we propose an automatic plagiarism detection system for obfuscated text based on a support vector machine classifier that exploits a set of lexical, syntactic and semantic features. We evaluated the performance of the proposed system on benchmark English and Arabic corpora made available by the PAN Workshop series: PAN 2012, PAN 2013, PAN 2014 and PAN@FIRE2015. We also compared the performance of our system to the performances of other systems that participated in the PAN competitions. The obtained results show that our system had the best performance in terms of the F-measure on the PAN 2012 and on the PAN@FIRE2015 obfuscated sub-corpora, was among the top four on the PAN 2013 corpus and was among the top two on the PAN 2014 corpus.
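
As an illustrative sketch only (not the authors' implementation), the snippet below shows one simple lexical feature of the kind such a classifier could consume, a word n-gram Jaccard overlap between a suspicious passage and a source passage, together with the standard F-measure computed from precision and recall. The function names `tokens`, `ngrams`, `jaccard_overlap` and `f_measure`, the regex tokenizer, and the choice of Jaccard overlap are assumptions made for illustration; the abstract does not specify the paper's actual feature set. The two sentences are taken from Example 1 in the appendix.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngrams(seq, n):
    """Return the set of contiguous n-grams in a token sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard_overlap(a, b, n=1):
    """Jaccard similarity between the word n-gram sets of two passages.

    Hypothetical lexical feature: 1.0 means identical n-gram sets,
    0.0 means no shared n-grams (or two empty passages).
    """
    ga, gb = ngrams(tokens(a), n), ngrams(tokens(b), n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One sentence pair from Example 1 (simulated plagiarism, PAN 2012).
suspicious = "Then hurried north on Plum Street to the Court."
source = "He then walked rapidly north on Plum Street toward Court."

print("unigram overlap:", jaccard_overlap(suspicious, source, n=1))
print("bigram overlap: ", jaccard_overlap(suspicious, source, n=2))
print("F-measure at P=0.5, R=1.0:", f_measure(0.5, 1.0))
```

The bigram overlap drops well below the unigram overlap for this pair, which hints at why purely lexical features struggle on obfuscated text and why a system might combine them with syntactic and semantic features.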

References

  1. Abnar S, Dehghani M, Zamani H, Shakery A (2014) Expanded N-grams for semantic text alignment-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 928–938

  2. Adams R, Nicolae G, Nicolae C, Harabagiu S (2007) Textual entailment through extended lexical overlap and lexico-semantic matching. In: Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing, Association for Computational Linguistics, Stroudsburg, PA, USA, RTE ’07, pp 119–124

  3. Al-Sulaiti L, Atwell ES (2006) The design of a corpus of contemporary arabic. Int J Corpus Linguist 11(2):135–171

  4. Alvi F, Stevenson M, Clough P (2014) Hashing and merging heuristics for text reuse detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 939–946

  5. Alzahrani S (2015) Arabic plagiarism detection using word correlation in N-grams with K-overlapping approach—working notes for PAN-AraPlagDet at FIRE 2015. In: FIRE 2015 working notes papers, 4–6 December, Gandhinagar, India

  6. Alzahrani S, Salim N, Abraham A (2012) Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans Syst Man Cybern Part C App Rev 42(2):133–149. https://doi.org/10.1109/TSMCC.2011.2134847

  7. Banea C, Chen D, Mihalcea R, Cardie C, Wiebe J (2014) Simcompass: using deep learning word embeddings to assess cross-level similarity. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), Association for Computational Linguistics, pp 560–565. https://doi.org/10.3115/v1/S14-2098. http://aclweb.org/anthology/S14-2098

  8. Bensalem I, Boukhalfa I, Rosso P, Abouenour L, Darwish K, Chikhi S (2015) Overview of the AraPlagDet PAN@FIRE2015 shared task on Arabic plagiarism detection. FIRE 2015 working notes papers, 4–6 December, Gandhinagar, India, pp 111–122

  9. Billah Nagoudi EM, Khorsi A, Cherroun H, Schwab D (2018) A two-level plagiarism detection system for Arabic documents. Cybern Inf Technol 18(1). https://hal.archives-ouvertes.fr/hal-01706138

  10. Bochkarev VV, Shevlyakova AV, Solovyev VD (2015) The average word length dynamics as an indicator of cultural changes in society. Soc Evol Hist 14(2):153–175

  11. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27

  12. Das D, Smith NA (2009) Paraphrase identification as probabilistic quasi-synchronous recognition. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP: vol 1, Association for Computational Linguistics, Stroudsburg, PA, USA, ACL ’09, pp 468–476

  13. Daud A, Khan JA, Nasir JA, Abbasi RA, Aljohani NR, Alowibdi JS (2018) Latent dirichlet allocation and pos tags based method for external plagiarism detection: LDA and POS tags based plagiarism detection. Int J Semant Web Inf Sys 14(3):53–69

  14. de Marneffe MC, MacCartney B, Manning CD (2006) Generating typed dependency parses from phrase structure parses. In: LREC, pp 449–454

  15. Denkowski M, Lavie A (2014) Meteor universal: language specific translation evaluation for any target language. In: Proceedings of the EACL 2014 workshop on statistical machine translation, pp 376–380

  16. Eissen SMZ, Stein B (2006) Intrinsic plagiarism detection. In: Lalmas M, MacFarlane A, Rüger S, Tombros A, Tsikrika T, Yavlinsky A (eds) Advances in information retrieval. Springer, Berlin, pp 565–569

  17. Finch A, Hwang YS, Sumita E (2005) Using machine translation evaluation techniques to determine sentence-level semantic equivalence. In: Proceedings of the 3rd international workshop on paraphrasing (IWP2005), pp 17–24

  18. Gharavi E, Veisi H, Rosso P (2019) Scalable and language-independent embedding-based approach for plagiarism detection considering obfuscation type: no training phase. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04594-y

  19. Gillam L (2013) Guess again and see if they line up: surrey’s runs at plagiarism detection—notebook for PAN at CLEF 2013

  20. Gillam L, Notley S (2014) Evaluating robustness for ’IPCRESS’: surrey’s text alignment for plagiarism detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 951–957

  21. Gillam L, Newbold N, Cooke N (2012) Educated guesses and equality judgements: using search engines and pairwise match for external plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 10 Nov 2018

  22. Glinos D (2014) A hybrid architecture for plagiarism detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 958–965

  23. Gross P, Modaresi P (2014) Plagiarism alignment detection by merging context seeds-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 966–972

  24. Grozea C, Popescu M (2012) Encoplot—tuned for high recall (also proposing a new plagiarism detection score). In: Forner P, Karlgren J, Womser-Hacker C (eds) CLEF 2012 Evaluation labs and workshop—working notes papers, 17–20 September, Rome, Italy, pp 538–556. http://www.clef-initiative.eu/publication/working-notes. Accessed 9 Apr 2018

  25. Jayapal A (2012) Similarity overlap metric and greedy string tiling at PAN 2012: Plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 24 Jan 2018

  26. Jayapal A, Goswami B (2013) Submission to the 5th international competition on plagiarism detection. From Nuance Communications, USA. http://www.uni-weimar.de/medien/webis/events/pan-13. Accessed 26 Jan 2018

  27. Ji Y, Eisenstein J (2013) Discriminative improvements to distributional sentence similarity. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 891–896. http://aclweb.org/anthology/D13-1090. Accessed 20 Apr 2018

  28. Kenter T, de Rijke M (2015) Short text similarity with word embeddings. In: Proceedings of the 24th ACM international on conference on information and knowledge management, ACM, New York, NY, USA, CIKM ’15, pp 1411–1420. https://doi.org/10.1145/2806416.2806475

  29. Kong L, Qi H, Wang S, Du C, Wang S, Han Y (2012) Approaches for candidate document retrieval and detailed comparison of plagiarism detection—Notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 25 Nov 2018

  30. Kong L, Qi H, Du C, Wang M, Han Z (2013) Approaches for source retrieval and text alignment of plagiarism detection—notebook for PAN at CLEF 2013

  31. Kong L, Han Y, Han Z, Yu H, Wang Q, Zhang T, Qi H (2014) Source retrieval based on learning to rank and text alignment based on plagiarism type recognition for plagiarism detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 973–976

  32. Küppers R, Conrad S (2012) A set-based approach to plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 25 May 2018

  33. Larock MH, Tressler JC, Lewis CE (1980) Mastering effective English. Copp Clark Pitman, Mississauga

  34. Madnani N, Tetreault J, Chodorow M (2012) Re-examining machine translation metrics for paraphrase identification. In: Proceedings of the 2012 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, Association for Computational Linguistics, Stroudsburg, PA, USA, NAACL HLT ’12, pp 182–190. http://dl.acm.org/citation.cfm?id=2382029.2382055. Accessed 24 Sept 2018

  35. Magooda A, Mahgoub A, Rashwan M, Fayek M (2015) Rdi system for extrinsic plagiarism detection (rdi_red). FIRE 2015 working notes papers, 4–6 December, Gandhinagar, India, pp 126–128

  36. Maurer HA, Kappe F, Zaka B (2006) Plagiarism-a survey. J UCS 12(8):1050–1084

  37. Mollá D (2003) Towards semantic-based overlap measures for question-answering. In: Proceedings of the Australasian language technology workshop, pp 110–117. http://aclweb.org/anthology/U03-1014. Accessed 3 Oct 2018

  38. Mollá D, Gardiner M (2004) Answerfinder: question answering by combining lexical, syntactic and semantic information. Proc Aus Lang Technol Workshop 2004:9–16

  39. Nourian A (2013) Submission to the 5th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-13. From the Iran University of Science and Technology. Accessed 11 Dec 2018

  40. Oberreuter G, Eiselt A (2014) Submission to the 6th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-14. From Innovand.io, Chile. Accessed 21 May 2018

  41. Oberreuter G, Carrillo-Cisneros D, Scherson I, Velásquez J (2012) Submission to the 4th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-12. From the University of Chile, Chile, and the University of California, USA

  42. Palkovskii Y, Belov A (2012) Applying specific clusterization and fingerprint density distribution with genetic algorithm overall tuning in external plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 24 Nov 2018

  43. Palkovskii Y, Belov A (2013) Using hybrid similarity methods for plagiarism detection—notebook for PAN at CLEF 2013

  44. Palkovskii Y, Belov A (2014) Developing high-resolution universal multi-type N-gram plagiarism detector-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 984–989

  45. Palkovskii Y, Belov A (2015) Submission to AraPlagDet PAN@FIRE2015 shared task on Arabic plagiarism detection. From the Zhytomyr State University and SkyLine LLC

  46. Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics, pp 311–318. http://aclweb.org/anthology/P02-1040. Accessed 2 Jan 2019

  47. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543. http://www.aclweb.org/anthology/D14-1162. Accessed 24 Feb 2018

  48. Potthast M, Barrón-Cedeño A, Eiselt A, Stein B, Rosso P (2010) Overview of the 2nd international competition on plagiarism detection. http://www.clef-initiative.eu/publication/working-notes. Accessed 30 Apr 2018

  49. Potthast M, Stein B, Barrón-Cedeño A, Rosso P (2010) An evaluation framework for plagiarism detection. In: Huang CR, Jurafsky D (eds) 23rd international conference on computational linguistics (COLING 10). Association for Computational Linguistics, Stroudsburg, Pennsylvania, pp 997–1005

  50. Potthast M, Eiselt A, Barrón-Cedeño A, Stein B, Rosso P (2011) Overview of the 3rd international competition on plagiarism detection. http://www.clef-initiative.eu/publication/working-notes. Accessed 15 Mar 2019

  51. Potthast M, Gollub T, Hagen M, Graßegger J, Kiesel J, Michel M, Oberländer A, Tippmann M, Barrón-Cedeño A, Gupta P, Rosso P, Stein B (2012) Overview of the 4th international competition on plagiarism detection. http://www.clef-initiative.eu/publication/working-notes. Accessed 6 Jan 2019

  52. Potthast M, Gollub T, Hagen M, Tippmann M, Kiesel J, Rosso P, Stamatatos E, Stein B (2013) Overview of the 5th international competition on plagiarism detection. In: Forner P, Navigli R, Tufis D (eds) Working notes papers of the CLEF 2013 evaluation labs, pp 301–331. http://www.clef-initiative.eu/publication/working-notes. Accessed 20 Feb 2019

  53. Potthast M, Hagen M, Beyer A, Busse M, Tippmann M, Rosso P, Stein B (2014) Overview of the 6th international competition on plagiarism detection. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) Working notes papers of the CLEF 2014 evaluation labs, CLEF and CEUR-WS.org, CEUR workshop proceedings, pp 845–876. http://www.clef-initiative.eu/publication/working-notes. Accessed 7 May 2018

  54. Reginaldo TV, Meireles MRG, Patrocínio ZKG (2019) Evaluating AdaBoost for plagiarism detection. In: Vera-Rodriguez R, Fierrez J, Morales A (eds) Progress in pattern recognition, image analysis, computer vision, and applications. CIARP 2018. Lecture Notes in Computer Science, vol 11401. Springer, pp 865–873

  55. Rodríguez Torrejón D, Martín Ramos J (2012) Detailed comparison module in CoReMo 1.9 Plagiarism Detector—notebook for PAN at CLEF 2012. In: Forner P, Karlgren J, Womser-Hacker C (eds) CLEF 2012 evaluation labs and workshop—working notes papers, 17–20 September, Rome, Italy, pp 1–8. http://www.clef-initiative.eu/publication/working-notes. Accessed 3 Feb 2019

  56. Rodríguez Torrejón D, Martín Ramos J (2013) Text Alignment Module in CoReMo 2.1 Plagiarism Detector—Notebook for PAN at CLEF 2013

  57. Rodríguez Torrejón D, Martín Ramos J (2014) CoReMo 2.3 plagiarism detector text alignment module-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 997–1003

  58. Sanchez-Perez M, Sidorov G, Gelbukh A (2014) A winning approach to text alignment for text reuse detection at PAN 2014–notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 1004–1011

  59. Sánchez-Vega F, Montes-y-Gómez M, Villaseñor-Pineda L (2012) Optimized fuzzy text alignment for plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 1 Feb 2019

  60. Sánchez-Vega F, Villatoro-Tello E, Montes-y Gómez M, Rosso P, Stamatatos E, Villaseñor-Pineda L (2019) Paraphrase plagiarism identification with character-level features. Pattern Anal Appl 22(2):669–681. https://doi.org/10.1007/s10044-017-0674-z

  61. Saremi M, Yaghmaee F (2013) Submission to the 5th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-13. From Semnan University, Iran. Accessed 20 Dec 2018

  62. Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge

  63. Shrestha P, Solorio T (2013) Using a variety of N-grams for the detection of different kinds of plagiarism—notebook for PAN at CLEF 2013

  64. Shrestha P, Maharjan S, Solorio T (2014) Machine translation evaluation metric for text alignment-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 1012–1016

  65. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of the Association for Machine Translation in the Americas, pp 223–231

  66. Suchomel Š, Kasprzak J, Brandejs M (2012) Three way search engine queries with multi-feature document comparison for plagiarism detection—notebook for PAN at CLEF. In: Forner P, Karlgren J, Womser-Hacker C (eds) CLEF 2012 evaluation labs and workshop—working notes papers, 17–20 September, Rome, Italy, pp 1–12. http://www.clef-initiative.eu/publication/working-notes. Accessed 24 Nov 2018

  67. Suchomel Š, Kasprzak J, Brandejs M (2013) Diverse queries and feature type selection for plagiarism discovery—notebook for PAN at CLEF 2013

  68. Wan S, Dras M, Dale R, Paris C (2006) Using dependency-based features to take the ’para-farce’ out of paraphrase. In: Proceedings of the Australasian language technology workshop 2006, pp 131–138. http://aclweb.org/anthology/U06-1019. Accessed 10 Mar 2019

  69. Zanzotti FM, Pennacchiotti M, Moschitti A (2009) A machine learning approach to textual entailment recognition. Nat Lang Eng 15(4):551–582. https://doi.org/10.1017/S1351324909990143

Acknowledgements

This work was supported by the Research Center of the College of Computer and Information Sciences, King Saud University. The authors are grateful for this support.

Author information

Corresponding author

Correspondence to Mohamed El Bachir Menai.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Output examples

This appendix presents, for each obfuscation type, plagiarism cases from the PAN 2012 through PAN 2014 corpora, as reported in [51,52,53]. The highlighted texts are the text segments detected by our system.

Example 1: Simulated Plagiarism PAN12 source-document02500 suspicious-document02500.

Suspicious text: Arrest, if in Cincinnati, William Wood, a friend of Jackson. Cargo as an accomplice. About 20 years old, 5 feet 11 inches, blond hair, face taut, rather thin, weighs 165 pounds. We’re going from here to South Bend after Wood to leave here to there. CRIM McDermott and Plummer. ”

Immediately after receiving the telegram Col. Deitsch Witte detailed Detectives, Bulmer and Jackson to take care of Jackson. It was learned that stayed at the house of Mrs. McNevin, at 222 West Ninth, next to the Opera House Robinson.

Detective Jackson was stationed at home and Witte and Bulmer in front of the room. Just when it seemed he had found his game for the fact that the officers were after him and had gone to unknown places was captured. It was after nine o’clock, when most of the last ray of hope had died out of the breasts officers, the Chief of Police Deitsch received the news that Jackson had been seen at the Palace Hotel.

The chief began and I met a man answering description of Jackson. He informed the detectives of reality, the individual was last seen and was walking slowly along Ninth Street, and when he reached 222 he looked up at the windows. He walked slowly to Plum Street and stopped and looked back toward the house.

Then hurried north on Plum Street to the Court. When the route was part of the square Bulmer detective approached him, saying: “Your name is Jackson, right?” He turned completely pale and trembling like an aspen, and as the detective continued, “I love you” he exclaimed, “My God, what is this?

At the same time, there was the beginning of the Mayor’s Office

Source text: Arrest if in Cincinnati, William Wood, friend of Jackson. Charge as accomplice. About 20 years, 5 feet 11 inches, light blonde hair, smooth face, rather slender, weighs 165 pounds. We go from here to South Bend after Wood as he left here for that place. CRIM MCDERMOTT AND PLUMMER.”

Immediately on receipt of the telegram Colonel Deitsch detailed Detectives Witte, Bulmer and Jackson to look after Jackson. It was learned that he roomed at the house of Mrs. McNevin, at 222 West Ninth, next door to Robinson’s Opera House. Detective Jackson was stationed in the house and Witte and Bulmer in the saloon opposite.

Just when it seemed as though their intended game had discovered the fact that the officers were after him and had left for parts unknown he was captured.

It was after 9 o’clock, when almost the last ray of hope had died out of the officers breasts, that Chief of Police Deitsch received word that Jackson had just been seen at the Palace Hotel. The chief started out and ran into a man answering Jackson’s description. He informed the detectives of the fact, the fellow was watched and was seen to walk slowly down Ninth Street, and on reaching 222 he looked up at the windows. He strolled slowly to Plum Street and stopped and again looked back at the house.

He then walked rapidly north on Plum Street toward Court. When he had traversed part of the square Detective Bulmer stepped up to him, saying: “Your name is Jackson, isn’t it?” The man turned perfectly livid and trembled like an aspen, and as the detective continued to say, “I want you,” he exclaimed, “My God! what is this for?”

At the same time the start was made for the Mayor’s Office.

Example 2: High artificial obfuscation PAN12 source-document01505 suspicious-document01505.

Suspicious text: He bestir, when you say than there the message as the Cheops, that you should not shoot whether your Escape there is as visit as i establish mine to be.\(*****\) Mister. Of christopher and CULLEY, whom he may never talk how i have from bustling one, ill cinematic message_ The noemi of an Elevation by\(*\), has powerfully melt up_ atrocities to a, thus good. Latter Alleyway of(CASSELL) is, in malice of which i cannot excessively really reject, as successful part to anyone need want, come a writer to have been postdate of headway_ decoupage. Wondrous_ scenario is not prussian Direction, in circumference, and the tearjerker being_ fifty dudgeon by his prussian national by House, and what do he look yet the place at WA would again in emblem. The “Alleyway to” of prussian gens is own shantytown in Dhegiha\(*\) War whether the_, stranded atrocities of mankind has really, has met for boy by the vicissitudes of dystopia were well be to articulation him do. I had to labialize that one of Mister. CULLEY’Karl trouble will have represented in case, when the message, leading love, had today to be bored at debris of all villagers, and have a association was shortly more urgent conversations. Violently of the betrayed and review the Kentan of is bring at-_, and she is lie, necessarily to retrieve, as short intentions are of potential miseries_ support, but had excessively cognize for_ Dave’mho_ in day for be hustle merely away of a idiom of a eventually final (and hastily be thrilling) souls. The account, that i has to believe, than the performer, whom he go prominently thus, is for—Mister. CULLEY year. The Dave\(*\)_,_ whom has been_ narrative, is not the juvenile which do he are demo entirely in another estimation events up its truthful quality; or on the subject_ Dave’_ may not have particularly rotate eventually be unimportant troubles of ground of a performer,_. 
For him are eventually i will have certainly visualize the volume before one not to be lose with one who lessens that texture at westward rugged or masterful weeks.\(*****\)_ Life Objects and Conversations in (Gracie and Edith) is there are least publication that he will be formerly see varied cer to which have been the who deplored it. Sorrow there can be of blade and cerebral panel who give; anglophilia on cowardice nor to tour that things make of the agency caused that the brother beside those it has been for the; and vehement with the troubles of another position (against the_ are o’er) keep after the message by the autobiography.

Source text: I hope, when you read this tale of the Pharaohs, that you will not find that your memory of the Book of Exodus is as faded as I found mine to be.

* * * * *

Mr. CHRISTOPHER CULLEY, whom you may remember for a bustling, rather cinematic story called_Naomi of the Mountains_, has now followed this with another, considerably better._Lily of the Alley_ (CASSELL) is, in spite of a title of which I cannot too strongly disapprove, as successful a piece of work of its own kind as anyone need wish for, showing the author to have made a notable advance in his art. Again the setting is Wild West, on the Mexican border, the theme of the tale being the outrages inflicted upon American citizens by VILLA, and what seemed then the bewildering delay of Washington over the vindication of the flag. The “Alley” of its unfortunate name is the slum in Kansas City where _Dave_, stranded on his way westward, met the girl to whom the laws of fiction were inevitably to join him. I fancy that one of Mr. CULLEY’S difficulties may have lain in the fact that, when the tale, following _Dave_, had finally shaken itself from the dust of cities, the need for feminine society was conspicuously less urgent. Even after a rescued and refreshed _Lily_ is brought up-country, she is kept, so to speak, as long as possible at the base, and only arrives on the actual scene of _Dave’s_ activities in time to be bustled hurriedly out of the way of the final (and wonderfully thrilling) chapters. The explanation is, I think, that the cowboy, whom he knows so well, is for Mr. CULLEY hero and heroine too. _Dave_, round whom the story revolves, is a pleasant study of a type of American youth which we are coming gratefully to estimate at its true worth; but in the development of the theme _Dave_ soon becomes almost insignificant beside the greater figure of the cowboy, _Monte Latarette_. For him alone I should regard the book as one not to be missed by anyone who values a handling of character at once delicate and masterful.

* * * * *

_Keeling Letters and Recollections_ (ALLEN AND UNWIN) is a book that will perhaps rouse varied emotions in those who read it. Regret there will be for so much youth and intellectual vigour sacrificed; admiration for courage and for a patriotism that circumstances made by no means the simple matter of conviction that it has been for most; and vehement opposition to many of the views (on the War especially) held by the subject of the memoir.

Example 3: Low artificial obfuscation PAN12 source-document01000 suspicious-document01000.

Suspicious text: The call of, “Pig away!” and the dash of bairn in the pursuit, at last make such a soprano that both attend grey and the much-try Andrew make disorder to the vicar. Time great declared that discipline was become impossible, and Andrew that there would not be a “martal vegetable in the Master Robinette’s’hog you got out thus merely.” Then have the vicar it made the concept for this consequence: “if David’s polly porker is seen in a duel had again, it travel stern that same time to Farmer.” The priest’s regulation were not flagship to be disregarded, and his threats were always carried away. David and Ambrose might have been understand with a cock and nails very busy avenues at the hog-hordeolum that afternoon, and Antony’mho visits to the garden discontinue, until one unlucky occasion when David was away from condominium alas, and it fell away in the mode:—in the town of Nearminster, ten knot from Easney, populate Pennie’s godparent Miss Unity Cheffins, and it was Mr and Mrs Hawthorn’grass to pay than her did an choice of 2 or 3 days, taking each kinds of the dean bairn with them in curve. It was an occasion much expect by latter, but more for the glory in the situation than from any delight being connected with it, for Miss and Unity was instead come a potent young begum, and particular strength in her notions as to their poor jealousy. She was fond of saying, “in what_ my_ time young fortune did so and often,” and of noticing any failure from all niceness, or yet any shortcoming. She was a proper young lady, and lived in a firm so outside the Cathedral close; it was sombrely furnished, and full of dark young portraits, and rare China bowling and knick-knacks, which last Woman Completeness thought a transaction of, and dusted carefully with a hands.

Source text: The cry of, “Pig out!” and the consequent rush of children in pursuit, at last reached such a pitch that both Miss Grey and the much-tried Andrew made complaint to the vicar. Miss Grey declared that discipline was becoming impossible, and Andrew that there would not be a “martal vegetable in the garden if Master David’s pig got out so often.” Then the vicar made a rule to this effect:

“If David’s pig is seen in the garden again, it goes back that same day to Farmer Hatchard.”

The vicar’s rules were not things to be disregarded, and his threats were always carried out. David and Ambrose might have been seen with a large hammer and nails very busy at the pig-sty that afternoon, and Antony’s visits to the garden ceased, until one unlucky occasion when David was away from home, and it fell out in the following manner:-

In the cathedral town of Nearminster, 10 miles from Easney, lived Pennie’s godmother Miss Unity Cheffins, and it was Mr and Mrs Hawthorn’s custom to pay her an annual visit of 2 or 3 days, taking each of the four elder children with them in turn. It was an occasion much anticipated by the latter, but more for the honour of the thing than from any actual pleasure connected with it, for Miss Unity was rather a stiff old lady, and particular in her notions as to their proper behaviour. She was fond of saying, “In_my_ time young people did so and so,” and of noticing any little failure in politeness, or even any personal defect. She was a rich old lady, and lived in a great square house just inside the Cathedral Close; it was sombrely furnished, and full of dark old portraits, and rare China bowls and knick-knacks, which last Miss Unity thought a great deal of, and dusted carefully with her own hands.

Example 4: Random artificial obfuscation PAN13 source-document00921 suspicious-document00086.

Suspicious text: When responding, please specify exactly which program you are previous in. rsvp only whfellow@ writing. upenn edu here whfellow@ writing. upenn Seminar, students will study the work Dog. This program. The edu is teach by Professor. Al.-Lee syllabi from age are available.

Source text: When responding, please specify exactly which program you are interested in.rsvp only whfellow@writing.upenn.edu here whfellow@writing.upenn.edu In the Writers House Fellows Seminar, students will study the work of all three Fellows. The course is taught by Kelly Professor and Writers House Faculty Director . This year’s coordinator of the program is . Al Filreis Jamie–Lee Josselyn The syllabi from previous years are available.

Suspicious text: We desire to make it possible for the youngest writers and abstractor-evaluator to have sustained contact with abstractor of accomplishment. We besides desire to resist the clip-honored differentiation—more honored in biologism between working with eminent writers on extremity and analyse literature on the other.

Source text: We want to make it possible for the youngest writers and writer-critics to have sustained contact with authors of great accomplishment in an informal atmosphere. We also want to resist the time-honored distinction—more honored in practice than in theory—between working with eminent writers on the one hand and studying literature on the other.

About this article

Cite this article

Altheneyan, A.S., Menai, M.E. Automatic plagiarism detection in obfuscated text. Pattern Anal Applic 23, 1627–1650 (2020). https://doi.org/10.1007/s10044-020-00882-9

  • DOI: https://doi.org/10.1007/s10044-020-00882-9
