Abstract
Plagiarism is a serious problem in education, research, publishing and other fields. Automatic plagiarism detection systems are crucial for ensuring the integrity and genuineness of intellectual work. There are different types of plagiarism, such as copy–paste, obfuscation and translation. In particular, obfuscated text is one of the hardest types of plagiarism to detect. In this paper, we propose an automatic plagiarism detection system for obfuscated text based on a support vector machine classifier that exploits a set of lexical, syntactic and semantic features. We evaluated the performance of the proposed system on benchmark English and Arabic corpora made available by the PAN Workshop series: PAN 2012, PAN 2013, PAN 2014 and PAN@FIRE2015. We also compared the performance of our system to the performances of other systems that participated in the PAN competitions. The obtained results show that our system had the best performance in terms of the F-measure on the PAN 2012 and on the PAN@FIRE2015 obfuscated sub-corpora, was among the top four on the PAN 2013 corpus and was among the top two on the PAN 2014 corpus.
References
Abnar S, Dehghani M, Zamani H, Shakery A (2014) Expanded N-grams for semantic text alignment-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 928–938
Adams R, Nicolae G, Nicolae C, Harabagiu S (2007) Textual entailment through extended lexical overlap and lexico-semantic matching. In: Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing, Association for Computational Linguistics, Stroudsburg, PA, USA, RTE ’07, pp 119–124
Al-Sulaiti L, Atwell ES (2006) The design of a corpus of contemporary arabic. Int J Corpus Linguist 11(2):135–171
Alvi F, Stevenson M, Clough P (2014) Hashing and merging heuristics for text reuse detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 939–946
Alzahrani S (2015) Arabic plagiarism detection using word correlation in N-grams with K-overlapping approach—working notes for PAN-AraPlagDet at FIRE 2015. In: FIRE 2015 working notes papers, 4–6 December, Gandhinagar, India
Alzahrani S, Salim N, Abraham A (2012) Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans Syst Man Cybern Part C App Rev 42(2):133–149. https://doi.org/10.1109/TSMCC.2011.2134847
Banea C, Chen D, Mihalcea R, Cardie C, Wiebe J (2014) Simcompass: using deep learning word embeddings to assess cross-level similarity. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), Association for Computational Linguistics, pp 560–565. https://doi.org/10.3115/v1/S14-2098. http://aclweb.org/anthology/S14-2098
Bensalem I, Boukhalfa I, Rosso P, Abouenour L, Darwish K, Chikhi S (2015) Overview of the AraPlagDet PAN@FIRE2015 shared task on Arabic plagiarism detection. FIRE 2015 working notes papers, 4–6 December, Gandhinagar, India, pp 111–122
Billah Nagoudi EM, Khorsi A, Cherroun H, Schwab D (2018) A two-level plagiarism detection system for Arabic documents. Cybern Inf Technol 18(1). https://hal.archives-ouvertes.fr/hal-01706138
Bochkarev VV, Shevlyakova AV, Solovyev VD (2015) The average word length dynamics as an indicator of cultural changes in society. Soc Evol Hist 14(2):153–175
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27
Das D, Smith NA (2009) Paraphrase identification as probabilistic quasi-synchronous recognition. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP: vol 1—Volume 1, Association for Computational Linguistics, Stroudsburg, PA, USA, ACL ’09, pp 468–476
Daud A, Khan JA, Nasir JA, Abbasi RA, Aljohani NR, Alowibdi JS (2018) Latent dirichlet allocation and pos tags based method for external plagiarism detection: LDA and POS tags based plagiarism detection. Int J Semant Web Inf Sys 14(3):53–69
de Marneffe MC, MacCartney B, Manning CD (2006) Generating typed dependency parses from phrase structure parses. In: LREC, pp 449–454
Denkowski M, Lavie A (2014) Meteor universal: language specific translation evaluation for any target language. In: Proceedings of the EACL 2014 workshop on statistical machine translation, pp 376–380
Eissen SMZ, Stein B (2006) Intrinsic plagiarism detection. In: Lalmas M, MacFarlane A, Rüger S, Tombros A, Tsikrika T, Yavlinsky A (eds) Advances in information retrieval. Springer, Berlin, pp 565–569
Finch A, Hwang YS, Sumita E (2005) Using machine translation evaluation techniques to determine sentence-level semantic equivalence. In: Proceedings of the 3rd international workshop on paraphrasing (IWP2005), pp 17–24
Gharavi E, Veisi H, Rosso P (2019) Scalable and language-independent embedding-based approach for plagiarism detection considering obfuscation type: no training phase. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04594-y
Gillam L (2013) Guess again and see if they line up: surrey’s runs at plagiarism detection—notebook for PAN at CLEF 2013
Gillam L, Notley S (2014) Evaluating robustness for ’IPCRESS’: surrey’s text alignment for plagiarism detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 951–957
Gillam L, Newbold N, Cooke N (2012) Educated guesses and equality judgements: using search engines and pairwise match for external plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 10 Nov 2018
Glinos D (2014) A hybrid architecture for plagiarism detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 958–965
Gross P, Modaresi P (2014) Plagiarism alignment detection by merging context seeds-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 966–972
Grozea C, Popescu M (2012) Encoplot—tuned for high recall (also proposing a new plagiarism detection score). In: Forner P, Karlgren J, Womser-Hacker C (eds) CLEF 2012 Evaluation labs and workshop—working notes papers, 17–20 September, Rome, Italy, pp 538–556. http://www.clef-initiative.eu/publication/working-notes. Accessed 9 Apr 2018
Jayapal A (2012) Similarity overlap metric and greedy string tiling at PAN 2012: Plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 24 Jan 2018
Jayapal A, Goswami B (2013) Submission to the 5th international competition on plagiarism detection. From Nuance Communications, USA. http://www.uni-weimar.de/medien/webis/events/pan-13. Accessed 26 Jan 2018
Ji Y, Eisenstein J (2013) Discriminative improvements to distributional sentence similarity. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 891–896. http://aclweb.org/anthology/D13-1090. Accessed 20 Apr 2018
Kenter T, de Rijke M (2015) Short text similarity with word embeddings. In: Proceedings of the 24th ACM international on conference on information and knowledge management, ACM, New York, NY, USA, CIKM ’15, pp 1411–1420. https://doi.org/10.1145/2806416.2806475
Kong L, Qi H, Wang S, Du C, Wang S, Han Y (2012) Approaches for candidate document retrieval and detailed comparison of plagiarism detection—Notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 25 Nov 2018
Kong L, Qi H, Du C, Wang M, Han Z (2013) Approaches for source retrieval and text alignment of plagiarism detection—notebook for PAN at CLEF 2013
Kong L, Han Y, Han Z, Yu H, Wang Q, Zhang T, Qi H (2014) Source retrieval based on learning to rank and text alignment based on plagiarism type recognition for plagiarism detection-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 973–976
Küppers R, Conrad S (2012) A set-based approach to plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 25 May 2018
Larock MH, Tressler JC, Lewis CE (1980) Mastering effective English. Copp Clark Pitman, Mississauga
Madnani N, Tetreault J, Chodorow M (2012) Re-examining machine translation metrics for paraphrase identification. In: Proceedings of the 2012 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, Association for Computational Linguistics, Stroudsburg, PA, USA, NAACL HLT ’12, pp 182–190. http://dl.acm.org/citation.cfm?id=2382029.2382055. Accessed 24 Sept 2018
Magooda A, Mahgoub A, Rashwan M, Fayek M (2015) Rdi system for extrinsic plagiarism detection (rdi_red). FIRE 2015 working notes papers, 4–6 December, Gandhinagar, India, pp 126–128
Maurer HA, Kappe F, Zaka B (2006) Plagiarism-a survey. J UCS 12(8):1050–1084
Mollá D (2003) Towards semantic-based overlap measures for question-answering. In: Proceedings of the Australasian language technology workshop, pp 110–117. http://aclweb.org/anthology/U03-1014. Accessed 3 Oct 2018
Mollá D, Gardiner M (2004) Answerfinder: question answering by combining lexical, syntactic and semantic information. Proc Aus Lang Technol Workshop 2004:9–16
Nourian A (2013) Submission to the 5th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-13. From the Iran University of Science and Technology. Accessed 11 Dec 2018
Oberreuter G, Eiselt A (2014) Submission to the 6th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-14. From Innovand.io, Chile. Accessed 21 May 2018
Oberreuter G, Carrillo-Cisneros D, Scherson I, Velásquez J (2012) Submission to the 4th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-12. From the University of Chile, Chile, and the University of California, USA
Palkovskii Y, Belov A (2012) Applying specific clusterization and fingerprint density distribution with genetic algorithm overall tuning in external plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 24 Nov 2018
Palkovskii Y, Belov A (2013) Using hybrid similarity methods for plagiarism detection—notebook for PAN at CLEF 2013
Palkovskii Y, Belov A (2014) Developing high-resolution universal multi-type N-gram plagiarism detector-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 984–989
Palkovskii Y, Belov A (2015) Submission to AraPlagDet PAN@FIRE2015 shared task on Arabic plagiarism detection. From the Zhytomyr State University and SkyLine LLC
Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics, pp 311–318. http://aclweb.org/anthology/P02-1040. Accessed 2 Jan 2019
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543. http://www.aclweb.org/anthology/D14-1162. Accessed 24 Feb 2018
Potthast M, Barrón-Cedeño A, Eiselt A, Stein B, Rosso P (2010) Overview of the 2nd international competition on plagiarism detection. http://www.clef-initiative.eu/publication/working-notes. Accessed 30 Apr 2018
Potthast M, Stein B, Barrón-Cedeño A, Rosso P (2010) An evaluation framework for plagiarism detection. In: Huang CR, Jurafsky D (eds) 23rd international conference on computational linguistics (COLING 10). Association for Computational Linguistics, Stroudsburg, Pennsylvania, pp 997–1005
Potthast M, Eiselt A, Barrón-Cedeño A, Stein B, Rosso P (2011) Overview of the 3rd international competition on plagiarism detection. http://www.clef-initiative.eu/publication/working-notes. Accessed 15 Mar 2019
Potthast M, Gollub T, Hagen M, Graßegger J, Kiesel J, Michel M, Oberländer A, Tippmann M, Barrón-Cedeño A, Gupta P, Rosso P, Stein B (2012) Overview of the 4th international competition on plagiarism detection. http://www.clef-initiative.eu/publication/working-notes. Accessed 6 Jan 2019
Potthast M, Gollub T, Hagen M, Tippmann M, Kiesel J, Rosso P, Stamatatos E, Stein B (2013) Overview of the 5th international competition on plagiarism detection. In: Forner P, Navigli R, Tufis D (eds) Working notes papers of the CLEF 2013 evaluation labs, pp 301–331. http://www.clef-initiative.eu/publication/working-notes. Accessed 20 Feb 2019
Potthast M, Hagen M, Beyer A, Busse M, Tippmann M, Rosso P, Stein B (2014) Overview of the 6th international competition on plagiarism detection. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) Working notes papers of the CLEF 2014 evaluation labs, CLEF and CEUR-WS.org, CEUR workshop proceedings, pp 845–876. http://www.clef-initiative.eu/publication/working-notes. Accessed 7 May 2018
Reginaldo TV, Meireles MRG, Patrocínio ZKG (2019) Evaluating AdaBoost for plagiarism detection. In: Vera-Rodriguez R, Fierrez J, Morales A (eds) Progress in pattern recognition, image analysis, computer vision, and applications. CIARP 2018. Lecture Notes in Computer Science, vol 11401. Springer, pp 865–873
Rodríguez Torrejón D, Martín Ramos J (2012) Detailed comparison module in CoReMo 1.9 Plagiarism Detector—notebook for PAN at CLEF 2012. In: Forner P, Karlgren J, Womser-Hacker C (eds) CLEF 2012 evaluation labs and workshop—working notes papers, 17–20 September, Rome, Italy, pp 1–8. http://www.clef-initiative.eu/publication/working-notes. Accessed 3 Feb 2019
Rodríguez Torrejón D, Martín Ramos J (2013) Text Alignment Module in CoReMo 2.1 Plagiarism Detector—Notebook for PAN at CLEF 2013
Rodríguez Torrejón D, Martín Ramos J (2014) CoReMo 2.3 plagiarism detector text alignment module-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 Evaluation labs and workshop—working notes papers, 15–18 September. CEUR-WS.org, Sheffield, UK, pp 997–1003
Sanchez-Perez M, Sidorov G, Gelbukh A (2014) A winning approach to text alignment for text reuse detection at PAN 2014–notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 1004–1011
Sánchez-Vega F, y Gómez MM, Villaseñor-Pineda L (2012) Optimized fuzzy text alignment for plagiarism detection—notebook for PAN at CLEF 2012. http://www.clef-initiative.eu/publication/working-notes. Accessed 1 Feb 2019
Sánchez-Vega F, Villatoro-Tello E, Montes-y Gómez M, Rosso P, Stamatatos E, Villaseñor-Pineda L (2019) Paraphrase plagiarism identification with character-level features. Pattern Anal Appl 22(2):669–681. https://doi.org/10.1007/s10044-017-0674-z
Saremi M, Yaghmaee F (2013) Submission to the 5th international competition on plagiarism detection. http://www.uni-weimar.de/medien/webis/events/pan-13. From Semnan University, Iran. Accessed 20 Dec 2018
Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
Shrestha P, Solorio T (2013) Using a variety of N-grams for the detection of different kinds of plagiarism—notebook for PAN at CLEF 2013
Shrestha P, Maharjan S, Solorio T (2014) Machine translation evaluation metric for text alignment-notebook for PAN at CLEF. In: Cappellato L, Ferro N, Halvey M, Kraaij W (eds) CLEF 2014 evaluation labs and workshop—working notes papers, 15–18 September, CEUR-WS.org, Sheffield, UK, pp 1012–1016
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of Association for Machine Translation in the Americas, pp 223–231
Suchomel Š, Kasprzak J, Brandejs M (2012) Three way search engine queries with multi-feature document comparison for plagiarism detection—notebook for PAN at CLEF. In: Forner P, Karlgren J, Womser-Hacker C (eds) CLEF 2012 evaluation labs and workshop—working notes papers, 17–20 September, Rome, Italy, pp 1–12. http://www.clef-initiative.eu/publication/working-notes. Accessed 24 Nov 2018
Suchomel Š, Kasprzak J, Brandejs M (2013) Diverse queries and feature type selection for plagiarism discovery—notebook for PAN at CLEF 2013
Wan S, Dras M, Dale R, Paris C (2006) Using dependency-based features to take the ’para-farce’ out of paraphrase. In: Proceedings of the Australasian language technology workshop 2006, pp 131–138. http://aclweb.org/anthology/U06-1019. Accessed 10 Mar 2019
Zanzotti FM, Pennacchiotti M, Moschitti A (2009) A machine learning approach to textual entailment recognition. Nat Lang Eng 15(4):551–582. https://doi.org/10.1017/S1351324909990143
Acknowledgements
This work was supported by the Research Center of the College of Computer and Information Sciences, King Saud University. The authors are grateful for this support.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Output examples
This appendix contains, for each obfuscation type, some plagiarism cases from PAN 2012 through 2014 as reported by [51,52,53]. The highlighted texts are the text segments detected by our system.
Example 1: Simulated Plagiarism PAN12 source-document02500 suspicious-document02500.
Suspicious text: Arrest, if in Cincinnati, William Wood, a friend of Jackson. Cargo as an accomplice. About 20 years old, 5 feet 11 inches, blond hair, face taut, rather thin, weighs 165 pounds. We’re going from here to South Bend after Wood to leave here to there. CRIM McDermott and Plummer. ”
Immediately after receiving the telegram Col. Deitsch Witte detailed Detectives, Bulmer and Jackson to take care of Jackson. It was learned that stayed at the house of Mrs. McNevin, at 222 West Ninth, next to the Opera House Robinson.
Detective Jackson was stationed at home and Witte and Bulmer in front of the room. Just when it seemed he had found his game for the fact that the officers were after him and had gone to unknown places was captured. It was after nine o’clock, when most of the last ray of hope had died out of the breasts officers, the Chief of Police Deitsch received the news that Jackson had been seen at the Palace Hotel.
The chief began and I met a man answering description of Jackson. He informed the detectives of reality, the individual was last seen and was walking slowly along Ninth Street, and when he reached 222 he looked up at the windows. He walked slowly to Plum Street and stopped and looked back toward the house.
Then hurried north on Plum Street to the Court. When the route was part of the square Bulmer detective approached him, saying: “Your name is Jackson, right?” He turned completely pale and trembling like an aspen, and as the detective continued, “I love you” he exclaimed, “My God, what is this?
At the same time, there was the beginning of the Mayor’s Office
Source text: Arrest if in Cincinnati, William Wood, friend of Jackson. Charge as accomplice. About 20 years, 5 feet 11 inches, light blonde hair, smooth face, rather slender, weighs 165 pounds. We go from here to South Bend after Wood as he left here for that place. CRIM MCDERMOTT AND PLUMMER.”
Immediately on receipt of the telegram Colonel Deitsch detailed Detectives Witte, Bulmer and Jackson to look after Jackson. It was learned that he roomed at the house of Mrs. McNevin, at 222 West Ninth, next door to Robinson’s Opera House. Detective Jackson was stationed in the house and Witte and Bulmer in the saloon opposite.
Just when it seemed as though their intended game had discovered the fact that the officers were after him and had left for parts unknown he was captured.
It was after 9 o’clock, when almost the last ray of hope had died out of the officers breasts, that Chief of Police Deitsch received word that Jackson had just been seen at the Palace Hotel. The chief started out and ran into a man answering Jackson’s description. He informed the detectives of the fact, the fellow was watched and was seen to walk slowly down Ninth Street, and on reaching 222 he looked up at the windows. He strolled slowly to Plum Street and stopped and again looked back at the house.
He then walked rapidly north on Plum Street toward Court. When he had traversed part of the square Detective Bulmer stepped up to him, saying: “Your name is Jackson, isn’t it?” The man turned perfectly livid and trembled like an aspen, and as the detective continued to say, “I want you,” he exclaimed, “My God! what is this for?”
At the same time the start was made for the Mayor’s Office.
Example 2: High artificial obfuscation PAN12 source-document01505 suspicious-document01505.
Suspicious text: He bestir, when you say than there the message as the Cheops, that you should not shoot whether your Escape there is as visit as i establish mine to be.\(*****\) Mister. Of christopher and CULLEY, whom he may never talk how i have from bustling one, ill cinematic message_ The noemi of an Elevation by\(*\), has powerfully melt up_ atrocities to a, thus good. Latter Alleyway of(CASSELL) is, in malice of which i cannot excessively really reject, as successful part to anyone need want, come a writer to have been postdate of headway_ decoupage. Wondrous_ scenario is not prussian Direction, in circumference, and the tearjerker being_ fifty dudgeon by his prussian national by House, and what do he look yet the place at WA would again in emblem. The “Alleyway to” of prussian gens is own shantytown in Dhegiha\(*\) War whether the_, stranded atrocities of mankind has really, has met for boy by the vicissitudes of dystopia were well be to articulation him do. I had to labialize that one of Mister. CULLEY’Karl trouble will have represented in case, when the message, leading love, had today to be bored at debris of all villagers, and have a association was shortly more urgent conversations. Violently of the betrayed and review the Kentan of is bring at-_, and she is lie, necessarily to retrieve, as short intentions are of potential miseries_ support, but had excessively cognize for_ Dave’mho_ in day for be hustle merely away of a idiom of a eventually final (and hastily be thrilling) souls. The account, that i has to believe, than the performer, whom he go prominently thus, is for—Mister. CULLEY year. The Dave\(*\)_,_ whom has been_ narrative, is not the juvenile which do he are demo entirely in another estimation events up its truthful quality; or on the subject_ Dave’_ may not have particularly rotate eventually be unimportant troubles of ground of a performer,_. 
For him are eventually i will have certainly visualize the volume before one not to be lose with one who lessens that texture at westward rugged or masterful weeks.\(*****\)_ Life Objects and Conversations in (Gracie and Edith) is there are least publication that he will be formerly see varied cer to which have been the who deplored it. Sorrow there can be of blade and cerebral panel who give; anglophilia on cowardice nor to tour that things make of the agency caused that the brother beside those it has been for the; and vehement with the troubles of another position (against the_ are o’er) keep after the message by the autobiography.
Source text: I hope, when you read this tale of the Pharaohs, that you will not find that your memory of the Book of Exodus is as faded as I found mine to be.
* * * * *
Mr. CHRISTOPHER CULLEY, whom you may remember for a bustling, rather cinematic story called_Naomi of the Mountains_, has now followed this with another, considerably better._Lily of the Alley_ (CASSELL) is, in spite of a title of which I cannot too strongly disapprove, as successful a piece of work of its own kind as anyone need wish for, showing the author to have made a notable advance in his art. Again the setting is Wild West, on the Mexican border, the theme of the tale being the outrages inflicted upon American citizens by VILLA, and what seemed then the bewildering delay of Washington over the vindication of the flag. The “Alley” of its unfortunate name is the slum in Kansas City where _Dave_, stranded on his way westward, met the girl to whom the laws of fiction were inevitably to join him. I fancy that one of Mr. CULLEY’S difficulties may have lain in the fact that, when the tale, following _Dave_, had finally shaken itself from the dust of cities, the need for feminine society was conspicuously less urgent. Even after a rescued and refreshed _Lily_ is brought up-country, she is kept, so to speak, as long as possible at the base, and only arrives on the actual scene of _Dave’s_ activities in time to be bustled hurriedly out of the way of the final (and wonderfully thrilling) chapters. The explanation is, I think, that the cowboy, whom he knows so well, is for Mr. CULLEY hero and heroine too. _Dave_, round whom the story revolves, is a pleasant study of a type of American youth which we are coming gratefully to estimate at its true worth; but in the development of the theme _Dave_ soon becomes almost insignificant beside the greater figure of the cowboy, _Monte Latarette_. For him alone I should regard the book as one not to be missed by anyone who values a handling of character at once delicate and masterful.
* * * * *
_Keeling Letters and Recollections_ (ALLEN AND UNWIN) is a book that will perhaps rouse varied emotions in those who read it. Regret there will be for so much youth and intellectual vigour sacrificed; admiration for courage and for a patriotism that circumstances made by no means the simple matter of conviction that it has been for most; and vehement opposition to many of the views (on the War especially) held by the subject of the memoir.
Example 3: Low artificial obfuscation PAN12 source-document01000 suspicious-document01000.
Suspicious text: The call of, “Pig away!” and the dash of bairn in the pursuit, at last make such a soprano that both attend grey and the much-try Andrew make disorder to the vicar. Time great declared that discipline was become impossible, and Andrew that there would not be a “martal vegetable in the Master Robinette’s’hog you got out thus merely.” Then have the vicar it made the concept for this consequence: “if David’s polly porker is seen in a duel had again, it travel stern that same time to Farmer.” The priest’s regulation were not flagship to be disregarded, and his threats were always carried away. David and Ambrose might have been understand with a cock and nails very busy avenues at the hog-hordeolum that afternoon, and Antony’mho visits to the garden discontinue, until one unlucky occasion when David was away from condominium alas, and it fell away in the mode:—in the town of Nearminster, ten knot from Easney, populate Pennie’s godparent Miss Unity Cheffins, and it was Mr and Mrs Hawthorn’grass to pay than her did an choice of 2 or 3 days, taking each kinds of the dean bairn with them in curve. It was an occasion much expect by latter, but more for the glory in the situation than from any delight being connected with it, for Miss and Unity was instead come a potent young begum, and particular strength in her notions as to their poor jealousy. She was fond of saying, “in what_ my_ time young fortune did so and often,” and of noticing any failure from all niceness, or yet any shortcoming. She was a proper young lady, and lived in a firm so outside the Cathedral close; it was sombrely furnished, and full of dark young portraits, and rare China bowling and knick-knacks, which last Woman Completeness thought a transaction of, and dusted carefully with a hands.
Source text: The cry of, “Pig out!” and the consequent rush of children in pursuit, at last reached such a pitch that both Miss Grey and the much-tried Andrew made complaint to the vicar. Miss Grey declared that discipline was becoming impossible, and Andrew that there would not be a “martal vegetable in the garden if Master David’s pig got out so often.” Then the vicar made a rule to this effect:
“If David’s pig is seen in the garden again, it goes back that same day to Farmer Hatchard.”
The vicar’s rules were not things to be disregarded, and his threats were always carried out. David and Ambrose might have been seen with a large hammer and nails very busy at the pig-sty that afternoon, and Antony’s visits to the garden ceased, until one unlucky occasion when David was away from home, and it fell out in the following manner:-
In the cathedral town of Nearminster, 10 miles from Easney, lived Pennie’s godmother Miss Unity Cheffins, and it was Mr and Mrs Hawthorn’s custom to pay her an annual visit of 2 or 3 days, taking each of the four elder children with them in turn. It was an occasion much anticipated by the latter, but more for the honour of the thing than from any actual pleasure connected with it, for Miss Unity was rather a stiff old lady, and particular in her notions as to their proper behaviour. She was fond of saying, “In_my_ time young people did so and so,” and of noticing any little failure in politeness, or even any personal defect. She was a rich old lady, and lived in a great square house just inside the Cathedral Close; it was sombrely furnished, and full of dark old portraits, and rare China bowls and knick-knacks, which last Miss Unity thought a great deal of, and dusted carefully with her own hands.
Example 4: Random artificial obfuscation PAN13 source-document00921 suspicious-document00086.
Suspicious text: When responding, please specify exactly which program you are previous in. rsvp only whfellow@ writing. upenn edu here whfellow@ writing. upenn Seminar, students will study the work Dog. This program. The edu is teach by Professor. Al.-Lee syllabi from age are available.
Source text: When responding, please specify exactly which program you are interested in.rsvp only whfellow@writing.upenn.edu here whfellow@writing.upenn.edu In the Writers House Fellows Seminar, students will study the work of all three Fellows. The course is taught by Kelly Professor and Writers House Faculty Director . This year’s coordinator of the program is . Al Filreis Jamie–Lee Josselyn The syllabi from previous years are available.
Suspicious text: We desire to make it possible for the youngest writers and abstractor-evaluator to have sustained contact with abstractor of accomplishment. We besides desire to resist the clip-honored differentiation—more honored in biologism between working with eminent writers on extremity and analyse literature on the other.
Source text: We want to make it possible for the youngest writers and writer-critics to have sustained contact with authors of great accomplishment in an informal atmosphere. We also want to resist the time-honored distinction—more honored in practice than in theory—between working with eminent writers on the one hand and studying literature on the other.
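The cases in this appendix suggest why obfuscated plagiarism is hard to catch with exact matching alone: word substitutions break longer shared word sequences even when most individual words survive. The following is a minimal sketch of two simple lexical overlap measures (Jaccard similarity and containment over word n-grams), applied to the first sentence of the last pair in Example 4. It is an illustration only and is not the feature set or implementation used in the proposed system; the function names `tokens`, `ngrams`, `jaccard` and `containment` are hypothetical helpers introduced here.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(toks, n):
    """Set of word n-grams (as tuples) found in a token list."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(suspicious, source):
    """Fraction of the suspicious n-grams that also occur in the source."""
    return len(suspicious & source) / len(suspicious) if suspicious else 0.0


# First sentence of the last suspicious/source pair in Example 4.
suspicious_text = (
    "We desire to make it possible for the youngest writers and "
    "abstractor-evaluator to have sustained contact with abstractor "
    "of accomplishment."
)
source_text = (
    "We want to make it possible for the youngest writers and "
    "writer-critics to have sustained contact with authors of great "
    "accomplishment in an informal atmosphere."
)

# Compare overlap for unigrams, bigrams and trigrams.
for n in (1, 2, 3):
    susp = ngrams(tokens(suspicious_text), n)
    src = ngrams(tokens(source_text), n)
    print(n, round(jaccard(susp, src), 3), round(containment(susp, src), 3))
```

On this pair, containment falls as n grows, because each substituted word removes every n-gram that spans it. This is one reason a detector might combine such lexical measures with syntactic and semantic ones rather than rely on any single overlap score.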
Cite this article
Altheneyan, A.S., Menai, M.E. Automatic plagiarism detection in obfuscated text. Pattern Anal Applic 23, 1627–1650 (2020). https://doi.org/10.1007/s10044-020-00882-9
DOI: https://doi.org/10.1007/s10044-020-00882-9