Abstract
This article describes joint research between Monash University and the University of Alicante in which software originally designed for plagiarism and copy detection in academic works is successfully applied to the comparative analysis of different editions of literary works. The experiments were performed on Spanish texts from the Miguel de Cervantes digital library. The results have proved useful for literary and linguistic research, automating part of the tedious task of comparative text analysis. Other interesting uses were also identified.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zaslavsky, A., Bia, A., Monostori, K. (2001). Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works. In: Constantopoulos, P., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2001. Lecture Notes in Computer Science, vol 2163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44796-2_10
DOI: https://doi.org/10.1007/3-540-44796-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42537-3
Online ISBN: 978-3-540-44796-2