Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works

Zaslavsky, Arkady; Bia, Alejandro; Monostori, Krisztian

doi:10.1007/3-540-44796-2_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2163)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

This article describes joint research between Monash University and the University of Alicante, in which software originally designed for plagiarism and copy detection in academic work was successfully applied to the comparative analysis of different editions of literary works. The experiments were performed on Spanish texts from the Miguel de Cervantes Digital Library. The results proved useful for literary and linguistic research, automating part of the tedious task of comparative text analysis. Other interesting uses for the software were also identified.
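The abstract gives no algorithmic detail, so the following is only a minimal sketch under stated assumptions, not the authors' software. It assumes a word-level comparison: word trigrams ("shingles") give an overall resemblance score in the spirit of Broder-style shingling, and Python's standard `difflib.SequenceMatcher` aligns two editions to list their shared passages and the variant readings between them. The function names, the trigram size and the three-word minimum passage length are hypothetical choices; `edition_b` is an invented spelling variant of the opening of Don Quijote, not a transcription of any real edition.

```python
import re
from difflib import SequenceMatcher

# Hypothetical sample texts: edition_b is an invented old-spelling variant,
# not a transcription of any real edition.
edition_a = ("En un lugar de la Mancha, de cuyo nombre no quiero acordarme, "
             "no ha mucho tiempo que vivía un hidalgo")
edition_b = ("En un lugar de la Mancha, de cuyo nombre no quiero acordarme, "
             "no ha mucho tiempo que viuia un hidalgo")


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of word characters; Python's Unicode-aware
    regex keeps accented letters such as í and ñ inside words."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens: list[str], k: int) -> set[tuple[str, ...]]:
    """Set of all contiguous k-word sequences (word-level shingles)."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def resemblance(a: list[str], b: list[str], k: int = 3) -> float:
    """Jaccard coefficient of the two shingle sets: shared / total distinct."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def common_passages(a: list[str], b: list[str], min_words: int = 3):
    """Aligned shared word runs of at least min_words words,
    returned as (start_in_a, start_in_b, passage_text)."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(m.a, m.b, " ".join(a[m.a:m.a + m.size]))
            for m in matcher.get_matching_blocks() if m.size >= min_words]


def variant_readings(a: list[str], b: list[str]):
    """Spans that differ between the aligned editions,
    returned as (difflib_tag, text_in_a, text_in_b)."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


if __name__ == "__main__":
    a, b = tokenize(edition_a), tokenize(edition_b)
    print(round(resemblance(a, b), 3))  # 0.714: 15 of 21 distinct trigrams shared
    for pos_a, pos_b, text in common_passages(a, b):
        print(pos_a, pos_b, text)
    print(variant_readings(a, b))       # [('replace', 'vivía', 'viuia')]
```

`SequenceMatcher` produces a single left-to-right alignment, so a passage that was moved to a different position in one edition still raises the shingle score but will not show up in the passage list; reporting every shared run regardless of order would call for a different structure, such as a generalized suffix tree.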

Authors and Affiliations

Arkady Zaslavsky: Monash University, Melbourne, Australia
Alejandro Bia: Miguel de Cervantes Digital Library, University of Alicante, Alicante, Spain
Krisztian Monostori: Monash University, Melbourne, Australia

Editors and Affiliations

Panos Constantopoulos: Department of Computer Science, University of Crete, Leof. Knossou, P.O. Box 1470, 71409, Heraklion, Greece; Foundation for Research and Technology - Hellas, Institute of Computer Science, Vassilika Vouton, P.O. Box 1385, 71110, Heraklion, Greece
Ingeborg T. Sølvberg: Department of Computer and Information Science, The Norwegian University of Science and Technology, 7491, Trondheim, Norway

About this paper

Cite this paper

Zaslavsky, A., Bia, A., Monostori, K. (2001). Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works. In: Constantopoulos, P., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2001. Lecture Notes in Computer Science, vol 2163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44796-2_10

DOI: https://doi.org/10.1007/3-540-44796-2_10
Published: 30 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42537-3
Online ISBN: 978-3-540-44796-2
eBook Packages: Springer Book Archive