
Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2163)


Abstract

This article describes a joint research project between Monash University and the University of Alicante, in which software originally designed for plagiarism and copy detection in academic works was successfully applied to the comparative analysis of different editions of literary works. The experiments were performed on Spanish texts from the Miguel de Cervantes digital library. The results proved useful for literary and linguistic research, automating part of the tedious task of comparative text analysis. Other interesting uses of the software were also identified.
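The abstract does not describe the comparison algorithm itself. As a rough illustration only, the sketch below shows two generic techniques from this family: word n-gram ("shingle") resemblance, commonly used in copy detection, and longest-matching-block alignment with Python's standard `difflib`, which can cross-reference the passages two editions share. This is not the authors' software; the function names (`tokenize`, `shingles`, `resemblance`, `shared_passages`), the trigram size, the minimum passage length and the simple lowercase tokenization are all assumptions made for the example.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (assumed normalization).

    \\w is Unicode-aware for str patterns, so Spanish letters such as
    'ñ' and accented vowels stay inside words.
    """
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def shingles(words: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """All contiguous word n-grams ("shingles") of a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(tokenize(a), n), shingles(tokenize(b), n)
    if not sa and not sb:
        return 1.0  # two empty (or very short) texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def shared_passages(a: str, b: str, min_words: int = 3):
    """Yield (word_offset_in_a, word_offset_in_b, passage) for word runs
    common to both texts, using difflib's matching-block alignment."""
    wa, wb = tokenize(a), tokenize(b)
    matcher = SequenceMatcher(None, wa, wb, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:  # skips the final zero-size sentinel too
            yield block.a, block.b, " ".join(wa[block.a:block.a + block.size])


if __name__ == "__main__":
    # Two hypothetical "editions" differing by one word.
    ed1 = ("En un lugar de la Mancha, de cuyo nombre no quiero acordarme, "
           "no ha mucho tiempo que vivía un hidalgo")
    ed2 = ("En un lugar de la Mancha, de cuyo nombre no quiero acordarme, "
           "no ha mucho que vivía un hidalgo")
    print(f"{resemblance(ed1, ed2):.2f}")  # prints 0.75
    for off_a, off_b, passage in shared_passages(ed1, ed2):
        print(off_a, off_b, passage)
```

Shingle resemblance gives a single score that is cheap to compute across many document pairs, while the `difflib` alignment reports where each shared passage sits in both editions, which is closer to what a cross-referencing tool needs. For book-length texts, index-based approaches such as suffix trees scale better than pairwise `SequenceMatcher` runs.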




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zaslavsky, A., Bia, A., Monostori, K. (2001). Using Copy-Detection and Text Comparison Algorithms for Cross-Referencing Multiple Editions of Literary Works. In: Constantopoulos, P., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2001. Lecture Notes in Computer Science, vol 2163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44796-2_10


  • DOI: https://doi.org/10.1007/3-540-44796-2_10


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42537-3

  • Online ISBN: 978-3-540-44796-2

  • eBook Packages: Springer Book Archive
