A fast method for determining the origins of documents based on LZW compression

International Journal on Digital Libraries

Abstract.

The move to publish documents electronically offers several significant advantages to publishers and consumers. These include the elimination of printing costs, paper costs, warehousing and transport of material, and the lag between release and delivery to the customer. Electronic publishing also carries inherent dangers: an unlimited number of perfect reproductions of the original can be made and distributed, depriving the publisher and author of revenue. While preventing copying is preferable, it appears impractical once documents exist in digital form. In this paper we describe a method for digitally fingerprinting documents so that the publisher can distribute a unique copy to each customer. When a suspected illegal copy of a document is found, the publisher can determine which user's copy was used. As long as the illegal copy is identical to one of the originals, this is a straightforward process of comparison. A more serious problem arises when the attacker tries to hide the identity of the original by distorting the document (by changing segments, adding or deleting segments, etc.). In this situation, straightforward comparison may not be effective; instead, we may want to find the original document closest to the illegal copy, or determine whether a document is largely based on another document. We describe a method based on comparing the dictionaries generated by the LZW compression algorithm. This method allows very rapid comparison of documents even in the presence of changes made to prevent detection (distortion). While the primary application was text documents, similar techniques can be applied to software and to images.
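The abstract does not say exactly how two LZW dictionaries are compared, so the following is only a minimal sketch under stated assumptions. Each document's "dictionary" is taken to be the set of multi-character phrases LZW adds while compressing it, and two dictionaries are scored by their Jaccard overlap; that scoring rule is our assumption, not necessarily the authors' measure. The function names `lzw_dictionary`, `similarity`, and `closest_original` and the sample customer texts are hypothetical.

```python
from __future__ import annotations


def lzw_dictionary(text: str) -> set[str]:
    """Collect the multi-character phrases LZW adds to its dictionary
    while compressing `text`.

    Standard LZW seeds the dictionary with every single symbol; those are
    treated as implicitly present and omitted from the result because
    nearly all documents share them.
    """
    phrases: set[str] = set()
    current = ""
    for ch in text:
        candidate = current + ch
        if len(candidate) == 1 or candidate in phrases:
            # The longest known phrase keeps growing.
            current = candidate
        else:
            # Here LZW would emit the code for `current` and learn `candidate`.
            phrases.add(candidate)
            current = ch
    return phrases


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two phrase sets (an assumed scoring rule)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_original(suspect: str, originals: dict[str, str]) -> tuple[str, float]:
    """Return the name of the original whose LZW dictionary best matches
    the suspect document, with its score. `originals` must be non-empty."""
    suspect_phrases = lzw_dictionary(suspect)
    scores = {
        name: similarity(suspect_phrases, lzw_dictionary(doc))
        for name, doc in originals.items()
    }
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical fingerprinted copies issued to two customers.
    originals = {
        "customer_1": "the quick brown fox jumps over the lazy dog " * 3,
        "customer_2": "a slow red fox crawls under the sleepy cat " * 3,
    }
    # A distorted copy: one word changed to defeat exact comparison.
    suspect = originals["customer_2"].replace("sleepy", "drowsy")
    print(closest_original(suspect, originals))
```

Building each dictionary takes a single pass over the text, which fits the abstract's emphasis on speed; a real deployment would presumably precompute and store the dictionaries of all issued copies rather than rebuilding them for every query.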

Additional information

Published online: 25 July 2001

About this article

Cite this article

Agnew, G., Sivanandan, A. A fast method for determining the origins of documents based on LZW compression. Int J Digit Libr 3, 297–301 (2002). https://doi.org/10.1007/s007990100043

  • DOI: https://doi.org/10.1007/s007990100043