Abstract.
The move to publish documents electronically has several significant advantages for publishers and consumers. These include the elimination of printing and paper costs, of warehousing and transport of material, and of the lag between release and delivery to the customer. Electronic publishing also carries an inherent danger: an unlimited number of perfect reproductions of the original can be made and distributed, depriving the publisher and author of revenue. While preventing copying would be preferable, it seems to be impractical when documents appear in digital form. In this paper we describe a method for digitally fingerprinting documents so that the publisher can distribute a unique copy to each customer. When a suspected illegal copy of a document is found, the publisher can determine which user’s copy was used. As long as the illegal copy is identical to one of the originals, this is a straightforward process of comparison. A more serious problem arises when the attacker tries to hide the identity of the original by distorting the document (by changing, adding, or deleting segments, etc.). In this situation, straightforward comparison may not be effective, and we may instead want to find the original document closest to the illegal copy or determine whether a document is largely based on another document. We describe a method based on comparing the dictionaries generated by the LZW compression algorithm. This method allows for very rapid comparison of documents even when changes (distortion) have been made to prevent detection. While the primary application is text documents, similar techniques can be applied to software and to images.
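To make the idea of comparing LZW dictionaries concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state how the dictionaries are scored against each other, so the Jaccard overlap used here is an assumption, as are the function names `lzw_phrases`, `dictionary_similarity`, and `closest_original`.

```python
def lzw_phrases(text):
    """Return the multi-character phrases an LZW pass adds to its dictionary."""
    known = set(text)        # seed with the single characters that occur
    phrases = set()
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in known:
            current = candidate          # keep extending the current match
        else:
            known.add(candidate)         # new dictionary entry
            phrases.add(candidate)
            current = ch                 # restart matching from this character
    return phrases


def dictionary_similarity(a, b):
    """Jaccard overlap of two LZW dictionaries (an assumed scoring rule)."""
    pa, pb = lzw_phrases(a), lzw_phrases(b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)


def closest_original(suspect, originals):
    """Return the key of the original whose dictionary best matches the suspect."""
    return max(originals,
               key=lambda name: dictionary_similarity(suspect, originals[name]))


if __name__ == "__main__":
    # Hypothetical per-customer copies and a distorted suspect copy.
    originals = {
        "alice": "the quick brown fox jumps over the lazy dog " * 5,
        "bob": "0123456789 " * 20,
    }
    suspect = originals["alice"].replace("lazy", "sleepy")
    print(closest_original(suspect, originals))
```

Because each dictionary is built in a single pass over the text and the comparison reduces to set operations, the cost stays roughly linear in document length, which is in keeping with the rapid comparison the abstract describes.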
Published online: 25 July 2001
Cite this article
Agnew, G., Sivanandan, A. A fast method for determining the origins of documents based on LZW compression. Int J Digit Libr 3, 297–301 (2002). https://doi.org/10.1007/s007990100043
DOI: https://doi.org/10.1007/s007990100043