Bucket attack on numeric set watermarking model and safeguards

https://doi.org/10.1016/j.istr.2011.09.002

Abstract

Numeric set watermarking is a way to provide ownership proof for numerical data. Numerical data can be considered to be primitives for multimedia types such as images and videos since they are organized forms of numeric information. Thereby, the capability to watermark numerical data directly implies the capability to watermark multimedia objects and discourage information theft on social networking sites and the Internet in general. Unfortunately, there has been very limited research done in the field of numeric set watermarking due to underlying limitations in terms of number of items in the set and LSBs in each item available for watermarking. In 2009, Gupta et al. proposed a numeric set watermarking model that embeds watermark bits in the items of the set based on a hash value of the items’ most significant bits (MSBs). If an item is chosen for watermarking, a watermark bit is embedded in the least significant bits, and the replaced bit is inserted in the fractional value to provide reversibility. The authors show their scheme to be resilient against the traditional subset addition, deletion, and modification attacks as well as secondary watermarking attacks.

In this paper, we present a bucket attack on this watermarking model. The attack consists of creating buckets of items with the same MSBs and determining whether the items of each bucket carry watermark bits. Experimental results show that the bucket attack is very strong and destroys the entire watermark with close to 100% success rate. We examine the inherent weaknesses in the watermarking model of Gupta et al. that leave it vulnerable to the bucket attack and propose potential safeguards that can provide resilience against this attack.

Categories and subject descriptors

[intellectual-property protection]

Introduction

The world has witnessed increasingly high volumes of Internet usage during the last decade. More recently, there has been a sharp improvement in Internet speed, enabling online sharing of large data files in excess of hundreds of megabytes. At the same time, illegitimate transfer has also become more widespread, with peer-to-peer networking and anonymous file-hosting sites such as RapidShare and Megaupload providing pirated copies of copyrighted digital information.

There are several media types that are more vulnerable and/or lucrative for Internet pirates to download illegally. Common targets include movies, audio, and e-books. However, databases, software and market figures such as currency exchange rate forecasts are also extremely useful and therefore lucrative for downloading and distributing illegitimately. One of the ways of fighting illegal digital content distribution is digital watermarking.

Digital watermarking (hereafter, watermarking) is the process of embedding owner-identifying information inside digital content. Early research in watermarking began in the late 1990s in the area of image watermarking (Braudaway, 1997; Cox et al., 1996, 1997). The features of a strong watermarking scheme are:

  1. Provability: The owner of the content should be able to prove the existence of the watermark in pirated content beyond reasonable doubt in front of an unbiased judge.

  2. High resilience: The watermark should survive attacks that apply limited distortion to the watermarked content. This is equivalent to saying that condition (1) should be satisfied even after the watermarked content has been tampered with by an adversary. It is assumed that the adversary is not willing to substantially degrade the data quality, but may accept some loss of data quality in their attempts to damage the watermark.

  3. Blindness: Unmarked content should not be required to satisfy condition (1).

  4. Low false positives: Content should not be determined to contain a watermark that, in reality, it does not contain.

  5. High capacity: A sufficient number of watermark bits should be embeddable in the content. This, in turn, facilitates conditions (2) and (4).

Relational databases are composed of tables where each table is a set of tuples, each tuple in turn being a set of attributes. A single attribute or set of attributes is identified as the Primary Key and is required to be unique for each tuple. Watermarking of relational databases can exploit the Primary Key to uniquely identify each tuple and to select the attribute of the tuple that should carry the watermark bit. The assumption is made that the attacker cannot modify the Primary Key without substantially degrading the data quality and/or causing inconsistencies in the data (Agrawal and Kiernan, 2002; Agrawal et al., 2003; Gross-Amblard, 2003; Boney et al., 1996).

Numerical sets differ from relational databases in that they are composed of individual items with no tuple structure and no Primary Key. Numerical sets may consist of ordered or unordered data values. For example, the sequence of exchange rates between two currencies over a year may be presented without dates as an ordered numerical set. Watermarking of numerical sets has not received much attention, with only two research papers in this area known to the authors (Gupta et al., 2009; Sion et al., 2002). Several issues make numeric sets more difficult to watermark than relational databases.

  • Unlike relational database watermarking, where a tuple’s primary key provides the location information of a watermark bit, there is no such “marker” in numeric set watermarking.

  • Relational database watermarking research generally assumes that the primary key cannot be changed, since changing it might render the relation useless. No comparable assumption is available for numeric set watermarking.

  • If the numeric set is small, there may be insufficient items to embed the watermark. On the other hand, if the numeric set contains a large number of items, there may be duplicated items, or items that have the same integer component of value. It is assumed that the fractional value does not contain significant data.

In Sion et al. (2002), the watermark is encoded in the distribution of the numeric set. The definition of normalization in their work is unclear: they describe a “… preliminary normalization step in which a common divider to all the items is first identified and divided by.” However, it is unrealistic to expect items from a numeric set of substantial cardinality to have a common divisor. The major problems with this watermarking scheme, as discussed in Gupta et al. (2009), are as follows:

  • Ability to watermark only the sets with near normal-distribution.

  • The primitive watermarking step distorts the confidence violator $\upsilon_c(S_i)$ used during the detection phase.

To solve these problems, Gupta et al. (2009) proposed a watermarking model for numerical data where each numeric item is treated independently, based on a hash value on its most significant bits. If the item is chosen for watermarking, based on the hash value, a watermark bit is embedded in the least significant bits, and the replaced bit is inserted in the fractional value. The authors show their scheme to be resilient against subset addition, deletion, and modification attacks as well as secondary watermarking attacks. In this paper, we present an attack on this scheme, which we call the bucket attack because it consists of creating buckets of items with the same MSBs and identifying as possible watermark locations those bit positions where all items in a bucket have the same bit value. We also present empirical results that show the effectiveness of this attack, and modifications to the existing watermarking scheme that protect against the bucket attack.
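The bucketing step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation evaluated in the paper: the function name `suspect_positions`, the fixed word width `width`, the minimum bucket size `min_bucket`, and the premise that the attacker knows or guesses the number of MSBs `phi` and of modifiable LSBs `xi` are all hypothetical choices made for the example.

```python
from collections import defaultdict


def suspect_positions(items, phi, xi, width=16, min_bucket=2):
    """Group items by their phi most significant bits and report, per bucket,
    the LSB positions (0 = lowest bit) on which every item agrees.

    Assumption: the integer part of each item fits in `width` bits.
    """
    buckets = defaultdict(list)
    for s in items:
        integer_part = int(abs(s))              # fractional part is ignored
        bucket_id = integer_part >> (width - phi)
        buckets[bucket_id].append(integer_part)

    suspects = {}
    for bucket_id, members in buckets.items():
        if len(members) < min_bucket:
            continue                            # a single item agrees with itself
        agreeing = [
            pos for pos in range(xi)
            if len({(m >> pos) & 1 for m in members}) == 1
        ]
        if agreeing:
            suspects[bucket_id] = agreeing
    return suspects


sample = [1029.25, 1025.5, 1031.0, 2051.75, 2052.1]
flagged = suspect_positions(sample, phi=8, xi=3)
```

For this sample, the three items whose top eight bits equal 4 all have bit 0 set, so the sketch flags position 0 in that bucket, while the two items in the other bucket disagree on every inspected position. Small buckets can agree on a position purely by chance, so a practical attacker would likely need larger buckets or a tolerance threshold before treating a position as a watermark location.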

Section snippets

Organization of paper

Section 3 provides the notation used throughout the paper. In Section 4, we describe the Gupta et al. numerical watermarking scheme (GNW). We analyze GNW to indicate the problems with the underlying model and present a simple yet strong attack in Section 4.2, with experimental results provided in Section 5. We conclude the paper in Section 7 with a note on open problems in the area of numeric set watermarking.

Notations

We use the following notation throughout this paper.

  • Set $S = \{s_1, \ldots, s_N\} \subset \mathbb{R}$ is the numeric set to be watermarked.

  • $\mathrm{MSB}_a(b)$ represents the $a$ most significant bits of item $b \in \mathbb{R}$.

  • $\mathrm{LSB}_a(b)$ represents the $a$ least significant bits of the integer component of item $b \in \mathbb{R}$.

  • $\mathrm{abs}(x)$ represents the absolute value of $x \in \mathbb{R}$.

  • $H(a)$ represents a one-way hash function $H(\cdot)$ on $a$.

  • $|S|$ represents the number of items in set $S$.

  • $\circ$ represents concatenation.

  • $\lfloor a \rfloor$ is the floor function representing the integer value of $a \in \mathbb{R}$.
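To make the bit-level notation concrete, the helpers below give one possible reading of it. The notation does not fix a word size, so the parameter `width` (the assumed bit length of the integer component) and the function names `integer_part`, `msb`, and `lsb` are assumptions introduced for this sketch only.

```python
import math


def integer_part(b: float) -> int:
    """Floor of abs(b): the integer component that carries the MSBs and LSBs."""
    return math.floor(abs(b))


def msb(a: int, b: float, width: int = 16) -> int:
    """The a most significant bits of b, viewing its integer part as a
    width-bit word (width is an assumption, not part of the notation)."""
    value = integer_part(b)
    if value >= 1 << width:
        raise ValueError("integer part does not fit in the assumed width")
    return value >> (width - a)


def lsb(a: int, b: float) -> int:
    """The a least significant bits of the integer component of b."""
    return integer_part(b) & ((1 << a) - 1)
```

With a 16-bit width, the item 1234.56 has integer part 1234, so `msb(8, 1234.56)` is 4 and `lsb(4, 1234.56)` is 2.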

Description

GNW proceeds with the watermark insertion by embedding a watermark bit in one of every $\gamma$ items. This can be done by checking if $\gamma$ divides $\lambda$, where $\lambda$ is a one-way hash on a concatenation of $\mathrm{MSB}_\phi(s_i)$ and a secret key $\kappa$, shown as follows (where $\phi$ is the number of MSBs used to determine if a particular item should be watermarked or not):

$$\lambda = H(\mathrm{MSB}_\phi(s_i) \circ \kappa)$$
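The selection rule can be illustrated with a short sketch. SHA-256 from Python's standard `hashlib` stands in for the one-way hash $H$ purely for convenience; it is not the hash used in the experiments. The function names, the fixed `width`, and the encoding of the MSBs as a bit string before concatenation with the key are likewise assumptions of this example.

```python
import hashlib


def selection_hash(item: float, key: bytes, phi: int, width: int = 16) -> int:
    """lambda = H(MSB_phi(item) o key), with SHA-256 as a stand-in for H.

    Assumption: the integer part of the item fits in `width` bits.
    """
    msb_value = int(abs(item)) >> (width - phi)
    msb_bits = format(msb_value, f"0{phi}b").encode()   # MSBs as a bit string
    digest = hashlib.sha256(msb_bits + key).digest()    # concatenate, then hash
    return int.from_bytes(digest, "big")


def is_selected(item: float, key: bytes, phi: int, gamma: int,
                width: int = 16) -> bool:
    """An item is chosen for watermarking when gamma divides lambda."""
    return selection_hash(item, key, phi, width) % gamma == 0


key = b"secret"
same_bucket = (is_selected(1234.5, key, phi=8, gamma=5)
               == is_selected(1250.25, key, phi=8, gamma=5))
```

Because the hash input depends only on the MSBs and the key, two items that share their MSBs, such as 1234.5 and 1250.25 here, always receive the same selection decision. This is the property the bucket attack exploits.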

It is assumed that there are ξ LSBs that can be modified without substantially reducing the data’s utility. The value of ξ is determined by the

Experimental results

We implemented our scheme in C++ (Dev-C++ IDE) under the Windows Operating System on a 2.4 GHz Intel Processor with 2 GB RAM. To generate hash functions, the General Purpose Hash Function Algorithms Library by Arash Partow was used. For details about this library, please access http://www.partow.net/. We used the RSHash(string) function in our project.

We tested 20,000 numerical sets ranging from 10 to 500 items and with γ values as {5, 10, 20, 40}. We achieved an overall attack success rate of

Analysis of the attack

To devise a solution that provides resilience against the bucket attack, we revisit the important points of the watermarking scheme. An item $s_i \in S$ is represented in Fig. 1 such that $s_i = B + C$, where $B$ and $C$ represent the integer and fractional parts, respectively. Let the binary representation of $B$ be $\{b_0, \ldots, b_{n-1}\}$ and that of $C$ be $\{c_0, \ldots, c_{m-1}\}$.

Further, the integral part of the item, $B$, has two components, $\mathrm{MSB} = \{b_0, \ldots, b_{\phi-1}\}$ and $\mathrm{LSB} = \{b_\phi, \ldots, b_{n-1}\}$. We are not concerned with the fractional part $C$ since it is not

Conclusion

In this paper, we have shown a highly effective attack on the GNW numeric set watermarking model which was an attempt to adapt a relational database watermarking model to suit numeric set watermarking. The attack is based on the vulnerability introduced during watermark insertion where the most significant bits (MSBs) of the numeric items are used as a substitute for a primary key. Since numeric sets do not have a primary key, the only realistic way to uniquely identify items to be marked and

References (9)

  • R. Agrawal et al. Watermarking relational databases.

  • Rakesh Agrawal et al. Watermarking relational data: framework, algorithms and analysis. The VLDB Journal (2003).

  • Laurence Boney et al. Digital watermarks for audio signals.

  • Gordon W. Braudaway. Protecting publicly-available images with an invisible image watermark.

There are more references available in the full text version of this article.
