More on graph theoretic software watermarks: Implementation, analysis, and attacks

https://doi.org/10.1016/j.infsof.2008.09.016

Abstract

This paper presents an implementation of the watermarking method proposed by Venkatesan et al. in their paper [R. Venkatesan, V. Vazirani, S. Sinha, A graph theoretic approach to software watermarking, in: Fourth International Information Hiding Workshop, Pittsburgh, PA, 2001]. An executable program is marked by the addition of code for which the topology of the control-flow graph encodes a watermark. We discuss issues that were identified during construction of an actual implementation that operates on Java bytecode. We present two algorithms for splitting a watermark number into a redundant set of pieces and an algorithm for turning a watermark number into a control-flow graph. We measure the size and time overhead of watermarking, and evaluate the algorithm against a variety of attacks.

Introduction

This paper builds upon and elaborates a software watermarking scheme proposed by Venkatesan et al. [3]. We will refer to that paper as VVS and to its watermarking scheme as GTW. The present paper contributes:

  • The first public implementation of GTW.

  • An implementation that operates on Java bytecode.

  • An example of an error-correcting graph encoding.

  • The generation of executable code from graphs.

  • Several alternatives for marking basic blocks.

  • Extraction (not just detection) of a watermark value.

  • Empirical measurements of an actual GTW implementation.

  • Experimental analysis of possible attacks.

Graph theoretic watermarking encodes a value in the topology of a control-flow graph, or CFG [4]. Each node of a CFG represents a basic block: a maximal sequence of instructions with a single entry point and a single exit point. A directed edge connects two basic blocks if control can pass directly from the end of one to the start of the other during execution. The CFG itself also has a single entry and a single exit.
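To make these definitions concrete, here is a minimal sketch of the textbook leader-based partitioning of straight-line code into basic blocks, as described in compiler texts such as [4]. The toy instruction format (with `goto`, `if`, and `return` as the only control-transfer opcodes) and the function names `find_leaders` and `build_cfg` are assumptions for illustration; real Java bytecode also has switches, exception handlers, and subroutines that this sketch ignores.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy instruction format: (opcode, jump target index or None).
# "goto" jumps unconditionally, "if" jumps or falls through, "return" leaves
# the method, and any other opcode falls through to the next instruction.
Instr = Tuple[str, Optional[int]]


def find_leaders(code: List[Instr]) -> List[int]:
    """Leaders: the first instruction, every jump target, and every
    instruction that immediately follows a jump or return."""
    leaders: Set[int] = {0}
    for i, (op, target) in enumerate(code):
        if op in ("goto", "if"):
            leaders.add(target)
        if op in ("goto", "if", "return") and i + 1 < len(code):
            leaders.add(i + 1)
    return sorted(leaders)


def build_cfg(code: List[Instr]) -> Tuple[List[range], Dict[int, Set[int]]]:
    """Partition code into basic blocks and connect them with control-flow edges."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    # Each block runs from one leader up to (not including) the next leader.
    blocks = [range(bounds[k], bounds[k + 1]) for k in range(len(leaders))]
    block_of = {start: k for k, start in enumerate(leaders)}  # leader index -> block id

    edges: Dict[int, Set[int]] = {k: set() for k in range(len(blocks))}
    for k, block in enumerate(blocks):
        op, target = code[block[-1]]  # only a block's last instruction can be a jump
        if op in ("goto", "if"):
            edges[k].add(block_of[target])
        if op not in ("goto", "return") and k + 1 < len(blocks):
            edges[k].add(k + 1)  # fall-through edge
    return blocks, edges


# Example: a conditional that jumps to index 4 or falls through to index 2,
# with both paths joining at the return.
code = [("load", None), ("if", 4), ("add", None), ("goto", 5), ("sub", None), ("return", None)]
blocks, edges = build_cfg(code)
```

Here `blocks` comes out as four blocks (`[0, 1]`, `[2, 3]`, `[4]`, `[5]`), and the edges form the familiar diamond shape of an if/else.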

A watermark graph W is merged with a target program’s graph P by adding extra control-flow edges between them. Basic blocks belonging to W are marked to distinguish them from the nodes of P. These marks are later used to extract W from P+W during the recognition process. The GTW process is illustrated in Fig. 1.
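A minimal sketch of the merge-and-extract idea follows, assuming graphs are adjacency dictionaries with disjoint node names and that the marks are simply remembered as a set of node names. The uniformly random connecting edges are a simplification for illustration, not the VVS edge-selection procedure, and `merge` and `extract` are hypothetical names.

```python
import random
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # node name -> set of successor names


def merge(p: Graph, w: Graph, n_edges: int, seed: int) -> Tuple[Graph, Set[str]]:
    """Union P and W (disjoint node names assumed), then add n_edges random
    edges between them. Returns P+W and the set of marked watermark nodes."""
    rng = random.Random(seed)
    merged: Graph = {v: set(s) for v, s in p.items()}
    merged.update({v: set(s) for v, s in w.items()})
    p_nodes, w_nodes = sorted(p), sorted(w)
    for _ in range(n_edges):
        u, v = rng.choice(p_nodes), rng.choice(w_nodes)
        if rng.random() < 0.5:
            u, v = v, u  # sometimes connect in the W -> P direction
        merged[u].add(v)
    return merged, set(w)


def extract(merged: Graph, marked: Set[str]) -> Graph:
    """Recognition: W is the subgraph induced by the marked nodes."""
    return {v: merged[v] & marked for v in marked}
```

Because every added edge has one endpoint outside W, the subgraph induced by the marked nodes is exactly W. In practice the difficulty lies in making the marks themselves survive in the code.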

The VVS paper hypothesizes that naïvely inserted watermark code is only weakly connected to the original program, that is, joined to it by few control-flow edges, and is therefore easily detected. Such weakly connected components can be identified using standard graph algorithms and, if they are few in number, inspected manually. This inspection may reveal the watermark code at much lower cost than manual inspection of the full program.
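Consider the most extreme form of weak connection: a watermark attached to the program by a single edge. A standard linear-time tool for finding such an attachment is bridge detection (Tarjan's algorithm) on the undirected version of the CFG. The sketch below illustrates this assumption only; watermarks attached by two or three edges would require a more general minimum-cut search, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]  # node name -> set of successor names


def undirected(g: Graph) -> Graph:
    """Drop edge directions and self-loops: a cut separates nodes either way."""
    adj: Graph = {v: set() for v in g}
    for u, succs in g.items():
        for v in succs:
            if u != v:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
    return adj


def bridges(g: Graph) -> List[Tuple[str, str]]:
    """Edges whose removal disconnects the undirected graph (Tarjan-style DFS)."""
    adj = undirected(g)
    disc: Dict[str, int] = {}  # DFS discovery index
    low: Dict[str, int] = {}   # lowest discovery index reachable from the subtree

    found: List[Tuple[str, str]] = []

    def dfs(u: str, parent: Optional[str]) -> None:
        disc[u] = low[u] = len(disc)
        for v in sorted(adj[u]):
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # v's subtree reaches back only through (u, v)
                    found.append((min(u, v), max(u, v)))
            elif v != parent:
                low[u] = min(low[u], disc[v])

    for s in sorted(adj):
        if s not in disc:
            dfs(s, None)
    return sorted(found)
```

For a program loop `p0 -> p1 -> p2 -> p0` with a watermark loop `w0 -> w1 -> w2 -> w0` attached only by `p1 -> w0`, `bridges` reports `("p1", "w0")`. Adding a second edge such as `w2 -> p2` removes every bridge, which is the kind of denser attachment GTW relies on.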

The attack model of VVS considers an adversary who attempts to locate a cut between the watermark subgraph and the original CFG (dashed edges in Fig. 1). The GTW algorithm is designed to produce a strongly connected watermark so that such a cut cannot be identified. The VVS paper proves that an adversary is unlikely to find such a separation. More formally, the GTW algorithm adds edges between the program P and the watermark W in such a way that many other node divisions within P have the same size cut as the division between P and W.
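On tiny graphs, this cut criterion can be checked by brute force. The sketch below counts the directed edges that have exactly one endpoint in a node set, then counts how many other node sets of the same size have a cut no larger than the watermark's. Counting directed edges individually and enumerating every subset are simplifying assumptions for illustration; enumeration is only feasible for a handful of nodes.

```python
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node name -> set of successor names


def cut_size(g: Graph, side: Set[str]) -> int:
    """Number of directed edges with exactly one endpoint in `side`."""
    return sum(1 for u, succs in g.items() for v in succs if (u in side) != (v in side))


def competing_cuts(g: Graph, watermark: Set[str]) -> int:
    """Count other node sets of the watermark's size whose cut is no larger
    than the watermark's; the more there are, the less the true cut stands out."""
    target = cut_size(g, watermark)
    k = len(watermark)
    return sum(
        1
        for combo in combinations(sorted(g), k)
        if set(combo) != watermark and cut_size(g, set(combo)) <= target
    )
```

If a two-node watermark is embedded in a six-node ring, every adjacent pair of nodes has the same cut of two edges, so five other divisions tie with the true one. If the watermark instead hangs off a program loop by a single edge, its cut of one is unique and points straight at it.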

We have implemented the GTW algorithm in the framework of SandMark [5], a tool for experimenting with algorithms that protect software from reverse engineering, piracy, and tampering. SandMark contains a large number of obfuscation and watermarking algorithms as well as tools for manual and automatic analysis and reverse engineering. SandMark operates on Java bytecode. It can be downloaded for experimentation from sandmark.cs.arizona.edu.

Our implementation of GTW, which we will call GTWSM, is the first publicly available implementation of the GTW algorithm. We have found that GTW can be implemented with minimal overhead, a high degree of stealth, and a relatively high bit-rate.

Error-correcting graph techniques make the algorithm resilient against edge-flip attacks, in which the basic blocks are reordered, but it remains vulnerable to a large number of other semantics-preserving code transformations. GTW’s crucial weakness is its reliance on the reliable recognition of marked basic blocks during watermark extraction. We are unaware of any block marking method that is invulnerable to simple attacks.

The remainder of this paper is organized as follows. Section 2 surveys related work. Section 3 presents an overview of our implementation, and Sections 4 and 5 describe the embedding and recognition algorithms in detail. Section 6 evaluates GTW with respect to resilience against attacks, bit-rate, and stealth. Section 7 discusses future work.


Related work

Davidson and Myhrvold [6] published the first software watermarking algorithm. A watermark is embedded by rearranging the order of the basic blocks in an executable. Like other order-based algorithms, this is easily defeated by a random reordering.
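One standard way to turn a number into a block ordering is the factorial number system (Lehmer code), which maps each value in [0, n!) to a distinct permutation of n blocks. The sketch below uses this mapping as an assumption for illustration and makes no claim that it is the exact encoding in Davidson and Myhrvold's patent. It also shows why order-based schemes are fragile: the decoder reads the value directly from the order, so any other ordering decodes to a different number.

```python
from math import factorial
from typing import List, Sequence


def encode_order(value: int, blocks: Sequence[str]) -> List[str]:
    """Map value in [0, n!) to an ordering of `blocks` (factorial number system)."""
    n = len(blocks)
    if not 0 <= value < factorial(n):
        raise ValueError("value does not fit in n! orderings")
    remaining = list(blocks)
    order = []
    for i in range(n - 1, -1, -1):
        index, value = divmod(value, factorial(i))  # next factorial-base digit
        order.append(remaining.pop(index))
    return order


def decode_order(order: Sequence[str], blocks: Sequence[str]) -> int:
    """Inverse of encode_order: recover the value from the observed ordering."""
    remaining = list(blocks)
    value = 0
    for i, block in zip(range(len(blocks) - 1, -1, -1), order):
        index = remaining.index(block)
        value += index * factorial(i)
        remaining.pop(index)
    return value
```

For example, `encode_order(5, ["A", "B", "C"])` gives `["C", "B", "A"]`, and decoding that order returns 5.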

Qu and Potkonjak [7], [8] encode a watermark in a program’s register allocation. Like all algorithms based on renaming, this is very fragile. Watermarks typically do not survive even a decompilation/recompilation step. This algorithm also suffers from

An overview of GTWSM

Our implementation of GTW operates on Java bytecode. Choosing Java lets us leverage the tools of the SandMark and BCEL [13] libraries, and lets us attack the results using SandMark’s collection of obfuscators. Like every executable format, Java bytecode has some unique quirks, but the results should be generally applicable.

The GTW embedding algorithm takes as input application code P, watermark code W, secret keys ω1 and ω2, and integers m and n. GTWSM uses a smaller and simpler set of

Embedding

The construction of a watermark graph W is not discussed in VVS. In GTWSM we accept an integer value for transformation into a watermark CFG. The recognition process performs the inverse transformation from CFG to integer.
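The integer-to-graph transformation and its inverse can be illustrated with a deliberately simple, hypothetical encoding, which is not the one GTWSM uses. The graph is a path with one node per bit, least significant bit first, and a self-loop on node i marks a 1 bit. Decoding relies only on topology, so it still works after the nodes are renamed, and every node has at most two successors. The names `int_to_graph` and `graph_to_int` are illustrative only.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node name -> set of successor names


def int_to_graph(value: int, prefix: str = "w") -> Graph:
    """Hypothetical encoding: a path of nodes, one per bit (least significant
    first); node i carries a self-loop exactly when bit i of value is 1."""
    if value < 0:
        raise ValueError("only non-negative values are encoded")
    nbits = max(1, value.bit_length())
    names = [f"{prefix}{i}" for i in range(nbits)]
    g: Graph = {name: set() for name in names}
    for i, name in enumerate(names):
        if i + 1 < nbits:
            g[name].add(names[i + 1])  # spine edge fixes the bit order
        if (value >> i) & 1:
            g[name].add(name)          # self-loop marks a 1 bit
    return g


def graph_to_int(g: Graph) -> int:
    """Inverse transformation using only topology, so node names do not matter."""
    has_pred = {v for u, succs in g.items() for v in succs if v != u}
    (start,) = [v for v in g if v not in has_pred]  # unique head of the spine
    value, node, i = 0, start, 0
    while True:
        if node in g[node]:
            value |= 1 << i
        nxt = [v for v in g[node] if v != node]
        if not nxt:
            return value
        (node,) = nxt
        i += 1
```

A real encoding must also give the graph a single exit, keep it executable, and tolerate damage, none of which this toy attempts.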

The embedding process involves several steps: splitting the watermark value into small integers; constructing directed graphs that encode these values; generating code that corresponds to the graphs; and connecting the code to the program. We present two distinct methods of
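The splitting methods themselves are truncated here. As an assumption, the sketch below shows one common way to split an integer into redundant pieces, Chinese-remainder residues modulo pairwise coprime moduli, and makes no claim that it matches either of the paper's two methods. If the value is below the product of the k smallest moduli, any k pieces determine it, and a majority vote over all k-subsets can outvote a corrupted piece when there is enough redundancy.

```python
from collections import Counter
from itertools import combinations
from math import gcd, prod
from typing import List, Sequence, Tuple


def split(value: int, moduli: Sequence[int], k: int) -> List[Tuple[int, int]]:
    """Split value into one (modulus, residue) piece per modulus; any k pieces
    suffice if value is below the product of the k smallest moduli."""
    for a, b in combinations(moduli, 2):
        if gcd(a, b) != 1:
            raise ValueError("moduli must be pairwise coprime")
    if not 0 <= value < prod(sorted(moduli)[:k]):
        raise ValueError("value too large for k-of-n reconstruction")
    return [(m, value % m) for m in moduli]


def crt(pieces: Sequence[Tuple[int, int]]) -> int:
    """Chinese remainder reconstruction from (modulus, residue) pairs."""
    total, modulus = 0, 1
    for m, r in pieces:
        # Choose k so that total + modulus * k is congruent to r modulo m.
        k = ((r - total) * pow(modulus, -1, m)) % m
        total += modulus * k
        modulus *= m
    return total


def recombine(pieces: Sequence[Tuple[int, int]], k: int) -> int:
    """Rebuild the value from every k-subset of recovered pieces and return the
    most common result, so that a corrupted piece can be outvoted."""
    votes = Counter(crt(subset) for subset in combinations(pieces, k))
    return votes.most_common(1)[0][0]
```

With moduli 5, 7, 9, 11, 13 and k = 3, values below 315 can be split. If one residue is damaged, the four subsets that avoid it still agree on the correct value in the example tested here. This sketch requires Python 3.8 or later for `math.prod` and the modular inverse in `pow`.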

Recognition

The recognition process in VVS has three steps: detection of watermark nodes, sampling of subsets of the watermark nodes, and computation of robust properties of these subsets. The set of robust property values constitutes the watermark.
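The first recognition step depends on some way of marking basic blocks. The following sketch uses a hypothetical keyed-hash marking scheme, not one of the alternatives in the paper. A block counts as marked when an HMAC of its instruction text is 0 modulo a small constant, and embedding appends a push/pop pair of a constant, trying constants until the block hashes as marked. The textual instruction format, `MODULUS`, and all function names are assumptions. The sketch also illustrates the weakness noted in the introduction: any edit to a marked block rehashes it, so the mark survives only by chance (about 1 in `MODULUS`).

```python
import hashlib
import hmac
from typing import List, Sequence

MODULUS = 16  # hypothetical: about 1 in 16 unmarked blocks will look marked by chance


def is_marked(key: bytes, block: Sequence[str]) -> bool:
    """Hypothetical test: a block is marked if its keyed hash is 0 modulo MODULUS."""
    digest = hmac.new(key, "\n".join(block).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % MODULUS == 0


def mark_block(key: bytes, block: List[str]) -> List[str]:
    """Append a push/pop of a constant (no net effect), trying constants
    until the block hashes as marked; about MODULUS tries are expected."""
    for c in range(1_000_000):
        candidate = block + [f"ldc {c}", "pop"]
        if is_marked(key, candidate):
            return candidate
    raise RuntimeError("no marking constant found")


def detect(key: bytes, blocks: Sequence[Sequence[str]]) -> List[int]:
    """Recognition step 1: indices of blocks that appear to be watermark nodes."""
    return [i for i, b in enumerate(blocks) if is_marked(key, b)]
```

Because unmarked blocks sometimes look marked by chance, the later sampling and robust-property steps have to tolerate a few false positives as well as lost marks.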

Evaluation

Most software watermarking research has focused on the discovery of novel embedding schemes. Little work has been done on their evaluation. A software watermarking algorithm can be evaluated using several criteria:

  1. Data rate: What is the ratio of the size of the watermark that can be embedded to the size of the program?

  2. Embedding overhead: How much slower or larger is the watermarked application than the original?

  3. Resistance to detection (stealth): Does the watermarked program have statistical

Discussion and future work

Because our recognizer returns a specific watermark value, as opposed to just a success/failure flag, GTWSM can be used for fingerprinting. This is a technique where each copy of an application program is distributed with its own unique watermark value, allowing pirated copies to be traced back to a specific original.

Our implementation of the GTW watermarking system is fully functional and reasonably efficient. It is resilient against a small number of random program modifications, in

Summary

We have produced a working implementation of the graph theoretic watermark described by Venkatesan et al. [3]. The implementation is faithful to the paper within the constraints of Java bytecode, and includes necessary components that were left unspecified by the original paper. While the GTW design protects against detection, its fundamental dependence on static block marking leaves watermarked programs vulnerable to distortive attacks.

References (23)

  • C. Collberg, E. Carter, S. Debray, A. Huntwork, J. Kececioglu, C. Linn, M. Stepp, Dynamic path-based software...
  • C. Collberg, A. Huntwork, E. Carter, G. Townsend, Graph theoretic software watermarks: implementation, analysis, and...
  • R. Venkatesan, V. Vazirani, S. Sinha, A graph theoretic approach to software watermarking, in: Fourth International...
  • A.V. Aho et al., Compilers: Principles, Techniques, and Tools (1986)
  • C. Collberg, G. Myles, A. Huntwork, Sandmark – a tool for software protection research, IEEE Security and Privacy 1 (4)...
  • R.L. Davidson, N. Myhrvold, Method and system for generating and auditing a signature for a computer program, US Patent...
  • G. Qu, M. Potkonjak, Analysis of watermarking techniques for graph coloring problem, in: IEEE/ACM International...
  • G. Myles, C. Collberg, Software watermarking through register allocation: implementation, analysis, and attacks, in:...
  • J.P. Stern, G. Hachez, F. Koeune, J.-J. Quisquater, Robust object watermarking: application to code, in: Information...
  • G. Arboit, A method for watermarking Java programs via opaque predicates, in: The Fifth International Conference on...
  • C. Collberg, C. Thomborson, D. Low, Manufacturing cheap, resilient, and stealthy opaque constructs, in: Principles of...

This paper extends material previously published in Refs. [1], [2].
