Elsevier

Digital Signal Processing

Volume 25, February 2014, Pages 173-189
DNA sequence watermarking based on random circular angle

https://doi.org/10.1016/j.dsp.2013.11.010

Highlights

  • DNA watermarking is for copyright protection and authentication of a DNA sequence.

  • We address the random circular-angle based watermarking for the coding DNA sequence.

  • Our method satisfies amino acid preservation, mutation resistance, and security.

  • Embeddable codons are selected by random mapping table and singularity detection.

  • The watermark changes random circular angles of embeddable codons.

Abstract

This paper discusses DNA watermarking for copyright protection and authentication of a DNA sequence. We then propose a DNA watermarking method that confers mutation resistance, amino acid residue conservation, and watermark security. Our method allocates codons to random circular angles using a random mapping table and selects a number of codons as embedding targets using the Lipschitz regularity measured from the evolution across scales of the local modulus maxima of codon circular angles. We then embed the watermark into the random circular angles of these codons without changing the amino acid residue. The length and location of target codons depend on the random mapping table and on singularity detection by Lipschitz regularity. This table is used as the watermark key and can be applied to any codon sequence regardless of sequence length. Without knowledge of this table, it is very difficult to detect the length and location of the sequences from which the watermark is extracted. In experiments with watermark capacities adjusted to be similar, we verified that our method has a lower bit error rate under point mutations than previous methods. Further, we established that the entropies of the random mapping table and of the target codon locations are high, indicating that the watermark is secure.

Introduction

The genetic code contains profound personal information. It may be considered as a personal diary, and its unauthorized disclosure may be considered as a grave invasion of privacy and a violation of human rights. Legal measures have been established to ensure the safety and security of procedures for collecting human genetic information (HGI) [1], [2], [3]. There are laws of ethics or guidelines for HGI use, but security techniques for preventing illegal copying and piracy of HGI are urgently required. DNA is considered as a new biometric medium for storing extraordinarily large amounts of data. Thus, DNA storage demands that DNA security techniques are addressed. Recent research on DNA security includes DNA cryptography [4], [5], [6], [7], steganography [8], [9], [10], [11], [12], [13], [14], [15], [16], and watermarking [17], [18], [19], [20], [21], [22], [23], [24], [25] using DNA sequences with a character stream of A, G, T (or U), and C. These studies [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] were validated by in vivo, in vitro, or in silico experiments.

DNA cryptography [4], [5], [6], [7] is a technique for biological encryption and decryption based on the polymerase chain reaction (PCR) or DNA chips and has been recognized as a new biological encryption technique for potential widespread use in the future. However, it has not yet begun to replace the conventional encryption algorithms because of the difficulty in its implementation. DNA steganography [8], [9], [10], [11], [12], [13], [14], [15], [16] is the technique for hiding messages in DNA sequences and is useful for DNA signature/identification and DNA storage of vast quantities of information. However, the purposes of DNA cryptography and steganography are not to recover DNA sequences or messages under changing experimental conditions or from mutations, and they are therefore not suitable as applications for copyright protection.

DNA watermarking [17], [18], [19], [20], [21], [22], [23], [24], [25] is a technique for protecting the information within a DNA sequence. DNA-based watermarks can be applied to copyright-protect DNA sequences as well as for discriminating between wild-type and artificial genomes [25]. Recently, J. Craig Venterʼs research team inserted a watermark at intergenic sites known to tolerate transposon insertions to identify the genome of Mycoplasma genitalium JCVI-1.0 as synthetic [26], [27]. Jupiter et al. [28], [29] presented the strategy of watermark implementation for tracking select or infectious agents. They suggested five features for implementation strategy as follows: message fidelity, error tolerance, easy interpretation, uniqueness, and resistance. They compared and analyzed the features of DNA and multimedia watermarking. Multimedia watermarking schemes for audio, image, video, and 3D model are mostly processed in the frequency domain of the discrete cosine transform (DCT) [30], [31], discrete wavelet transform (DWT) [32], [33], [34], and scale-invariant feature transform (SIFT) [35], as well as the geometric domain [36], [37]. However, DNA-based watermark must be embedded without changing the function of coding regions, which represents the main difference between DNA and multimedia watermarking.

The genome contains all of an organismʼs hereditary information, and it includes coding sequences (genes), which are translated into polypeptide chains (proteins), representing, for example, approximately 1.5% of the human genome. Sequences that do not encode proteins may be transcribed into non-coding RNAs such as micro-RNAs, which may code for RNAs that regulate gene function. The genome also includes pseudogenes, which are mutated remnants of genes that are unable to encode a functional product. The majority of the human genome comprises non-coding repeated sequences. DNA steganography or watermarking methods have been designed differently depending on whether the information is embedded in non-coding [17], [18], [19], [20] or coding DNA [21], [22], [23], [24].

Non-coding DNA-based methods typically assume that non-coding DNA does not change the phenotype of an organism. Based on this assumption, any pirate could substitute an arbitrary or dummy sequence for a non-coding DNA sequence, including the embedded sequences, without changing the phenotype. However, many types of non-coding DNA sequences have known biological functions, including the transcriptional and translational regulation of protein-coding sequences. Non-transcribed DNA sequences may contribute to chromosomal properties or possess functions that are yet to be discovered. Therefore, it is not yet appropriate to manipulate non-coding DNA for steganography and watermarking.

Coding DNA is translated to a polypeptide chain, so the watermark embedded in coding DNA sequences should preserve the protein profile of an organism. This is a prerequisite and a limitation of coding DNA-based methods. We refer to this limit here as amino acid residue (amino acid) conservation. Amino acid conservation is problematic when designing DNA watermarking, in contrast to image and video watermarking.
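The amino acid conservation constraint can be illustrated with a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) and uses a hypothetical toy rule, `embed_bit`, that replaces a codon with a synonymous one chosen by the watermark bit. This rule is for illustration only and is not the embedding scheme proposed in this paper.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a coding sequence (length divisible by 3) into amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def synonymous_codons(codon):
    """Return the sorted list of codons encoding the same residue as `codon`."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def embed_bit(codon, bit):
    """Hypothetical toy rule: return the bit-th codon of the synonymous group.

    Codons without a synonym (ATG, TGG) cannot carry a bit and stay unchanged.
    """
    group = synonymous_codons(codon)
    if len(group) < 2:
        return codon
    return group[bit]  # group[0] carries bit 0, group[1] carries bit 1


original = "ATGGCTCTGTGGAAATAA"
bits = [1, 0, 1, 1, 0, 1]
codons = [original[i:i + 3] for i in range(0, len(original), 3)]
marked = "".join(embed_bit(c, b) for c, b in zip(codons, bits))

# The nucleotide sequence changes, but the translated protein does not.
print(original, translate(original))
print(marked, translate(marked))
```

Because every substitution stays within one synonymous group, the translated sequence is identical before and after embedding, which is the property any coding DNA watermark has to preserve.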

Most conventional DNA watermarking methods focus on the embedding process through simple substitutions or bit allocations to a base or codon as determined by the genetic code, and on testing the watermarked gene in in vivo experiments. Therefore, when designing a DNA watermarking method that satisfies the above requirements, it is necessary to analyze, from a signal processing perspective, its resistance to phenotypic change, amino acid sequence conservation, and security.

Here we present a coding DNA watermarking method for copyright protection of a DNA sequence that provides mutation resistance, watermark security, and amino acid sequence conservation. We analyze the performance of coding DNA watermarking using in silico experiments. The main features of our method are as follows. First, we map codons to numerical values of random circular angles using a random mapping table for security and ease of signal processing. The random mapping table for the 64 codons, which include start and stop codons, is used as the watermark key. Coding sequences of various lengths can be mapped to random circular angles using any random mapping table. Second, we select a number of target codons for embedding using the Lipschitz regularity of local modulus maxima at multi-resolution scales. The local modulus maxima of random circular angles depend on the random mapping table, and the length and location of target codons depend on the Lipschitz values of the local modulus maxima. It is therefore very difficult to detect the locations of embedded codons without knowledge of the random mapping table. Third, we repeatedly embed a binary watermark into the random circular angles of codons to confer mutation resistance. The angle of a center codon and the distance between the angles of two neighboring codons are changed according to a watermark bit without changing the encoded amino acid. Finally, random circular angles are based on circular coding, which makes the numerical transformation of DNA symbols easier and allows estimation of symbol errors in arbitrary positions. Moreover, it allows the allocation of synonymous codons to neighboring numerical values.
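The first feature, a keyed table that maps the 64 codons to circular angles, can be sketched as follows. The construction is an assumption made for illustration: the function `make_mapping_table`, the integer `key`, the evenly spaced angles, and the choice to shuffle synonymous groups as blocks (so that synonymous codons receive neighboring angles) are all hypothetical and are not taken from the paper's own table design.

```python
import math
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def make_mapping_table(key):
    """Build a keyed codon -> angle table (illustrative construction only).

    Synonymous groups are shuffled as blocks, then codons are shuffled inside
    each block, so codons for one residue occupy neighboring angles.
    """
    rng = random.Random(key)  # the key seeds the pseudo-random generator
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(aa, []).append(codon)
    group_order = list(groups)
    rng.shuffle(group_order)
    ordered = []
    for aa in group_order:
        members = groups[aa][:]
        rng.shuffle(members)
        ordered.extend(members)
    step = 2 * math.pi / len(ordered)  # 64 evenly spaced angles on the circle
    return {codon: i * step for i, codon in enumerate(ordered)}


def to_angles(seq, table):
    """Map a coding sequence of any length (multiple of 3) to codon angles."""
    return [table[seq[i:i + 3]] for i in range(0, len(seq), 3)]


table = make_mapping_table(key=2014)
angles = to_angles("ATGGCTCTGTGGAAATAA", table)
print([round(a, 3) for a in angles])
```

The table has a fixed size regardless of sequence length, so the same key applies to any coding sequence. Without the key, an observer does not know which angle each codon was assigned, and therefore cannot reproduce the angle signal on which target selection is based.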

Our in silico experiments verified that our method ensures amino acid sequence conservation and is more resistant to point mutations than DNA-Crypt watermarking [21] and the Liss method [24]. We investigated the capacity based on the analysis by Balado et al. [38], [39], [40]. To analyze security, we computed the entropy of the random mapping table and of the random positions of codons, and confirmed that the entropies were high.
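As a rough feel for why a keyed codon table can have high entropy, consider an idealized case that is assumed here and not taken from the paper: if the table were a uniformly random permutation of the 64 codons, the key space would contain 64! tables and the key entropy would be log2(64!) bits. The sketch below computes that figure. It is a back-of-the-envelope bound under this assumption, not the entropy analysis reported in Section 4.

```python
import math


def permutation_entropy_bits(n):
    """Entropy in bits of a uniformly random permutation of n items: log2(n!)."""
    # lgamma(n + 1) equals ln(n!), which avoids computing the huge factorial.
    return math.lgamma(n + 1) / math.log(2)


# Assumed idealization: every ordering of the 64 codons is equally likely.
print(round(permutation_entropy_bits(64), 1))
```

Under this assumption the result is roughly 296 bits. A table constrained to keep synonymous codons adjacent has fewer valid orderings, so its entropy would be lower than this bound.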

This paper is organized as follows: In Section 2, we explain the structure of DNA sequences and the genetic code and analyze conventional watermarking methods. We present the proposed DNA watermarking method in Section 3 before analyzing its performance using in silico experiments in Section 4. Finally, conclusions are presented in Section 5.

Section snippets

The genetic code and DNA watermarking

In this section, we examine the genetic code [41], [42], [43] and then analyze the requirements of DNA watermarking for copyright protection of a DNA sequence.

Proposed coding DNA watermarking

This paper presents a watermarking method for DNA sequences with the following features: 1) Embed the binary watermark into coding DNA sequences. Non-coding DNA sequences are included in regulatory regions such as promoters or other regulatory motifs, so coding DNA sequences are suitable for use as the embedding target. 2) Ensure amino acid conservation and resist small-scale mutations such as point mutations. 3) Use a fixed-length watermark key regardless of the sequence length and ensure

Experimental results

Our in silico experiment used coding DNA sequences from the National Center for Biotechnology Information database (Table 3). We analyzed the data capacity of our method and of the methods of Heider et al. [20], [21] and Liss et al. [24], and adjusted the watermark lengths to be similar. We then evaluated their mutation resistance. Further, we analyzed the security of our method based on differential entropy. We used a WDH(n) code for error correction from the DNA-Crypt method of Heider et al. However, because

Conclusions

The main requirements of codon DNA watermarking are amino acid conservation, mutation resistance, and security. Here we present a codon DNA watermarking method that satisfies these requirements and was evaluated in silico. The main features of the proposed method are the numerical mapping to random circular angles based on a random mapping table, the search for target codons using the local modulus maxima according to the Lipschitz regularity, and watermark embedding that ensures amino acid conservation.

Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2011-0023118) and the Busan Metropolitan City, Korea, under the 2013 Brain Busan 21 program grants.

References (53)

  • National Conference of State Legislatures

    Genetic privacy laws, genetic information: legal issues relating to discrimination and privacy

  • M. Yamamoto et al.

    Large-scale DNA memory based on the nested PCR

    Nat. Comput.

    (September 2008)
  • A. Gehani et al.

    DNA-based cryptography

  • B. Anam et al.

    Review on the advancements of DNA cryptography

  • C.T. Clelland et al.

    Hiding messages in DNA microdots

    Nature

    (June 1999)
  • V.I. Risca

    DNA-based steganography

    Cryptologia

    (2001)
  • B. Shimanovsky et al.

    Hiding data in DNA

  • M. Arita

    Writing information into DNA

  • M. Arita et al.

    Secret signatures inside genomic DNA

    Biotechnol. Prog.

    (2004)
  • G.C. Smith et al.

    Some possible codes for encrypting data in DNA

    Biotechnol. Lett.

    (July 2003)
  • P.C. Wong et al.

    Organic data memory using the DNA approach

    Commun. ACM

    (January 2003)
  • N. Yachie et al.

    Alignment-based approach for durable data storage into living organisms

    Biotechnol. Prog.

    (March–April 2007)
  • N. Yachie et al.

    Stabilizing synthetic data in the DNA of living organisms

    Syst. Synth. Biol.

    (June 2008)
  • D. Heider et al.

    DNA watermarks in non-coding regulatory sequences

    BMC Res. Notes

    (2009)
  • D. Heider et al.

    DNA-based watermarks using the DNA-Crypt algorithm

    BMC Bioinform.

    (2007)
  • D. Heider et al.

    DNA watermarks – a proof of concept

    BMC Mol. Biol.

    (2008)
    Suk-Hwan Lee received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from Kyungpook National University, Korea in 1999, 2001, and 2004 respectively. He worked at Electronics and Telecommunications Research Institute in 2005. He is currently an Associate Professor in the Department of Information Security at Tongmyong University, which he started in 2005. He works as an Editor of Korea Multimedia Society Journal, is a member of IEEE, IEEK, IEICE and also is an officer of IEEE R10 Changwon section. His research interests include multimedia signal processing, multimedia security, digital signal processing, bio security, and computer graphics.