DNA sequence watermarking based on random circular angle

doi:10.1016/j.dsp.2013.11.010

Digital Signal Processing

Volume 25, February 2014, Pages 173-189

https://doi.org/10.1016/j.dsp.2013.11.010

Highlights

• DNA watermarking is for copyright protection and authentication of a DNA sequence.
• We address the random circular-angle based watermarking for the coding DNA sequence.
• Our method satisfies amino acid preservation, mutation resistance, and security.
• Embeddable codons are selected by random mapping table and singularity detection.
• The watermark changes random circular angles of embeddable codons.

Abstract

This paper discusses DNA watermarking for copyright protection and authentication of a DNA sequence. We then propose a DNA watermarking method that confers mutation resistance, amino acid residue conservation, and watermark security. Our method allocates codons to random circular angles using a random mapping table and selects a number of codons as embedding targets using the Lipschitz regularity, which is measured from the evolution across scales of the local modulus maxima of codon circular angles. We then embed the watermark into the random circular angles of these codons without changing the amino acid residue. The length and location of the target codons depend on the random mapping table and on singularity detection by Lipschitz regularity. This table is used as the watermark key and can be applied to any codon sequence regardless of sequence length. Without knowledge of this table, it is very difficult to detect the length and location of the sequences needed to extract the watermark. In experiments run at similar watermark capacities, we verified that our method has a lower bit error rate under point mutations than previous methods. Further, we established that the entropies of the random mapping table and of the target codon locations are high, indicating that the watermark is secure.
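As a rough illustration of the key idea in the abstract, the following Python sketch draws a random mapping table from a seed and converts a coding sequence into codon angles. The seed-based key, the uniform spacing of 64 angle slots, and the names `make_mapping_table` and `to_angles` are assumptions made for illustration, not the construction used in the paper. In particular, this sketch does not try to place synonymous codons at neighboring angles.

```python
import math
import random
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def make_mapping_table(key):
    """Return {codon: angle in radians}. `key` seeds the shuffle (hypothetical key format)."""
    rng = random.Random(key)
    slots = list(range(64))
    rng.shuffle(slots)
    # Assumption: angles are evenly spaced on the circle, one slot per codon.
    return {codon: 2 * math.pi * slot / 64 for codon, slot in zip(CODONS, slots)}


def to_angles(seq, table):
    """Map a coding sequence to its codon angles (trailing partial codon ignored)."""
    usable = len(seq) - len(seq) % 3
    return [table[seq[i:i + 3]] for i in range(0, usable, 3)]


table = make_mapping_table(key=2014)
angles = to_angles("ATGGCTGAAACCTAA", table)
```

Because the table covers all 64 codons, the same key applies to a sequence of any length. Python's `random` module is not cryptographically secure, so a real key would need a stronger source of randomness.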

Introduction

The genetic code contains profound personal information. It may be considered as a personal diary, and its unauthorized disclosure may be considered as a grave invasion of privacy and a violation of human rights. Legal measures have been established to ensure the safety and security of procedures for collecting human genetic information (HGI) [1], [2], [3]. There are laws of ethics or guidelines for HGI use, but security techniques for preventing illegal copying and piracy of HGI are urgently required. DNA is considered as a new biometric medium for storing extraordinarily large amounts of data. Thus, DNA storage demands that DNA security techniques are addressed. Recent research on DNA security includes DNA cryptography [4], [5], [6], [7], steganography [8], [9], [10], [11], [12], [13], [14], [15], [16], and watermarking [17], [18], [19], [20], [21], [22], [23], [24], [25] using DNA sequences with a character stream of A, G, T (or U), and C. These studies [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] were validated by in vivo, in vitro, or in silico experiments.

DNA cryptography [4], [5], [6], [7] is a technique for biological encryption and decryption based on the polymerase chain reaction (PCR) or DNA chips and has been recognized as a new biological encryption technique for potential widespread use in the future. However, it has not yet begun to replace the conventional encryption algorithms because of the difficulty in its implementation. DNA steganography [8], [9], [10], [11], [12], [13], [14], [15], [16] is the technique for hiding messages in DNA sequences and is useful for DNA signature/identification and DNA storage of vast quantities of information. However, the purposes of DNA cryptography and steganography are not to recover DNA sequences or messages under changing experimental conditions or from mutations, and they are therefore not suitable as applications for copyright protection.

DNA watermarking [17], [18], [19], [20], [21], [22], [23], [24], [25] is a technique for protecting the information within a DNA sequence. DNA-based watermarks can be applied to copyright-protect DNA sequences as well as to discriminate between wild-type and artificial genomes [25]. Recently, J. Craig Venter's research team inserted a watermark at intergenic sites known to tolerate transposon insertions to identify the genome of Mycoplasma genitalium JCVI-1.0 as synthetic [26], [27]. Jupiter et al. [28], [29] presented a strategy of watermark implementation for tracking select or infectious agents. They suggested five features of an implementation strategy: message fidelity, error tolerance, easy interpretation, uniqueness, and resistance. They also compared and analyzed the features of DNA and multimedia watermarking. Multimedia watermarking schemes for audio, images, video, and 3D models are mostly processed in the frequency domain of the discrete cosine transform (DCT) [30], [31], discrete wavelet transform (DWT) [32], [33], [34], and scale-invariant feature transform (SIFT) [35], as well as in the geometric domain [36], [37]. However, a DNA-based watermark must be embedded without changing the function of coding regions, which is the main difference between DNA and multimedia watermarking.

The genome contains all of an organism's hereditary information, and it includes coding sequences (genes), which are translated into polypeptide chains (proteins), representing, for example, approximately 1.5% of the human genome. Sequences that do not encode proteins may be transcribed into non-coding RNAs such as micro-RNAs, which may code for RNAs that regulate gene function. The genome also includes pseudogenes, which are mutated remnants of genes that are unable to encode a functional product. The majority of the human genome comprises non-coding repeated sequences. DNA steganography or watermarking methods have been designed differently depending on whether the information is embedded in non-coding [17], [18], [19], [20] or coding DNA [21], [22], [23], [24].

Non-coding DNA-based methods typically assume that non-coding DNA does not change the phenotype of an organism. Based on this assumption, any pirate could substitute an arbitrary sequence or dummy sequence for a non-coding DNA sequence, including any embedded sequences, without changing the phenotype. However, many types of non-coding DNA sequences have known biological functions, including the transcriptional and translational regulation of protein-coding sequences. Non-transcribed DNA sequences may contribute to chromosomal properties or possess functions that are yet to be discovered. Therefore, it is not yet appropriate to manipulate non-coding DNA for steganography and watermarking.

Coding DNA is translated to a polypeptide chain, so the watermark embedded in coding DNA sequences should preserve the protein profile of an organism. This is a prerequisite and a limitation of coding DNA-based methods. We refer to this limit here as amino acid residue (amino acid) conservation. Amino acid conservation is problematic when designing DNA watermarking, in contrast to image and video watermarking.
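The conservation constraint can be checked mechanically: translate the original and the watermarked sequence and compare the results. The sketch below does this with the standard genetic code. The function names `translate` and `preserves_amino_acids` are hypothetical, and organisms that use a non-standard code would need a different table.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def preserves_amino_acids(original, watermarked):
    """True if both sequences encode the same amino acid chain."""
    return translate(original) == translate(watermarked)


# GCT and GCC both encode alanine, so this substitution is silent.
silent = preserves_amino_acids("ATGGCTTAA", "ATGGCCTAA")
# GAT encodes aspartate, so this substitution changes the protein.
changed = preserves_amino_acids("ATGGCTTAA", "ATGGATTAA")
```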

Most conventional DNA watermarking methods focus on the embedding process, using simple substitutions or bit allocations to a base or codon as determined by the genetic code, and on watermarked genes in in vivo experiments. Therefore, when designing a DNA watermarking method that satisfies the above requirements, it is necessary to analyze resistance to phenotypic change, amino acid sequence conservation, and security in signal processing terms.

Here we present a coding DNA watermarking method for copyright protection of a DNA sequence that provides mutation resistance, watermark security, and amino acid sequence conservation. We analyze the performance of coding DNA watermarking using in silico experiments. The main features of our method are as follows: First, we map codons to numerical values of random circular angles using a random mapping table for security and ease of signal processing. The random mapping table for 64 codons, which includes start and stop codons, is used as the watermark key. Coding sequences of various lengths can be mapped to random circular angles using any random mapping table. Second, we select a number of target codons for embedding using the Lipschitz regularity of local modulus maxima at multi-resolution scales. The local modulus maxima of random circular angles depend on the random mapping table. The length and location of target codons depend on the Lipschitz values of the local modulus maxima. It is very difficult to detect the locations of embedded codons without knowledge of the random mapping table. Third, we repeatedly embed a binary watermark into random circular angles of codons to confer mutation resistance. The angle of a center codon and the distance between the angles of its two neighboring codons are changed by a bit of the watermark without changing the encoded amino acid. Finally, random circular angles are based on circular coding, which makes the numerical transformation of DNA symbols easier and allows estimation of symbol errors in arbitrary positions. Moreover, it allows the allocation of synonymous codons to neighboring numerical values.
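To make the embedding idea concrete, the sketch below places synonymous codons in adjacent slots of a key-dependent circle, embeds each watermark bit repeatedly by swapping a codon for a synonym, and recovers the bits by majority vote. It is a simplified stand-in rather than the paper's rule: the slot-parity embedding, the use of every codon with a synonym (instead of Lipschitz-selected targets), and all function names are assumptions made for illustration. The input length is assumed to be a multiple of three.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def make_circle(key):
    """Key-dependent slot per codon (angle = 2*pi*slot/64); synonyms are adjacent."""
    rng = random.Random(key)
    groups = {}
    for codon, aa in CODON_TO_AA.items():
        groups.setdefault(aa, []).append(codon)
    order = list(groups)
    rng.shuffle(order)
    circle = []
    for aa in order:
        rng.shuffle(groups[aa])
        circle.extend(groups[aa])
    return {codon: slot for slot, codon in enumerate(circle)}


def synonyms(codon):
    aa = CODON_TO_AA[codon]
    return [c for c, a in CODON_TO_AA.items() if a == aa]


def is_target(codon):
    # Assumption: Met, Trp (no synonym) and stop codons are never modified.
    return CODON_TO_AA[codon] != "*" and len(synonyms(codon)) > 1


def embed(seq, bits, slots):
    """Write bits cyclically into target codons; slot parity carries the bit."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    n = 0  # number of target codons seen so far
    for i, codon in enumerate(codons):
        if not is_target(codon):
            continue
        bit = bits[n % len(bits)]
        n += 1
        matching = [c for c in synonyms(codon) if slots[c] % 2 == bit]
        # Pick the synonym with the right parity that is nearest on the circle.
        codons[i] = min(matching, key=lambda c: abs(slots[c] - slots[codon]))
    return "".join(codons)


def extract(seq, n_bits, slots):
    """Recover each bit by majority vote over its repeated copies."""
    votes = [[0, 0] for _ in range(n_bits)]
    n = 0
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if not is_target(codon):
            continue
        votes[n % n_bits][slots[codon] % 2] += 1
        n += 1
    return [int(v[1] > v[0]) for v in votes]


slots = make_circle(key=7)
seq = "ATGGCTGAAACCCTGTTCGGAAAACGTTCTGTTTAA"
bits = [1, 0, 1]
marked = embed(seq, bits, slots)
recovered = extract(marked, len(bits), slots)
```

Repetition lets the vote absorb an isolated silent mutation. A mutation that turns a target codon into a non-target one (or the reverse) would shift the bit positions in this sketch, which is one reason a real scheme needs a more robust way to locate targets.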

Our in silico experiments verified that our method ensures amino acid sequence conservation and is more resistant to point mutations than DNA-Crypt watermarking [21] and the Liss method [24]. We investigated the capacity based on the analysis by Balado et al. [38], [39], [40]. To analyze security, we computed the entropy of the random mapping table and of the random positions of codons, and confirmed that the entropies were high.
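For a sense of scale, the sketch below computes two simple entropy figures in Python. Both are illustrative assumptions rather than the paper's measures: the first treats the mapping table as a uniformly random permutation of 64 codons, which gives an upper bound of log2(64!) bits, and the second is the empirical Shannon entropy of a list of discrete observations such as codon positions.

```python
import math
from collections import Counter


def key_space_bits(n_codons=64):
    """Entropy in bits of a uniformly random permutation of n_codons items: log2(n!)."""
    # lgamma(n + 1) equals ln(n!), so dividing by ln(2) converts to bits.
    return math.lgamma(n_codons + 1) / math.log(2)


def shannon_entropy(samples):
    """Empirical Shannon entropy in bits of a non-empty list of discrete values."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


table_bits = key_space_bits()                       # roughly 296 bits
position_bits = shannon_entropy([3, 17, 3, 42, 17, 8, 42, 8])
```

Any constraint on the table, such as keeping synonymous codons adjacent, shrinks the key space below this bound.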

This paper is organized as follows: In Section 2, we explain the structure of DNA sequences and the genetic code and analyze conventional watermarking methods. We present the proposed DNA watermarking method in Section 3 before analyzing its performance using in silico experiments in Section 4. Finally, conclusions are presented in Section 5.

Section snippets

The genetic code and DNA watermarking

In this section, we examine the genetic code [41], [42], [43] and then analyze the requirements of DNA watermarking for copyright protection of a DNA sequence.

Proposed coding DNA watermarking

This paper presents a watermarking method for DNA sequences with the following features: 1) Embed the binary watermark into coding DNA sequences. Non-coding DNA sequences are included in regulatory regions such as promoters or other regulatory motifs, so coding DNA sequences are suitable for use as the embedding target. 2) Ensure amino acid conservation and resist small-scale mutations such as point mutations. 3) Use a fixed-length watermark key regardless of the sequence length and ensure

Experimental results

Our in silico experiment used coding DNA sequences from the National Center for Biotechnology Information (NCBI) database (Table 3). We analyzed the data capacity of our method and of the methods of Heider et al. [20], [21] and Liss et al. [24], and adjusted the watermark lengths to be similar. We then evaluated their mutation resistance. Further, we analyzed the security of our method based on differential entropy. We used a $WDH (n)$ code for error correction from the DNA-Crypt method of Heider et al. However, because

Conclusions

The main requirements of codon DNA watermarking are amino acid conservation, mutation resistance, and security. Here we present a codon DNA watermarking method that satisfies these requirements and was evaluated in silico. The main features of the proposed method are the numerical mapping to random circular angles based on a random mapping table, the search for target codons using the local modulus maxima according to the Lipschitz regularity, and watermark embedding that ensures amino acid conservation.
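The singularity search can be pictured with a small numerical sketch. A common way to estimate the Lipschitz exponent at a point is to measure how a wavelet coefficient at that point grows or decays with scale and take the slope on a log-log plot. The Python sketch below uses a difference-of-means kernel at dyadic scales and a least-squares fit. The kernel, the fixed evaluation position (instead of tracing modulus maxima across scales), the treatment of angles as ordinary numbers without wrap-around, and the function names are all simplifying assumptions, not the paper's procedure.

```python
import math


def wavelet_coeff(x, i, s):
    """Mean of the s samples from position i onward minus the mean of the s samples before i."""
    right = x[i:i + s]
    left = x[i - s:i]
    return sum(right) / s - sum(left) / s


def lipschitz_exponent(x, i, scales=(1, 2, 4, 8)):
    """Least-squares slope of log2|W(s, i)| against log2(s), or None if undefined."""
    pts = []
    for s in scales:
        if i - s < 0 or i + s > len(x):
            continue  # window would run off the signal
        w = abs(wavelet_coeff(x, i, s))
        if w > 0:
            pts.append((math.log2(s), math.log2(w)))
    if len(pts) < 2:
        return None
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    num = sum((px - mx) * (py - my) for px, py in pts)
    den = sum((px - mx) ** 2 for px, _ in pts)
    return num / den


step = [0.0] * 16 + [1.0] * 16           # jump at index 16
ramp = [float(k) for k in range(32)]      # smooth linear growth
spike = [0.0] * 32
spike[16] = 1.0                           # isolated impulse

alpha_step = lipschitz_exponent(step, 16)
alpha_ramp = lipschitz_exponent(ramp, 16)
alpha_spike = lipschitz_exponent(spike, 16)
```

With this kernel a jump gives an exponent near 0, a linear ramp near 1, and an isolated impulse near -1, so low exponents flag the sharp, irregular positions.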

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2011-0023118) and by the Busan Metropolitan City, Korea, under the 2013 Brain Busan 21 program grants.

Suk-Hwan Lee received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from Kyungpook National University, Korea, in 1999, 2001, and 2004, respectively. He worked at the Electronics and Telecommunications Research Institute in 2005. He is currently an Associate Professor in the Department of Information Security at Tongmyong University, which he joined in 2005. He works as an Editor of Korea Multimedia Society Journal, is a member of IEEE, IEEK, IEICE and also is an officer of IEEE R10

References (53)

K. Tanaka et al. Public-key systems using DNA as a one-way function for key distribution. Biosystems (July 2005)
A. Leier et al. Cryptography with DNA binary strands. Biosystems (June 2000)
H.J. Shiu et al. Data hiding methods based upon DNA sequences. Inf. Sci. (June 2010)
S.-T. Chen et al. Adaptive audio watermarking via the optimization point of view on the wavelet-based entropy. Digit. Signal Process. (May 2013)
M. Ali et al. An optimized watermarking technique based on self-adaptive DE in DWT–SVD transform domain. Signal Process. (January 2014)
S.-H. Lee et al. A watermarking for 3D mesh using the patch CEGIs. Digit. Signal Process. (March 2007)
S.-H. Lee et al. CAD drawing watermarking scheme. Digit. Signal Process. (September 2010)
J.A. Machado et al. Wavelet analysis of human DNA. Genomics (June 2011)
P. Sankar. Genetic privacy. Annu. Rev. Med. (February 2003)
J.A. Springer et al. Ethics, privacy, and the future of genetic information in healthcare information assurance and security
National Conference of State Legislatures. Genetic privacy laws, genetic information: legal issues relating to discrimination and privacy
M. Yamamoto et al. Large-scale DNA memory based on the nested PCR. Nat. Comput. (September 2008)
A. Gehani et al. DNA-based cryptography
B. Anam et al. Review on the advancements of DNA cryptography
C.T. Clelland et al. Hiding messages in DNA microdots. Nature (June 1999)
V.I. Risca. DNA-based steganography. Cryptologia (2001)
B. Shimanovsky et al. Hiding data in DNA
M. Arita. Writing information into DNA
M. Arita et al. Secret signatures inside genomic DNA. Biotechnol. Prog. (2004)
G.C. Smith et al. Some possible codes for encrypting data in DNA. Biotechnol. Lett. (July 2003)
P.C. Wong et al. Organic data memory using the DNA approach. Commun. ACM (January 2003)
N. Yachie et al. Alignment-based approach for durable data storage into living organisms. Biotechnol. Prog. (March–April 2007)
N. Yachie et al. Stabilizing synthetic data in the DNA of living organisms. Syst. Synth. Biol. (June 2008)
D. Heider et al. DNA watermarks in non-coding regulatory sequences. BMC Res. Notes (2009)
D. Heider et al. DNA-based watermarks using the DNA-Crypt algorithm. BMC Bioinform. (2007)
D. Heider et al. DNA watermarks – a proof of concept. BMC Mol. Biol. (2008)

Cited by (8)

A two-parameter extended logistic chaotic map for modern image cryptosystems
2024, Digital Signal Processing: A Review Journal
In this paper, a novel Extended Logistic Chaotic Map (ELCM) with two control parameters is proposed. Overcoming the major drawback of the standard Logistic and other existing chaotic maps, the ELCM not only has infinite chaotic range as well as good ergodicity, but also has a simple structure just like the Logistic map, which greatly facilitates its practical implementation and becomes very suitable for today's real-time applications. Moreover, a new color image encryption scheme based on chaos concept, DNA (Deoxyribonucleic Acid) encoding and Convolutional Neural Network (CNN) is also presented. To perform a chaotic scrambling, the ELCM is however used at different stages of the encryption process. In addition, a pre-trained AlexNet CNN is used to generate a public key. After XORing with the secret key, the latter is used, on the one hand, to generate the initials values and the control parameters of the ELCM. On the other hand, it will be split into two keys in order to generate a random grayscale image, which will then be XORed with the three components (i.e., R, G and B) of the color plaintext image. Afterward, a permutation operation, DNA encoding, diffusion operation as well as bit reversion operation are then applied to the R, G and B components. The performance evaluation demonstrates that the ELCM not only has an infinite chaotic parameter range, but also exhibits a high chaotic complexity. Besides, experimental results as well as security analysis confirm that with NPCR of 99.6287%, the designed color image encryption scheme achieves an average entropy of 7.9975 and a near zero correlation (-0.0027). Furthermore, the proposed encryption scheme is in fact of higher security level in comparison to other schemes recently presented in the literature, thus making it highly suitable for today's real-time applications.
Image watermarking using chaotic map and DNA coding
2015, Optik
Citation Excerpt :
Recently, Lee presented a coding DNA watermarking method in a lifting-based DWT domain that focused on the feasibility of frequency domain watermarking for DNA sequences [28]. Then he proposed a DNA watermarking method that allocated codons to random circular angles using a random mapping table and selected a number of codons for embedding targets using the Lipschitz regularity [29]. In the DNA-based encryption, Smith et al. discussed three codes for encrypting data in DNA and shown the Huffman code would be more useful than the comma code and the alternating code [14].
In recent years, image watermarking has attracted considerable research interest, especially jointing with chaotic map and DNA sequences. In this work, we propose a novel architecture of image watermarking using chaotic map and DNA coding. Firstly, two logistic chaotic maps are used to structure a secure architecture with embedding watermarking into the LSB (least significant bit) of cover image. Analyzing numerical experimental results and comparing with previous works show that the proposed architecture possesses higher security than previous works. Then the method of DNA coding is jointed into the architecture to improve the ER (embedding rate). It not only significantly increases the ER, but also accelerates the development of DNA-based watermarking. The improved architecture is suitable for protecting the copyright of cover image in DNA-based information security.
A machine learning toolkit for genetic engineering attribution to facilitate biosecurity
2020, Nature Communications
DNA Watermarking using codon postfix technique
2018, IEEE/ACM Transactions on Computational Biology and Bioinformatics
Reversible DNA data hiding using multiple difference expansions for DNA authentication and storage
2018, Multimedia Tools and Applications
Reversible Data Hiding for DNA Sequence Using Multilevel Histogram Shifting
2018, Security and Communication Networks
