DNA sequence watermarking based on random circular angle
Introduction
The genetic code contains profound personal information. It may be regarded as a personal diary, and its unauthorized disclosure may be considered a grave invasion of privacy and a violation of human rights. Legal measures have been established to ensure the safety and security of procedures for collecting human genetic information (HGI) [1], [2], [3]. Ethical laws and guidelines govern HGI use, but security techniques for preventing illegal copying and piracy of HGI are urgently required. DNA is also considered a new biometric medium for storing extraordinarily large amounts of data, so DNA storage demands that DNA security techniques be addressed. Recent research on DNA security includes DNA cryptography [4], [5], [6], [7], steganography [8], [9], [10], [11], [12], [13], [14], [15], [16], and watermarking [17], [18], [19], [20], [21], [22], [23], [24], [25] using DNA sequences with a character stream of A, G, T (or U), and C. These studies [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] were validated by in vivo, in vitro, or in silico experiments.
DNA cryptography [4], [5], [6], [7] is a technique for biological encryption and decryption based on the polymerase chain reaction (PCR) or DNA chips and has been recognized as a new biological encryption technique for potential widespread use in the future. However, it has not yet begun to replace conventional encryption algorithms because it is difficult to implement. DNA steganography [8], [9], [10], [11], [12], [13], [14], [15], [16] is a technique for hiding messages in DNA sequences and is useful for DNA signature/identification and DNA storage of vast quantities of information. However, neither DNA cryptography nor steganography is designed to recover DNA sequences or messages under changing experimental conditions or after mutations, so they are not suitable for copyright protection.
DNA watermarking [17], [18], [19], [20], [21], [22], [23], [24], [25] is a technique for protecting the information within a DNA sequence. DNA-based watermarks can be applied to copyright-protect DNA sequences as well as to discriminate between wild-type and artificial genomes [25]. Recently, J. Craig Venterʼs research team inserted a watermark at intergenic sites known to tolerate transposon insertions to identify the genome of Mycoplasma genitalium JCVI-1.0 as synthetic [26], [27]. Jupiter et al. [28], [29] presented a strategy of watermark implementation for tracking select or infectious agents. They suggested five features for an implementation strategy: message fidelity, error tolerance, easy interpretation, uniqueness, and resistance. They also compared and analyzed the features of DNA and multimedia watermarking. Multimedia watermarking schemes for audio, images, video, and 3D models are mostly processed in the frequency domain of the discrete cosine transform (DCT) [30], [31], discrete wavelet transform (DWT) [32], [33], [34], and scale-invariant feature transform (SIFT) [35], as well as the geometric domain [36], [37]. However, a DNA-based watermark must be embedded without changing the function of coding regions, which represents the main difference between DNA and multimedia watermarking.
The genome contains all of an organismʼs hereditary information, and it includes coding sequences (genes), which are translated into polypeptide chains (proteins), representing, for example, approximately 1.5% of the human genome. Sequences that do not encode proteins may be transcribed into non-coding RNAs such as micro-RNAs, which may code for RNAs that regulate gene function. The genome also includes pseudogenes, which are mutated remnants of genes that are unable to encode a functional product. The majority of the human genome comprises non-coding repeated sequences. DNA steganography or watermarking methods have been designed differently depending on whether the information is embedded in non-coding [17], [18], [19], [20] or coding DNA [21], [22], [23], [24].
Non-coding DNA-based methods typically assume that non-coding DNA does not change the phenotype of an organism. Based on this assumption, any pirate could substitute an arbitrary sequence or dummy sequence for a non-coding DNA sequence, including its embedded sequences, without changing the phenotype. However, many types of non-coding DNA sequences have known biological functions, including the transcriptional and translational regulation of protein-coding sequences. Non-transcribed DNA sequences may contribute to chromosomal properties or possess functions that are yet to be discovered. Therefore, it is not yet appropriate to manipulate non-coding DNA for steganography and watermarking.
Coding DNA is translated to a polypeptide chain, so the watermark embedded in coding DNA sequences should preserve the protein profile of an organism. This is a prerequisite and a limitation of coding DNA-based methods. We refer to this limit here as amino acid residue (amino acid) conservation. Amino acid conservation is problematic when designing DNA watermarking, in contrast to image and video watermarking.
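To make this constraint concrete, the following minimal sketch checks amino acid conservation for a candidate watermarked sequence. It assumes the standard genetic code (NCBI translation table 1); the helper names `translate` and `conserves_amino_acids` and the example sequences are illustrative and are not taken from the paper.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1).
# Codons are enumerated in the order TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a coding DNA string codon by codon ('*' marks a stop codon)."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def conserves_amino_acids(original, watermarked):
    """True if both sequences have equal length and encode the same protein."""
    return (len(original) == len(watermarked)
            and translate(original) == translate(watermarked))


if __name__ == "__main__":
    host = "ATGGCTTTATAA"          # hypothetical host sequence
    synonymous = "ATGGCCCTGTAA"    # only synonymous codon substitutions
    altered = "ATGGATTTATAA"       # second codon now encodes a different residue
    print(conserves_amino_acids(host, synonymous))
    print(conserves_amino_acids(host, altered))
```

Any embedding rule for coding DNA can be validated against a check of this kind: a codon may only be replaced by one of its synonymous codons.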
Most conventional DNA watermarking methods focus on the embedding process, using simple substitutions or bit allocations to a base or codon as determined by the genetic code, and on the watermarked gene in in vivo experiments. Therefore, when designing a DNA watermarking method that satisfies the above requirements, it is necessary to analyze the resistance to phenotypic change and amino acid sequence conservation, as well as security for signal processing.
Here we present a coding DNA watermarking method for copyright protection of a DNA sequence that provides mutation resistance, watermark security, and amino acid sequence conservation. We analyze the performance of coding DNA watermarking using in silico experiments. The main features of our method are as follows. First, we map codons to numerical values of random circular angles using a random mapping table for security and ease of signal processing. The random mapping table for 64 codons, which includes start and stop codons, is used as the watermark key. Coding sequences of various lengths can be mapped to random circular angles using any random mapping table. Second, we select a number of target codons for embedding using the Lipschitz regularity of local modulus maxima at multi-resolution scales. The local modulus maxima of random circular angles depend on the random mapping table, and the length and location of target codons depend on the Lipschitz values of the local modulus maxima. It is therefore very difficult to detect the locations of embedded codons without knowledge of the random mapping table. Third, we repeatedly embed a binary watermark into the random circular angles of codons to confer mutation resistance. The angle of a center codon and the distance between the angles of its two neighboring codons are changed according to a watermark bit without changing the encoded amino acid. Finally, random circular angles are based on circular coding, which makes the numerical transformation of DNA symbols easier and allows estimation of symbol errors in arbitrary positions. Moreover, it allows the allocation of synonymous codons to neighboring numerical values.
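The first feature can be illustrated with a minimal sketch of a keyed codon-to-angle table. The construction below is an assumption made for illustration, not the table construction of the paper: a seeded generator shuffles the amino acid groups around the circle and the codons within each group, so that synonymous codons receive neighboring angles. The function names `random_mapping_table` and `to_angles`, the integer key, and the even spacing of 2π/64 are all hypothetical.

```python
import math
import random
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def random_mapping_table(key):
    """Map each of the 64 codons to an angle in [0, 2*pi), driven by a key.

    Illustrative assumption: synonymous codons are placed on adjacent
    slots of the circle; the order of groups and of codons inside each
    group is decided by a generator seeded with `key`.
    """
    rng = random.Random(key)
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(aa, []).append(codon)
    order = list(groups)          # 20 amino acids plus the stop signal
    rng.shuffle(order)
    table, slot = {}, 0
    for aa in order:
        codons = groups[aa]
        rng.shuffle(codons)
        for codon in codons:
            table[codon] = 2 * math.pi * slot / 64
            slot += 1
    return table


def to_angles(seq, table):
    """Convert a coding sequence into its list of codon angles."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [table[seq[i:i + 3]] for i in range(0, usable, 3)]


if __name__ == "__main__":
    table = random_mapping_table(key=2013)   # hypothetical key value
    print(to_angles("ATGGCTTTATAA", table))
```

Because the table is reproducible from the key, the same key recovers the same angle signal at extraction time, while a different key yields a different signal and therefore different local modulus maxima.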
Our in silico experiments verified that our method ensures amino acid sequence conservation and is more resistant to point mutations than DNA-Crypt watermarking [21] and the Liss method [24]. We investigated the capacity based on the analysis by Balado et al. [38], [39], [40]. To analyze security, we computed the entropy of the random mapping table and of the random positions of codons and confirmed that the entropies were high.
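As a rough intuition for why such entropies can be high, the sketch below counts keys under simplifying assumptions that are not taken from the paper: the mapping table is treated as a uniformly random permutation of the 64 codons, and the embedding positions as a uniformly random choice of k codons out of n. The paper's own security analysis may differ; the function names and the example values of n and k are hypothetical.

```python
import math


def permutation_entropy_bits(n):
    """Entropy in bits of a uniformly random permutation of n items: log2(n!)."""
    return math.log2(math.factorial(n))


def position_entropy_bits(n, k):
    """Entropy in bits of choosing k of n positions uniformly: log2(C(n, k))."""
    return math.log2(math.comb(n, k))


if __name__ == "__main__":
    # Key space of an unconstrained 64-codon mapping table.
    print(permutation_entropy_bits(64))
    # Hypothetical example: 100 embedding positions among 1000 codons.
    print(position_entropy_bits(1000, 100))
```

Under these assumptions an attacker who does not hold the mapping table faces a search space of 64! tables before even considering where the watermark is embedded.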
This paper is organized as follows: In Section 2, we explain the structure of DNA sequences and the genetic code and analyze conventional watermarking methods. We present the proposed DNA watermarking method in Section 3 and analyze its performance using in silico experiments in Section 4. Finally, our conclusions are presented in Section 5.
The genetic code and DNA watermarking
In this section, we examine the genetic code [41], [42], [43] and then analyze the requirements of DNA watermarking for copyright protection of a DNA sequence.
Proposed coding DNA watermarking
This paper presents a watermarking method for DNA sequences with the following features: 1) Embed the binary watermark into coding DNA sequences. Non-coding DNA sequences are included in regulatory regions such as promoters or other regulatory motifs, so coding DNA sequences are suitable for use as the embedding target. 2) Ensure amino acid conservation and resist small-scale mutations such as point mutations. 3) Use a fixed-length watermark key regardless of the sequence length and ensure
Experimental results
Our in silico experiment used coding DNA sequences from the National Center for Biotechnology Information database (Table 3). We analyzed the data capacity of our method and of the methods of Heider et al. [20], [21] and Liss et al. [24], and adjusted the watermark lengths to be similar. We then evaluated their mutation resistance. Further, we analyzed the security of our method based on differential entropy. We used a code for error correction from the DNA-Crypt method of Heider et al. However, because
Conclusions
The main requirements of codon DNA watermarking are amino acid conservation, mutation resistance, and security. Here we present a codon DNA watermarking method that satisfies these requirements and was evaluated in silico. The main features of the proposed method are the numerical mapping to random circular angles based on a random mapping table, the search for target codons using the local modulus maxima according to the Lipschitz regularity, and watermark embedding that ensures amino acid conservation.
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2011-0023118) and by Busan Metropolitan City, Korea, under the 2013 Brain Busan 21 program grants.
References (53)
- et al. Public-key systems using DNA as a one-way function for key distribution. Biosystems (July 2005)
- et al. Cryptography with DNA binary strands. Biosystems (June 2000)
- et al. Data hiding methods based upon DNA sequences. Inf. Sci. (June 2010)
- et al. Adaptive audio watermarking via the optimization point of view on the wavelet-based entropy. Digit. Signal Process. (May 2013)
- et al. An optimized watermarking technique based on self-adaptive DE in DWT–SVD transform domain. Signal Process. (January 2014)
- et al. A watermarking for 3D mesh using the patch CEGIs. Digit. Signal Process. (March 2007)
- et al. CAD drawing watermarking scheme. Digit. Signal Process. (September 2010)
- et al. Wavelet analysis of human DNA. Genomics (June 2011)
- Genetic privacy. Annu. Rev. Med. (February 2003)
- et al. Ethics, privacy, and the future of genetic information in healthcare information assurance and security
- Genetic privacy laws, genetic information: legal issues relating to discrimination and privacy
- Large-scale DNA memory based on the nested PCR. Nat. Comput.
- DNA-based cryptography
- Review on the advancements of DNA cryptography
- Hiding messages in DNA microdots. Nature
- DNA-based steganography. Cryptologia
- Hiding data in DNA
- Writing information into DNA
- Secret signatures inside genomic DNA. Biotechnol. Prog.
- Some possible codes for encrypting data in DNA. Biotechnol. Lett.
- Organic data memory using the DNA approach. Commun. ACM
- Alignment-based approach for durable data storage into living organisms. Biotechnol. Prog.
- Stabilizing synthetic data in the DNA of living organisms. Syst. Synth. Biol.
- DNA watermarks in non-coding regulatory sequences. BMC Res. Notes
- DNA-based watermarks using the DNA-Crypt algorithm. BMC Bioinform.
- DNA watermarks – a proof of concept. BMC Mol. Biol.
Cited by (8)
A two-parameter extended logistic chaotic map for modern image cryptosystems
2024, Digital Signal Processing: A Review Journal
Image watermarking using chaotic map and DNA coding
2015, Optik
Citation Excerpt: Recently, Lee presented a coding DNA watermarking method in a lifting-based DWT domain that focused on the feasibility of frequency domain watermarking for DNA sequences [28]. Then he proposed a DNA watermarking method that allocated codons to random circular angles using a random mapping table and selected a number of codons for embedding targets using the Lipschitz regularity [29]. In the DNA-based encryption, Smith et al. discussed three codes for encrypting data in DNA and shown the Huffman code would be more useful than the comma code and the alternating code [14].
A machine learning toolkit for genetic engineering attribution to facilitate biosecurity
2020, Nature Communications
DNA Watermarking using codon postfix technique
2018, IEEE/ACM Transactions on Computational Biology and Bioinformatics
Reversible DNA data hiding using multiple difference expansions for DNA authentication and storage
2018, Multimedia Tools and Applications
Reversible Data Hiding for DNA Sequence Using Multilevel Histogram Shifting
2018, Security and Communication Networks
Suk-Hwan Lee received the B.S., M.S., and Ph.D. degrees in Electrical Engineering from Kyungpook National University, Korea, in 1999, 2001, and 2004, respectively. He worked at the Electronics and Telecommunications Research Institute in 2005. He is currently an Associate Professor in the Department of Information Security at Tongmyong University, which he joined in 2005. He works as an Editor of the Korea Multimedia Society Journal, is a member of IEEE, IEEK, and IEICE, and is also an officer of the IEEE R10 Changwon section. His research interests include multimedia signal processing, multimedia security, digital signal processing, bio security, and computer graphics.