Abstract
Data hiding in deoxyribonucleic acid (DNA) sequences can be used to develop organic memory and to track parent genes in offspring as well as in genetically modified organisms. The main concerns, however, are the survival of the organism and the successful extraction of the watermark from its DNA: the organism should live and reproduce without any functional disorder despite the embedded data. Synonymous substitution in the codons of amino acids, which changes the nucleotide sequence without changing the encoded protein, is therefore a primary option for watermarking. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in the DNA of living organisms. Employing the different types of synonymous codons significantly increases the data storage capacity. The proposed DNA-LCEB, which combines synonymous substitution, lossless compression, encryption, and Bose–Chaudhuri–Hocquenghem (BCH) coding, is secure and performs better in terms of both capacity and robustness than existing DNA data-hiding schemes. DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at http://111.68.99.218/DNA-LCEB.
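The idea of hybrid twofold and fourfold synonymous substitution can be illustrated with a minimal sketch. It is not the DNA-LCEB algorithm itself: compression, encryption, and BCH coding are omitted, and the bit-to-base assignment, the function names (`embed`, `extract`, `translate`), and the choice of which codon families to use are assumptions made for illustration. It assumes the standard genetic code, stores two bits in the third base of a fourfold-degenerate codon, and stores one bit in a twofold pair.

```python
# Illustrative sketch only; the bit assignments below are assumptions,
# not the mapping used by DNA-LCEB.

# Codon prefixes whose third base is free among A, C, G, T (standard code).
FOURFOLD = {"GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG"}
# Prefixes where T <-> C in the third position is synonymous (twofold pairs).
PYRIMIDINE_PAIR = {"TT", "TA", "CA", "AA", "GA", "TG", "AG"}
# Prefixes where A <-> G in the third position is synonymous (twofold pairs).
PURINE_PAIR = {"TT", "CA", "AA", "GA", "AG"}

# Standard genetic code, indexed by base order T, C, A, G.
_ORDER = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")


def translate(dna):
    """Translate complete codons of dna into one-letter amino acids."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        a, b, c = (_ORDER.index(x) for x in dna[i:i + 3])
        protein.append(_AMINO[16 * a + 4 * b + c])
    return "".join(protein)


def _choices(codon):
    """Return the interchangeable third bases for codon, or None."""
    prefix, third = codon[:2], codon[2]
    if prefix in FOURFOLD:
        return "ACGT"
    if third in "TC" and prefix in PYRIMIDINE_PAIR:
        return "TC"
    if third in "AG" and prefix in PURINE_PAIR:
        return "AG"
    return None


def embed(dna, bits):
    """Hide a bit string in dna by rewriting synonymous third bases."""
    out, pos = [], 0
    end = len(dna) - len(dna) % 3
    for i in range(0, end, 3):
        codon = dna[i:i + 3]
        choices = _choices(codon)
        if choices is not None and pos < len(bits):
            width = 2 if len(choices) == 4 else 1
            # Pad a trailing single bit with 0 when a fourfold slot needs two.
            chunk = bits[pos:pos + width].ljust(width, "0")
            codon = codon[:2] + choices[int(chunk, 2)]
            pos += width
        out.append(codon)
    if pos < len(bits):
        raise ValueError("sequence has too few synonymous codons")
    return "".join(out) + dna[end:]


def extract(dna, n_bits):
    """Recover the first n_bits hidden by embed."""
    bits = ""
    for i in range(0, len(dna) - len(dna) % 3, 3):
        if len(bits) >= n_bits:
            break
        codon = dna[i:i + 3]
        choices = _choices(codon)
        if choices is not None:
            width = 2 if len(choices) == 4 else 1
            bits += format(choices.index(codon[2]), "0{}b".format(width))
    return bits[:n_bits]


cover = "ATGGCTTTTAAAGGATGA"   # codons: ATG GCT TTT AAA GGA TGA
marked = embed(cover, "110110")
recovered = extract(marked, 6)
```

In this toy sequence, four of the six codons carry data (two fourfold, two twofold), giving six bits in eighteen nucleotides, and `translate(cover)` equals `translate(marked)` because every substitution is synonymous. The extractor must know the message length in advance, which a real scheme would have to convey some other way.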
















Acknowledgments
This work is supported by the ICT R&D, Pakistan research grant project ICTRDF/TR&D/2012/62-DEWS and by the COMSTECH-TWAS Joint Research Grants Program for Young Scientists (12-216 RG/ITC/AS-C; UNESCO FR: 3240270865). We also thank Mr. Khurram Jawad for his help in improving the write-up of the manuscript.
Cite this article
Hafeez, I., Khan, A. & Qadir, A. DNA-LCEB: a high-capacity and mutation-resistant DNA data-hiding approach by employing encryption, error correcting codes, and hybrid twofold and fourfold codon-based strategy for synonymous substitution in amino acids. Med Biol Eng Comput 52, 945–961 (2014). https://doi.org/10.1007/s11517-014-1194-2
DOI: https://doi.org/10.1007/s11517-014-1194-2