Abstract
Nature has used DNA to store biological information for millions of years, and humans are now learning to use the same medium for their own data. In this paper, we survey the field of cellular DNA encoding, in which encoding schemes insert data into protein-coding (pcDNA) and non-coding (ncDNA) regions while working around the biological restrictions associated with those regions. We first characterize the biological restrictions that apply to existing cellular DNA encoding schemes. We then contrast the schemes with respect to the restrictions they meet, the features they support, and their implementation details. We discuss the pros and cons of each scheme's implementation and make recommendations accordingly. Finally, we highlight existing gaps and offer our insight into future research directions.
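To make the idea of an encoding scheme concrete, the following is a minimal sketch rather than any of the surveyed schemes. It assumes a fixed mapping of two bits per nucleotide, and it uses the longest run of identical bases as one illustrative example of a property an encoder might have to limit. The mapping table and the names `encode`, `decode`, and `longest_homopolymer` are hypothetical and chosen only for this illustration.

```python
# Illustrative 2-bits-per-base mapping (an assumption, not a scheme from the survey).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(); expects a length that is a multiple of four bases."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_homopolymer(dna: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    longest = run = 0
    previous = ""
    for base in dna:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest
```

Calling `decode(encode(message))` returns the original bytes, and `longest_homopolymer(encode(message))` gives a quick check of how repetitive the resulting sequence is. A naive mapping like this one ignores the context of the host sequence, which is precisely the kind of limitation that restriction-aware schemes are designed to address.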
Cite this article
Dagher, G.G., Machado, A.P., Davis, E.C. et al. Data storage in cellular DNA: contextualizing diverse encoding schemes. Evol. Intel. 14, 331–343 (2021). https://doi.org/10.1007/s12065-019-00202-z
DOI: https://doi.org/10.1007/s12065-019-00202-z