Published by De Gruyter, May 11, 2017

Genetic traces of never born proteins

  • Monika Piwowar, Ewa Matczyńska, Maciej Malawski, Tomasz Szapieniec and Irena Roterman-Konieczna

Abstract

This paper addresses proteins that were “never born in nature”. It focuses on identifying stretches of genetic information corresponding to protein sequences that are not known to exist in nature. The aim of the work was to find traces of “never born proteins” (NBP) anywhere in completely sequenced genomes, including regions not expected to carry genetic information. The analyses searched the genome sequences of species from different levels of the evolutionary tree, from yeast through plants to the human genome. Statistical details are presented, including the frequencies of the sequences found, their lengths, the percent identity and similarity of the alignments, and their E-values. Computations were performed in a gLite-based grid environment. The analyses showed that the NBP genetic record is absent from the genomes of the studied organisms at a significant level, in terms of both the identity and the length of the sequences found. Most of the sequences considered similar do not exceed 50% of the length of the original NBP sequences, which confirms that the genetic record of proteins is not accidental, both in the composition of gene sequences and in where it is recorded in the genomes of living organisms.

  1. Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This work was financially supported by EUChinaGrid.

  3. Employment or leadership: None declared.

  4. Honorarium: None declared.

  5. Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Received: 2017-3-9
Accepted: 2017-4-5
Published Online: 2017-5-11
Published in Print: 2017-6-27

©2017 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 20.4.2024 from https://www.degruyter.com/document/doi/10.1515/bams-2017-0006/html