Computational Identification of Essential Genes in Prokaryotes and Eukaryotes

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 881)

Abstract

Several computational methods have been proposed for identifying essential genes (EGs). Machine-learning-based methods use features derived from genetic sequences, gene-expression data, network topology, homology, and domain information. Except for the sequence-based features, all of these require additional experimental data, which are unavailable for under-studied and newly sequenced organisms. We therefore propose a sequence-based method for identifying EGs. We performed gene essentiality predictions for 15 bacteria, 1 archaeon, and 4 eukaryotes. Information-theoretic quantities, such as mutual information, conditional mutual information, entropy, Kullback-Leibler divergence, and Markov models, were used as features. In an attempt to improve prediction performance, we also included other easily accessible sequence-based features related to stop codon usage, length, and GC content. The Random Forest algorithm was used for classification. The performance of the proposed method was evaluated extensively using both intra- and cross-organism predictions. The results were better than those of most previously published EG predictors that rely only on sequence information, and comparable to those of predictors that use additional features derived from network topology, homology, and gene-expression data.
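
To make the kind of sequence-only features named in the abstract concrete, the following is a minimal Python sketch that computes GC content, single-base Shannon entropy, and the mutual information between bases a fixed distance apart. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `max_gap` parameter, and the exact feature definitions are hypothetical, and the paper's own definitions may differ.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the single-base distribution."""
    counts = Counter(seq.upper())
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(seq: str, k: int = 1) -> float:
    """Mutual information (bits) between bases k positions apart."""
    seq = seq.upper()
    pairs = Counter(zip(seq, seq[k:]))  # joint counts of (base_i, base_{i+k})
    n = sum(pairs.values())
    first, second = Counter(), Counter()  # marginal counts of each position
    for (a, b), c in pairs.items():
        first[a] += c
        second[b] += c
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log2(c * n / (first[a] * second[b]))
        for (a, b), c in pairs.items()
    )


def sequence_features(seq: str, max_gap: int = 3) -> dict:
    """Hypothetical feature vector: length, GC content, entropy, MI profile."""
    feats = {
        "length": len(seq),
        "gc": gc_content(seq),
        "entropy": shannon_entropy(seq),
    }
    for k in range(1, max_gap + 1):
        feats[f"mi_{k}"] = mutual_information(seq, k)
    return feats


if __name__ == "__main__":
    print(sequence_features("ATGGCGTACGCTTAGCGGATAA"))
```

Feature dictionaries of this kind, computed per gene, could then be assembled into a table and passed to a Random Forest classifier; the choice of library and hyperparameters for that step is not specified in the abstract.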

Author information

Corresponding author

Correspondence to Dawit Nigatu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Nigatu, D., Henkel, W. (2018). Computational Identification of Essential Genes in Prokaryotes and Eukaryotes. In: Peixoto, N., Silveira, M., Ali, H., Maciel, C., van den Broek, E. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2017. Communications in Computer and Information Science, vol 881. Springer, Cham. https://doi.org/10.1007/978-3-319-94806-5_13

  • DOI: https://doi.org/10.1007/978-3-319-94806-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94805-8

  • Online ISBN: 978-3-319-94806-5

  • eBook Packages: Computer Science (R0)
