Ess-NEXG: Predict Essential Proteins by Constructing a Weighted Protein Interaction Network Based on Node Embedding and XGBoost

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2020)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12304)

Abstract

Essential proteins are indispensable to the development of organisms and cells. Identifying them lays the foundation for discovering drug targets and understanding protein functions. Because traditional biological experiments are expensive and time-consuming, many computational methods have been proposed to identify essential proteins. However, protein-protein interaction (PPI) networks contain substantial noise, which hampers essential protein prediction. To reduce the effect of this noise, it is necessary to construct a reliable PPI network by introducing other useful biological information. In this paper, we propose a model called Ess-NEXG, which integrates RNA-Seq data, subcellular localization information, and orthologous information for the prediction of essential proteins. In Ess-NEXG, we first use these data to construct a reliable weighted PPI network. We then use the node2vec technique to capture the topological features of proteins in the constructed weighted network. Finally, the extracted protein features are fed into a machine learning classifier to perform the prediction task. The experimental results show that Ess-NEXG outperforms other computational methods.
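The node2vec step described above can be illustrated with a minimal sketch of its second-order biased random walk on a weighted graph. This is not the authors' implementation: the toy network `toy_ppi`, its protein names and edge weights, the helper names `node2vec_walk` and `generate_walks`, and the values of the return parameter `p` and in-out parameter `q` are all assumptions made for illustration. The abstract does not state how the edge weights are derived from the RNA-Seq, subcellular localization, and orthology data, so the weights here are arbitrary.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style biased walk on a weighted undirected graph.

    graph maps each node to a dict of {neighbor: positive edge weight}.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step: sample in proportion to edge weight only.
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:
                    weights.append(w / p)  # step back to the previous node
                elif n in graph[prev]:
                    weights.append(w)      # stay at distance 1 from prev
                else:
                    weights.append(w / q)  # move farther away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


# Hypothetical weighted PPI network; names and weights are illustrative only.
toy_ppi = {
    "P1": {"P2": 0.9, "P3": 0.4},
    "P2": {"P1": 0.9, "P3": 0.7, "P4": 0.2},
    "P3": {"P1": 0.4, "P2": 0.7},
    "P4": {"P2": 0.2},
}

walks = generate_walks(toy_ppi, num_walks=2, length=5, p=1.0, q=0.5, seed=42)
```

In a full pipeline, walks like these would typically be passed to a skip-gram model to obtain one embedding vector per protein, and those vectors would serve as input features for a classifier such as XGBoost, which the paper's title names.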

The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61832019, the 111 Project (No. B18059), and the Hunan Provincial Science and Technology Program (No. 2018WK4001).

Corresponding author

Correspondence to Min Li.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Wang, N., Zeng, M., Zhang, J., Li, Y., Li, M. (2020). Ess-NEXG: Predict Essential Proteins by Constructing a Weighted Protein Interaction Network Based on Node Embedding and XGBoost. In: Cai, Z., Mandoiu, I., Narasimhan, G., Skums, P., Guo, X. (eds) Bioinformatics Research and Applications. ISBRA 2020. Lecture Notes in Computer Science, vol 12304. Springer, Cham. https://doi.org/10.1007/978-3-030-57821-3_9

  • DOI: https://doi.org/10.1007/978-3-030-57821-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57820-6

  • Online ISBN: 978-3-030-57821-3

  • eBook Packages: Computer Science, Computer Science (R0)
