Abstract
Essential proteins are indispensable to the development of organisms and cells. Identifying them lays the foundation for discovering drug targets and understanding protein functions. Because traditional biological experiments are expensive and time-consuming, many computational methods have been proposed to identify essential proteins. However, protein-protein interaction (PPI) networks contain considerable noise, which hampers essential protein prediction. To reduce the effect of this noise, it is necessary to construct a reliable PPI network by introducing other useful biological information. In this paper, we propose a model called Ess-NEXG, which integrates RNA-Seq data, subcellular localization information, and orthologous information to predict essential proteins. In Ess-NEXG, we first use these data to construct a reliable weighted PPI network. We then apply the node2vec technique to capture the topological features of proteins in the weighted network. Finally, the extracted protein features are fed into a machine learning classifier to perform the prediction task. The experimental results show that Ess-NEXG outperforms other computational methods.
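As a rough illustration of the node2vec step on a weighted network, the sketch below generates second-order biased random walks in which the transition probability is proportional to the edge weight times the node2vec return/in-out bias. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy graph, its edge weights, the function names, and the values of `p` and `q` are all hypothetical. Training embeddings from the walks and feeding them to a classifier are omitted.

```python
import random

# Toy weighted PPI network. The weights stand in for edge reliability scores;
# they are hypothetical values, not taken from the paper.
EDGES = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.2,
    ("B", "C"): 0.7,
    ("C", "D"): 0.5,
}


def build_adjacency(edges):
    """Return {node: {neighbor: weight}} for an undirected weighted graph."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One node2vec-style second-order walk containing at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step: no previous node yet, so use edge weights only.
            weights = [adj[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                if n == prev:          # step back to the previous node
                    bias = 1.0 / p
                elif n in adj[prev]:   # stay within distance 1 of the previous node
                    bias = 1.0
                else:                  # move further away from the previous node
                    bias = 1.0 / q
                weights.append(adj[cur][n] * bias)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


adj = build_adjacency(EDGES)
walks = generate_walks(adj, num_walks=2, length=5, p=1.0, q=0.5)
```

In a full pipeline, walks like these would typically be passed to a skip-gram model to obtain one feature vector per protein, and those vectors would then serve as classifier input.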
The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61832019, the 111 Project (No. B18059), and the Hunan Provincial Science and Technology Program (2018WK4001).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, N., Zeng, M., Zhang, J., Li, Y., Li, M. (2020). Ess-NEXG: Predict Essential Proteins by Constructing a Weighted Protein Interaction Network Based on Node Embedding and XGBoost. In: Cai, Z., Mandoiu, I., Narasimhan, G., Skums, P., Guo, X. (eds) Bioinformatics Research and Applications. ISBRA 2020. Lecture Notes in Computer Science, vol 12304. Springer, Cham. https://doi.org/10.1007/978-3-030-57821-3_9
DOI: https://doi.org/10.1007/978-3-030-57821-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57820-6
Online ISBN: 978-3-030-57821-3
eBook Packages: Computer Science (R0)