Ess-NEXG: Predict Essential Proteins by Constructing a Weighted Protein Interaction Network Based on Node Embedding and XGBoost

Wang, Nian; Zeng, Min; Zhang, Jiashuai; Li, Yiming; Li, Min

doi:10.1007/978-3-030-57821-3_9

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12304)

Included in the following conference series: International Symposium on Bioinformatics Research and Applications

Abstract

Essential proteins are indispensable to the development of organisms and cells. Identifying them lays the foundation for discovering drug targets and understanding protein functions. Because traditional biological experiments are expensive and time-consuming, many computational methods have been proposed to identify essential proteins. However, considerable noise in protein-protein interaction (PPI) networks hampers essential protein prediction. To reduce the effect of this noise, it is necessary to construct a reliable PPI network by introducing other useful biological information. In this paper, we propose a model called Ess-NEXG, which integrates RNA-Seq data, subcellular localization information, and orthologous information to predict essential proteins. In Ess-NEXG, we first construct a reliable weighted network from these data. We then use the node2vec technique to capture the topological features of proteins in the constructed weighted PPI network. Finally, the extracted features are fed into a machine learning classifier to perform the prediction task. The experimental results show that Ess-NEXG outperforms other computational methods.
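
To illustrate the node2vec step mentioned in the abstract, the following minimal Python sketch generates the second-order biased random walks that node2vec samples on a weighted network before learning embeddings. It is a sketch under stated assumptions, not the authors' implementation: the protein names, edge weights, helper function names, and the values of the return parameter p and in-out parameter q are hypothetical, and the skip-gram embedding training and the downstream classifier are omitted.

```python
import random

# Hypothetical toy weighted PPI network; the proteins and weights are
# made up for illustration and do not come from the paper.
EDGES = [("P1", "P2", 0.9), ("P2", "P3", 0.4), ("P1", "P3", 0.7), ("P3", "P4", 0.2)]


def make_undirected(edges):
    """Build a symmetric adjacency map: node -> {neighbor: weight}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style walk containing at most `length` nodes."""
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step has no previous node, so use edge weights only.
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:
                    weights.append(w / p)  # step back to the previous node
                elif n in graph[prev]:
                    weights.append(w)      # stay within distance 1 of prev
                else:
                    weights.append(w / q)  # move away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p, q, seed=0):
    """Start `num_walks` walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


ppi = make_undirected(EDGES)
walks = generate_walks(ppi, num_walks=2, length=5, p=1.0, q=0.5)
```

In a full pipeline, walks like these would be treated as sentences for a skip-gram model to obtain one feature vector per protein, and those vectors would then be passed to a classifier. Here a larger edge weight makes a transition more likely, which is how a weighted network can influence the learned features.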

The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61832019, the 111 Project (No. B18059), and the Hunan Provincial Science and Technology Program (No. 2018WK4001).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Central South University, Changsha, 410083, People’s Republic of China
Nian Wang, Min Zeng, Jiashuai Zhang, Yiming Li & Min Li

Corresponding author

Correspondence to Min Li.

Editor information

Editors and Affiliations

Department of Computer Science, Georgia State University, Atlanta, GA, USA
Zhipeng Cai
University of Connecticut, Storrs Mansfield, CT, USA
Ion Mandoiu
Florida International University, Miami, FL, USA
Giri Narasimhan
Georgia State University, Atlanta, GA, USA
Pavel Skums
University of North Texas, Denton, TX, USA
Xuan Guo

About this paper

Cite this paper

Wang, N., Zeng, M., Zhang, J., Li, Y., Li, M. (2020). Ess-NEXG: Predict Essential Proteins by Constructing a Weighted Protein Interaction Network Based on Node Embedding and XGBoost. In: Cai, Z., Mandoiu, I., Narasimhan, G., Skums, P., Guo, X. (eds) Bioinformatics Research and Applications. ISBRA 2020. Lecture Notes in Computer Science, vol 12304. Springer, Cham. https://doi.org/10.1007/978-3-030-57821-3_9

DOI: https://doi.org/10.1007/978-3-030-57821-3_9
Published: 18 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57820-6
Online ISBN: 978-3-030-57821-3
eBook Packages: Computer Science, Computer Science (R0)
