Abstract
Detecting essential proteins in protein–protein interaction (PPI) networks not only advances life science research but also has important applications in disease diagnosis and the identification of drug target cells. Many computational algorithms for essential protein detection have been proposed recently. Most of them detect essential proteins according to centrality measures of the nodes in the PPI network. Such centrality-based methods consider only the topological information of the network and neglect the biological features of the proteins, which are crucial for recognizing essential proteins. This paper presents a random walk-based method, EPD-RW, that identifies essential proteins by integrating network topology with biological information extracted from Gene Ontology (GO) data, gene expression profiles, domain information and phylogenetic profiles. EPD-RW uses both the topological structure of the PPI network and the biological information of the proteins to guide the random walk that computes their essentiality. An iterative method efficiently integrates the topological and biological features at each step of the random walk. We test EPD-RW on yeast PPI datasets and compare its results with those of other methods. The experimental results show that EPD-RW achieves the best performance among all the methods tested. A biological analysis of the results shows that the random walk-based method effectively increases the accuracy of essential protein detection, and that the biological features of the proteins greatly enhance detection performance.
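To make the idea of a biologically guided random walk concrete, the following is a minimal sketch of a generic random walk with restart on a toy undirected graph. It is not the EPD-RW algorithm: the abstract does not give the transition rule or the way GO, expression, domain and phylogenetic data are combined, so the function name `biased_random_walk`, the single per-protein `bio_score`, the `restart` value and the toy edge list are all assumptions made for illustration.

```python
def biased_random_walk(edges, bio_score, restart=0.3, tol=1e-9, max_iter=1000):
    """Score nodes with a random walk with restart (illustrative sketch only).

    edges:     iterable of (u, v) pairs; a toy stand-in for a PPI network.
    bio_score: dict node -> non-negative number; a hypothetical combined
               biological score used as the restart distribution.
    restart:   probability of jumping back to the biological prior (assumed).
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)

    # Normalize the biological scores into a probability distribution.
    total = sum(bio_score.get(n, 0.0) for n in nodes)
    if total > 0:
        prior = {n: bio_score.get(n, 0.0) / total for n in nodes}
    else:
        prior = {n: 1.0 / len(nodes) for n in nodes}  # fall back to uniform

    rank = dict(prior)
    for _ in range(max_iter):
        new_rank = {}
        for n in nodes:
            # Topology term: each neighbor spreads its score evenly over its edges.
            inflow = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            # Biology term: with probability `restart`, jump back to the prior.
            new_rank[n] = (1.0 - restart) * inflow + restart * prior[n]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:  # stop once the scores no longer change
            break
    return rank


# Toy network: A is a hub, D hangs off A (hypothetical data).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
uniform = biased_random_walk(toy_edges, {n: 1.0 for n in "ABCD"})
biased = biased_random_walk(toy_edges, {"A": 0.0, "B": 0.0, "C": 0.0, "D": 1.0})
```

With a uniform prior the ranking follows topology alone, so the hub `A` scores highest. Concentrating the prior on `D` raises its score, which is the sense in which biological evidence can steer a walk that would otherwise be driven purely by network structure.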










Acknowledgements
This research was supported in part by the Chinese National Natural Science Foundation under grant Nos. 61379066, 61702441, 61070047, 61379064, 61472344, 61402395, 61906100 and 61602202; the Natural Science Foundation of Jiangsu Province under contracts BK20180822, BK20130452, BK2012672, BK2012128 and BK20140492; and the Natural Science Foundation of the Education Department of Jiangsu Province under contracts 18KJB520040, 12KJB520019 and 13KJB520026.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical standards
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent
This article does not contain any studies with animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.
Cite this article
Ahmed, N.M., Chen, L., Li, B. et al. A random walk-based method for detecting essential proteins by integrating the topological and biological features of PPI network. Soft Comput 25, 8883–8903 (2021). https://doi.org/10.1007/s00500-021-05780-8
DOI: https://doi.org/10.1007/s00500-021-05780-8