ABSTRACT
Essential proteins play a vital role in the survival and reproduction of an organism. They can be identified from the protein-protein interaction (PPI) network, whose characteristics allow essential proteins to be distinguished from non-essential ones. Understanding the PPI network is necessary for learning about protein functions and the information they carry. Knowledge of essential proteins makes it possible to identify diseases and lethal proteins and to design drugs such as antibiotics. Because experimental methods are time-consuming, error-prone, and laborious, computational approaches have become popular for predicting essential proteins. Much research has been done on identifying essential proteins, and among these approaches machine learning techniques have been found more promising. In this paper, we propose a method to identify essential proteins using eight features calculated from the PPI network. We prepare a dataset and classify it with three classifiers: XGBoost (eXtreme Gradient Boosting) Tree, C 5.0, and an ensemble method that combines the two (XGBoost Tree and C 5.0). After evaluating the results, we found that C 5.0 outperforms both XGBoost Tree and the ensemble method. We also compare the results with other existing methods and obtain a clear improvement over them.
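As a rough illustration of what "features calculated from the PPI network" can look like, the sketch below computes two common topological measures, node degree and local clustering coefficient, from a list of interactions. The abstract does not name its eight features, so these two are assumptions chosen for illustration only, and the protein names and edge list are hypothetical toy data.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that also interact with each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def topological_features(edges):
    """Return a per-protein feature dictionary (illustrative features only)."""
    adj = build_adjacency(edges)
    return {
        node: {
            "degree": len(adj[node]),
            "clustering": clustering_coefficient(adj, node),
        }
        for node in adj
    }


# Hypothetical toy network; a real PPI network would be loaded from a database export.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
features = topological_features(toy_edges)
for protein in sorted(features):
    print(protein, features[protein])
```

Each protein's feature dictionary could then form one row of the dataset passed to a classifier, with a label indicating whether the protein is known to be essential.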
Index Terms
- An Efficient Approach for Prediction of Essential Protein from PPI Network