DOI: 10.1145/3377049.3377128
research-article

An Efficient Approach for Prediction of Essential Protein from PPI Network

Published: 20 March 2020

ABSTRACT

Essential proteins play a vital role in the survival and reproduction of an organism. They can be identified from the protein-protein interaction (PPI) network: the characteristics of the interaction network allow us to differentiate essential proteins from non-essential ones. Understanding the PPI network is necessary for gaining knowledge of protein functions and the information they carry. With knowledge of essential proteins, it is possible to identify diseases and lethal proteins and to design drugs such as antibiotics. As experimental methods are time-consuming, error-prone, and laborious, computational approaches have become popular for predicting essential proteins. Many studies have addressed the identification of essential proteins; among them, machine learning techniques have been found to be more promising. In this paper, we propose a method to identify essential proteins using eight features calculated from the PPI network. We prepare a dataset and classify it with three classifiers: XGBoost (eXtreme Gradient Boosting) Tree, C 5.0, and an ensemble method that combines the two (XGBoost Tree and C 5.0). After evaluating the results, we found that C 5.0 outperforms both XGBoost Tree and the ensemble method. We also compare the results with other existing methods and obtain a considerable improvement over them.
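To make the idea of features calculated from the PPI network concrete, the following is a minimal sketch in plain Python. The abstract does not name the eight features, so the two computed here (degree centrality and local clustering coefficient) are assumptions chosen only because they are common topological measures; the protein identifiers and the toy edge list are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_network(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_centrality(adj):
    """Number of interaction partners, normalised by (n - 1)."""
    n = len(adj)
    if n < 2:
        return {p: 0.0 for p in adj}
    return {p: len(nbrs) / (n - 1) for p, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Fraction of a protein's neighbour pairs that also interact."""
    result = {}
    for p, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[p] = 0.0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        result[p] = 2 * links / (k * (k - 1))
    return result


def feature_table(edges):
    """Map each protein to a (degree centrality, clustering) feature tuple."""
    adj = build_network(edges)
    dc = degree_centrality(adj)
    cc = clustering_coefficient(adj)
    return {p: (dc[p], cc[p]) for p in sorted(adj)}


# Hypothetical toy network: a triangle P1-P2-P3 with P4 attached to P3.
TOY_EDGES = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

if __name__ == "__main__":
    for protein, features in feature_table(TOY_EDGES).items():
        print(protein, features)
```

Each row of such a table, paired with an essential or non-essential label, could then be passed to a classifier. The sketch stops at feature extraction because the abstract does not describe the training setup.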


Published in

ICCA 2020: Proceedings of the International Conference on Computing Advancements
January 2020
517 pages
ISBN: 9781450377782
DOI: 10.1145/3377049

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited
