Detecting Transmembrane Proteins Using Decision Trees

Nikravan, Mohammad Hossein; Kumar, Ashwani; Zilles, Sandra

doi:10.1007/978-3-319-24282-8_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9356)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Transmembrane (TM) proteins are proteins that span a cell membrane; their segments crossing the membrane are called TM domains. TM domain detection and TM protein detection are important problems in computational biology, but typical machine learning approaches yield classifiers that are difficult to interpret and hence provide no biological insight. We study both TM domain and TM protein detection with easy-to-interpret decision trees. For TM domain detection, the use of decision trees has already been reported in the literature, but we provide a critical study of the existing approach, resulting in improved feature sets as well as observations on how to avoid biased training and test sets. In particular, we discover a motif known to be common to TM domains that was not discovered in previous research using machine learning. For TM protein detection, we propose a 2-layer learning method. This method can be generalized to deal with a large class of string classification problems. The method achieves sensitivity and specificity values of up to 92% on the settings we experimented with, while providing intuitive classifiers that are easy for the domain expert to interpret.
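For readers less familiar with the two metrics quoted above, sensitivity is the fraction of true positives (here, TM examples) that are classified as positive, and specificity is the fraction of true negatives that are classified as negative. The following is a minimal sketch of that computation; the function name and the example labels are hypothetical illustrations, not data or code from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding (not from the paper): 1 = TM, 0 = non-TM.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # TM predicted as TM
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # TM predicted as non-TM
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-TM predicted as non-TM
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-TM predicted as TM
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for eight examples, for illustration only.
example_true = [1, 1, 1, 0, 0, 0, 0, 1]
example_pred = [1, 1, 0, 0, 0, 1, 0, 1]
example_sens, example_spec = sensitivity_specificity(example_true, example_pred)
```

Reporting both values matters because a classifier can score well on one simply by favouring one class.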

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Notes

1. U and O are two rare amino acids found in some species. B, J, X, and Z are used in case of inconclusive identification of residues in the protein sequences.
2. This suggests that it might be helpful to train and test the trees used at the first layer using segments that cover part of a TM domain and part of a non-TM segment. However, in this case it is not straightforward how to assign labels to the training and test data, i.e., to decide how much overlap a segment needs to have with a TM domain in order to be considered a positive example.
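The labeling question raised in Note 2 can be made concrete with a small sketch. It slides a fixed-length window over a sequence and marks a segment as positive when at least a chosen fraction of it lies inside an annotated TM domain. The window length, step, and `min_overlap` threshold are hypothetical parameters chosen for illustration; the notes do not say which values, if any, would be appropriate, and this is not presented as the authors' procedure.

```python
def label_segments(seq_len, tm_domains, window=20, step=1, min_overlap=0.5):
    """Label sliding-window segments by their overlap with TM domains.

    tm_domains: list of (start, end) half-open, 0-based intervals,
                assumed not to overlap each other.
    window, step, min_overlap: illustrative assumptions, not values
                taken from the paper.
    Returns a list of (start, end, overlap_fraction, label) tuples,
    where label is 1 (positive) or 0 (negative).
    """
    segments = []
    for start in range(0, seq_len - window + 1, step):
        end = start + window
        # Total number of window positions that fall inside any TM domain.
        covered = sum(max(0, min(end, e) - max(start, s)) for s, e in tm_domains)
        fraction = covered / window
        label = 1 if fraction >= min_overlap else 0
        segments.append((start, end, fraction, label))
    return segments


# Toy example: a length-10 sequence with one TM domain at positions 4..7.
toy_segments = label_segments(10, [(4, 8)], window=4, step=2, min_overlap=0.5)
```

Varying `min_overlap` shifts boundary segments between the positive and negative class, which is the ambiguity the note points out.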

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada
Mohammad Hossein Nikravan, Ashwani Kumar & Sandra Zilles

Corresponding author

Correspondence to Sandra Zilles.

Editor information

Editors and Affiliations

University of Ottawa, Ottawa, Ontario, Canada
Nathalie Japkowicz
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
Stan Matwin

Cite this paper

Nikravan, M.H., Kumar, A., Zilles, S. (2015). Detecting Transmembrane Proteins Using Decision Trees. In: Japkowicz, N., Matwin, S. (eds) Discovery Science. DS 2015. Lecture Notes in Computer Science, vol 9356. Springer, Cham. https://doi.org/10.1007/978-3-319-24282-8_13

DOI: https://doi.org/10.1007/978-3-319-24282-8_13
Published: 25 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24281-1
Online ISBN: 978-3-319-24282-8
eBook Packages: Computer Science, Computer Science (R0)
