Abstract
Transmembrane (TM) proteins are proteins that span a cell membrane; their segments crossing the membrane are called TM domains. TM domain and TM protein detection are important problems in computational biology, but typical machine learning approaches yield classifiers that are difficult to interpret and hence offer no biological insight. We study both TM domain and TM protein detection with easy-to-interpret decision trees. For TM domain detection, the use of decision trees has already been reported in the literature, but we provide a critical study of the existing approach, resulting in improved feature sets as well as observations on how to avoid biased training and test sets. In particular, we discover a motif known to be common to TM domains that was not discovered in previous research using machine learning. For TM protein detection, we propose a 2-layer learning method. This method can be generalized to deal with a large class of string classification problems. The method achieves sensitivity and specificity values of up to 92% on the settings we experimented with, while providing intuitive classifiers that are easy for the domain expert to interpret.
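For readers unfamiliar with the two evaluation measures named above, the following minimal sketch shows how sensitivity and specificity are conventionally computed from binary labels. The function name and the toy labels are illustrative assumptions; they are not taken from the paper or its data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    y_true, y_pred: equal-length sequences of truthy/falsy values,
    where truthy means "TM" (positive). Hypothetical helper, not the
    authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    # Sensitivity: fraction of true positives that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true negatives that are rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```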
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Notes
- 1. U and O are two rare amino acids found in some species. B, J, X, and Z are used in case of inconclusive identification of residues in the protein sequences.
- 2. This suggests that it might be helpful to train and test the trees used at the first layer using segments that cover part of a TM domain and part of a non-TM segment. However, in this case it is not straightforward how to assign labels to the training and test data, i.e., to decide how much overlap a segment needs to have with a TM domain in order to be considered a positive example.
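The labelling question raised in the second note can be made concrete with a small sketch. It assumes, purely for illustration, that TM domains are given as non-overlapping half-open index ranges and that a segment counts as positive when the covered fraction reaches a threshold. The parameter `min_overlap`, the function names, and the coordinates are hypothetical; the source does not prescribe any particular threshold.

```python
def overlap_fraction(seg_start, seg_end, tm_domains):
    """Fraction of the segment [seg_start, seg_end) covered by TM domains.

    tm_domains: list of (start, end) half-open ranges, assumed not to
    overlap one another (an assumption of this sketch).
    """
    length = seg_end - seg_start
    if length <= 0:
        raise ValueError("segment must have positive length")
    covered = 0
    for tm_start, tm_end in tm_domains:
        # Length of the intersection, or 0 if the ranges are disjoint.
        covered += max(0, min(seg_end, tm_end) - max(seg_start, tm_start))
    return covered / length


def label_segments(seq_len, window, tm_domains, min_overlap=0.5, step=1):
    """Slide a fixed-length window over a sequence and label each segment.

    Returns a list of (start, is_positive) pairs. min_overlap is a
    hypothetical threshold; choosing it is exactly the open question
    described in the note.
    """
    labels = []
    for start in range(0, seq_len - window + 1, step):
        frac = overlap_fraction(start, start + window, tm_domains)
        labels.append((start, frac >= min_overlap))
    return labels


# Invented example: one TM domain at positions 10..29 of a 40-residue sequence.
example = label_segments(40, 20, [(10, 30)], min_overlap=0.75, step=10)
```

Raising or lowering `min_overlap` changes which boundary segments count as positive, which illustrates why the label assignment is not straightforward.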
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nikravan, M.H., Kumar, A., Zilles, S. (2015). Detecting Transmembrane Proteins Using Decision Trees. In: Japkowicz, N., Matwin, S. (eds) Discovery Science. DS 2015. Lecture Notes in Computer Science, vol 9356. Springer, Cham. https://doi.org/10.1007/978-3-319-24282-8_13
DOI: https://doi.org/10.1007/978-3-319-24282-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24281-1
Online ISBN: 978-3-319-24282-8
eBook Packages: Computer Science (R0)