Detecting Transmembrane Proteins Using Decision Trees

  • Conference paper
Discovery Science (DS 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9356)

Abstract

Transmembrane (TM) proteins are proteins that span a cell membrane; the segments that cross the membrane are called TM domains. TM domain detection and TM protein detection are important problems in computational biology, but typical machine learning approaches produce classifiers that are difficult to interpret and hence yield no biological insight. We study both problems using easy-to-interpret decision trees. For TM domain detection, the use of decision trees has already been reported in the literature; we provide a critical study of the existing approach, resulting in improved feature sets as well as observations on how to avoid biased training and test sets. In particular, we discover a motif known to be common to TM domains that was not discovered in previous research using machine learning. For TM protein detection, we propose a 2-layer learning method, which can be generalized to a large class of string classification problems. The method achieves sensitivity and specificity values of up to 92% on the settings we experimented with, while providing intuitive classifiers that are easy for domain experts to interpret.
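The general shape of a 2-layer string classifier can be illustrated with a minimal sketch: a first layer labels fixed-width segments of a string, and a second layer decides the class of the whole string from those segment labels. This is not the authors' implementation. The function names, the window width, the hydrophobic residue set, the 0.7 threshold, and both toy rules are hypothetical stand-ins for classifiers that would in practice be learned decision trees.

```python
from typing import Callable, List

def windows(sequence: str, width: int, step: int) -> List[str]:
    """All full-width segments of `sequence`, taken every `step` positions."""
    return [sequence[i:i + width]
            for i in range(0, len(sequence) - width + 1, step)]

def two_layer_classify(
    sequence: str,
    segment_classifier: Callable[[str], bool],         # layer 1: one label per segment
    sequence_classifier: Callable[[List[bool]], bool],  # layer 2: label for the whole string
    width: int,
    step: int = 1,
) -> bool:
    """Label every segment with layer 1, then let layer 2 decide from those labels."""
    segment_labels = [segment_classifier(w) for w in windows(sequence, width, step)]
    return sequence_classifier(segment_labels)

# Toy stand-ins (assumptions for illustration only, not learned models).
HYDROPHOBIC = set("AILMFVW")  # illustrative residue set

def toy_segment_rule(segment: str) -> bool:
    """Flag a segment when at least 70% of its residues are in HYDROPHOBIC."""
    return sum(r in HYDROPHOBIC for r in segment) / len(segment) >= 0.7

def toy_sequence_rule(segment_labels: List[bool]) -> bool:
    """Flag the whole string when any segment was flagged."""
    return any(segment_labels)

if __name__ == "__main__":
    seq = "DDDD" + "L" * 10 + "KKKK"
    print(two_layer_classify(seq, toy_segment_rule, toy_sequence_rule, width=10))
```

Passing the two layers as callables keeps the sketch independent of any particular learner, so either toy rule could be swapped for a trained decision tree without changing the control flow.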

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Notes

  1. U and O are two rare amino acids found in some species. B, J, X, and Z are used in case of inconclusive identification of residues in the protein sequences.

  2. This suggests that it might be helpful to train and test the trees used at the first layer using segments that cover part of a TM domain and part of a non-TM segment. However, in this case it is not straightforward how to assign labels to the training and test data, i.e., to decide how much overlap a segment needs to have with a TM domain in order to be considered a positive example.
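The labelling question raised in Note 2 can be made concrete with a small sketch. It assumes, purely for illustration, that segments are fixed-width windows, that TM domains are given as non-overlapping half-open (start, end) index pairs, and that a segment counts as a positive example when at least a fraction `min_overlap` of it lies inside TM domains. The function names and the threshold parameter are hypothetical; the source does not prescribe any particular value.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (an assumption)

def overlap_fraction(seg_start: int, seg_end: int, domains: List[Interval]) -> float:
    """Fraction of the segment [seg_start, seg_end) covered by TM domains.

    Domains are assumed not to overlap one another.
    """
    covered = sum(
        max(0, min(seg_end, d_end) - max(seg_start, d_start))
        for d_start, d_end in domains
    )
    return covered / (seg_end - seg_start)

def label_segments(
    seq_len: int,
    domains: List[Interval],
    window: int,
    step: int,
    min_overlap: float,  # hypothetical threshold; choosing it is the open question
) -> List[Tuple[int, int, bool]]:
    """Return (start, end, is_positive) for every full window in the sequence."""
    labelled = []
    for start in range(0, seq_len - window + 1, step):
        end = start + window
        is_positive = overlap_fraction(start, end, domains) >= min_overlap
        labelled.append((start, end, is_positive))
    return labelled

if __name__ == "__main__":
    # One TM domain at positions 10..19 in a sequence of length 30.
    for threshold in (0.5, 1.0):
        print(threshold, label_segments(30, [(10, 20)], window=10, step=5,
                                        min_overlap=threshold))
```

Running the example with thresholds 0.5 and 1.0 shows how the same boundary-spanning windows switch between positive and negative labels, which is the ambiguity the note describes.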

Author information

Corresponding author

Correspondence to Sandra Zilles.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nikravan, M.H., Kumar, A., Zilles, S. (2015). Detecting Transmembrane Proteins Using Decision Trees. In: Japkowicz, N., Matwin, S. (eds) Discovery Science. DS 2015. Lecture Notes in Computer Science, vol 9356. Springer, Cham. https://doi.org/10.1007/978-3-319-24282-8_13

  • DOI: https://doi.org/10.1007/978-3-319-24282-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24281-1

  • Online ISBN: 978-3-319-24282-8

  • eBook Packages: Computer Science, Computer Science (R0)
