Abstract
G-protein coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that transduce signals across the cell membrane. Because of this property and the other physiological roles of the GPCR family, GPCRs have become important targets of therapeutic drugs. The function of many GPCRs is unknown, and accurate classification of GPCRs can help predict their function. In this study we propose a kernel-based method to classify GPCRs at the subfamily and sub-subfamily levels. At the sub-subfamily level, where only a small number of sequences are available (imbalanced data), we used our new synthetic protein sequence oversampling (SPSO) algorithm to improve the accuracy and sensitivity of the classifiers. With this approach we obtained an overall accuracy and Matthews correlation coefficient (MCC) of 98.4% and 0.98 for class A, nearly 100% and 1 for class B, and 96.95% and 0.91 for class C, respectively, at the subfamily level, and an overall accuracy and MCC of 97.93% and 0.95 at the sub-subfamily level. The results show that our oversampling technique can be applied to other protein classification problems that suffer from imbalanced data.
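The results above are reported as overall accuracy and Matthews correlation coefficient (MCC). For one class scored one-versus-rest, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from −1 to 1 and, unlike raw accuracy, is not inflated by a large negative class. The Python sketch below computes both metrics from label lists. It assumes each (sub-)subfamily is scored one-versus-rest; the abstract does not say how per-class scores were computed or aggregated, and the function names and toy labels are hypothetical.

```python
import math


def one_vs_rest_counts(y_true, y_pred, positive):
    """Count (tp, tn, fp, fn) with `positive` as the positive class, all others negative."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient in [-1, 1]; 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(y_true, y_pred):
    """Fraction of sequences whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels standing in for GPCR subfamily assignments.
    y_true = ["A1", "A1", "A1", "A2", "A2", "A3"]
    y_pred = ["A1", "A1", "A2", "A2", "A2", "A3"]
    print(round(accuracy(y_true, y_pred), 4))  # 0.8333
    for label in ["A1", "A2", "A3"]:
        # Prints A1 0.7071, A2 0.7071, A3 1.0
        print(label, round(mcc(*one_vs_rest_counts(y_true, y_pred, label)), 4))
```

In the toy run, one misclassified sequence lowers accuracy to 5/6 but lowers MCC for both affected classes (A1 and A2) to about 0.71, which illustrates why MCC is reported alongside accuracy when class sizes are uneven.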
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Beigi, M., Zell, A. (2006). A Novel Method for Classifying Subfamilies and Sub-subfamilies of G-Protein Coupled Receptors. In: Maglaveras, N., Chouvarda, I., Koutkias, V., Brause, R. (eds) Biological and Medical Data Analysis. ISBMDA 2006. Lecture Notes in Computer Science, vol 4345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946465_3
DOI: https://doi.org/10.1007/11946465_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68063-5
Online ISBN: 978-3-540-68065-9
eBook Packages: Computer Science, Computer Science (R0)