Abstract
Protein domains are discrete portions of a protein sequence that can fold independently and carry their own function. Protein domain classification matters for several reasons, including determining protein function, which in turn supports the engineering of new proteins with new functions. However, several issues in protein domain classification remain to be addressed, including strengthening the domain signal and accurately assigning domains to their categories. To address these issues, this paper proposes a new approach that classifies protein domains from protein subsequences and protein structure information using a support vector machine (SVM) with a sigmoid kernel. The proposed method consists of three phases: data generation, sequence information creation, and classification. The data generation phase selects candidate proteins and produces clear domain information. The sequence information creation phase applies several calculations to derive protein structure information that strengthens the domain signal. The classification phase applies the SVM sigmoid kernel and evaluates performance. The approach is evaluated in terms of sensitivity and specificity on single-domain and multiple-domain proteins from the SCOP 1.75 dataset. The SVM sigmoid kernel achieved higher accuracy than single and double neural networks for both single-domain and multiple-domain prediction. The approach was developed to resolve cases in which a protein is ambiguously grouped into both categories, single-domain and multiple-domain. The method improves classification in terms of sensitivity, specificity, and accuracy.
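The classification and evaluation steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which SVM solver was used, so the sketch swaps in kernelized Pegasos (a stochastic sub-gradient solver for the hinge-loss SVM, without a bias term) for brevity. The sigmoid kernel is K(x, z) = tanh(gamma * <x, z> + coef0). The function names, the 2-D feature vectors, the label encoding (+1 = multiple-domain, -1 = single-domain), and the values gamma = 0.5, coef0 = 0.0 and lam = 0.1 are all hypothetical; real inputs would be the subsequence and structure features described above.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid_kernel(x: Vector, z: Vector, gamma: float = 0.5, coef0: float = 0.0) -> float:
    """Sigmoid (hyperbolic tangent) kernel: K(x, z) = tanh(gamma * <x, z> + coef0)."""
    dot = sum(a * b for a, b in zip(x, z))
    return math.tanh(gamma * dot + coef0)


def train_kernel_pegasos(X: List[Vector], y: List[int], lam: float = 0.1,
                         iterations: int = 200, seed: int = 0,
                         gamma: float = 0.5, coef0: float = 0.0) -> List[float]:
    """Kernelized Pegasos: stochastic sub-gradient training of a hinge-loss SVM
    (no bias term). Returns one non-negative count (alpha) per training example;
    examples with alpha > 0 act as support vectors."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    alpha = [0.0] * len(X)
    for t in range(1, iterations + 1):
        i = rng.randrange(len(X))  # pick one training example at random
        score = sum(alpha[j] * y[j] * sigmoid_kernel(X[j], X[i], gamma, coef0)
                    for j in range(len(X)) if alpha[j] > 0)
        # Hinge loss is active when the scaled margin is below 1.
        if y[i] * score / (lam * t) < 1:
            alpha[i] += 1.0
    return alpha


def predict(alpha: List[float], X: List[Vector], y: List[int], x: Vector,
            gamma: float = 0.5, coef0: float = 0.0) -> int:
    """Sign of the kernel expansion sum_j alpha_j * y_j * K(x_j, x)."""
    score = sum(a * yj * sigmoid_kernel(xj, x, gamma, coef0)
                for a, yj, xj in zip(alpha, y, X) if a > 0)
    return 1 if score >= 0 else -1


def evaluate(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy with +1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # TP / (TP + FN)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # TN / (TN + FP)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors; +1 = multiple-domain, -1 = single-domain.
    X_train = [[1.0, 0.8], [0.9, 1.2], [1.3, 1.0],
               [-1.0, -0.9], [-1.2, -0.7], [-0.8, -1.1]]
    y_train = [1, 1, 1, -1, -1, -1]
    alpha = train_kernel_pegasos(X_train, y_train)
    X_test, y_test = [[1.1, 0.9], [-0.9, -1.0]], [1, -1]
    y_pred = [predict(alpha, X_train, y_train, x) for x in X_test]
    print(evaluate(y_test, y_pred))  # (1.0, 1.0, 1.0)
```

The sigmoid kernel is not positive semidefinite for every choice of gamma and coef0, so in practice these parameters are tuned. An established solver such as LIBSVM (`-t 3` selects the sigmoid kernel) or scikit-learn's `SVC(kernel="sigmoid", gamma=..., coef0=...)` would normally replace the hand-written trainer.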
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hassan, U.K., Nawi, N.M., Kasim, S., Ramli, A.A., Fudzee, M.F.M., Salamat, M.A. (2014). Classify a Protein Domain Using SVM Sigmoid Kernel. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_14
DOI: https://doi.org/10.1007/978-3-319-07692-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07691-1
Online ISBN: 978-3-319-07692-8
eBook Packages: Engineering, Engineering (R0)