Classify a Protein Domain Using SVM Sigmoid Kernel

Hassan, Ummi Kalsum; Nawi, Nazri Mohd.; Kasim, Shahreen; Ramli, Azizul Azhar; Fudzee, Mohd Farhan Md; Salamat, Mohamad Aizi

doi:10.1007/978-3-319-07692-8_14

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 287)

Abstract

Protein domains are discrete portions of a protein sequence that can fold independently and carry their own function. Protein domain classification is important for several reasons, including determining protein function so that new proteins with new functions can be manufactured. However, several issues in protein domain classification still need to be addressed, including strengthening the domain signal and accurately classifying domains into their categories. To address these issues, this paper proposes a new approach that classifies protein domains from protein subsequences and protein structure information using a support vector machine (SVM) with a sigmoid kernel. The proposed method consists of three phases: data generation, sequence information creation, and classification. The data generation phase selects potential proteins and generates clear domain information. The sequence information creation phase applies several calculations to generate protein structure information in order to optimize the domain signal. The classification phase applies the SVM sigmoid kernel and evaluates its performance. Performance is measured in terms of sensitivity and specificity on single-domain and multiple-domain proteins from the SCOP 1.75 dataset. The SVM sigmoid kernel achieved higher accuracy than a single neural network and a double neural network for both single-domain and multiple-domain prediction. The approach was developed to solve the problem of proteins being coincidentally grouped into both categories, single-domain and multiple-domain. The method showed improved classification in terms of sensitivity, specificity and accuracy.
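The abstract names the sigmoid kernel and the three evaluation measures but does not give their formulas. As a rough illustration only, the sketch below uses the standard textbook definitions: the sigmoid kernel K(x, y) = tanh(gamma * <x, y> + coef0), the SVM decision value f(x) = sum_i alpha_i y_i K(s_i, x) + b, and sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), accuracy = (TP + TN) / N. It assumes an already trained model: the support vectors, dual coefficients, bias and the gamma/coef0 values are hypothetical, not taken from the paper, and the paper's feature construction from SCOP 1.75 is not reproduced. The label coding (1 = multiple-domain, 0 = single-domain) is also an assumption for the example.

```python
import math
from typing import Sequence


def sigmoid_kernel(x: Sequence[float], y: Sequence[float],
                   gamma: float = 0.1, coef0: float = 0.0) -> float:
    """Sigmoid (hyperbolic tangent) kernel: tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


def decision_function(x, support_vectors, dual_coefs, bias,
                      gamma=0.1, coef0=0.0) -> float:
    """SVM decision value f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b.

    dual_coefs holds the products alpha_i * y_i of an already trained model
    (hypothetical values here; training is not shown).
    """
    return sum(c * sigmoid_kernel(s, x, gamma, coef0)
               for c, s in zip(dual_coefs, support_vectors)) + bias


def predict(x, support_vectors, dual_coefs, bias, gamma=0.1, coef0=0.0) -> int:
    """Assumed coding: 1 = multiple-domain, 0 = single-domain."""
    f = decision_function(x, support_vectors, dual_coefs, bias, gamma, coef0)
    return 1 if f >= 0 else 0


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
    """Return (sensitivity, specificity, accuracy) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Toy two-dimensional "model" with hypothetical support vectors.
    sv = [[1.0, 0.0], [0.0, 1.0]]
    coefs = [1.0, -1.0]
    print(predict([2.0, 0.0], sv, coefs, 0.0))  # 1
    print(predict([0.0, 2.0], sv, coefs, 0.0))  # 0

    # Hypothetical true and predicted labels for eight proteins.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
    print(evaluate(y_true, y_pred))  # (0.75, 1.0, 0.875)
```

In practice the model would be trained with an SVM library; for example, scikit-learn's `SVC(kernel="sigmoid", gamma=..., coef0=...)` uses the same kernel form. The sigmoid kernel is not positive semidefinite for every choice of gamma and coef0, so these two parameters usually need tuning.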

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, Kolej Poly-Tech MARA, Batu Pahat, Johor, Malaysia
Ummi Kalsum Hassan
Software Multimedia Centre, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia
Nazri Mohd. Nawi, Shahreen Kasim, Azizul Azhar Ramli, Mohd Farhan Md Fudzee & Mohamad Aizi Salamat

Editor information

Editors and Affiliations

Department of Information System, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Rozaida Ghazali
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Mustafa Mat Deris

About this paper

Cite this paper

Hassan, U.K., Nawi, N.M., Kasim, S., Ramli, A.A., Fudzee, M.F.M., Salamat, M.A. (2014). Classify a Protein Domain Using SVM Sigmoid Kernel. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_14

DOI: https://doi.org/10.1007/978-3-319-07692-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07691-1
Online ISBN: 978-3-319-07692-8
eBook Packages: Engineering, Engineering (R0)