Classify a Protein Domain Using SVM Sigmoid Kernel

  • Conference paper
Recent Advances on Soft Computing and Data Mining

Abstract

Protein domains are discrete portions of a protein sequence that can fold independently and carry their own function. Protein domain classification is important for several reasons, including determining protein function in order to manufacture new proteins with new functions. However, several issues in protein domain classification still need to be addressed, including strengthening the domain signal and accurately assigning domains to their category. To address these issues, this paper proposes a new approach that classifies protein domains from protein subsequences and protein structure information using an SVM with a sigmoid kernel. The proposed method consists of three phases: data generation, creation of sequence information, and classification. The data generation phase selects potential proteins and generates clear domain information. The sequence information phase uses several calculations to generate protein structure information in order to optimize the domain signal. The classification phase involves the SVM sigmoid kernel and performance evaluation. The performance of the proposed method is evaluated in terms of sensitivity and specificity on single-domain and multiple-domain proteins using the SCOP 1.75 dataset. The SVM sigmoid kernel showed higher accuracy than a single neural network and a double neural network for single- and multiple-domain prediction. The proposed approach was developed to solve the problem of proteins being coincidentally grouped into both the single-domain and multiple-domain categories. The method showed improved classification in terms of sensitivity, specificity, and accuracy.
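The two ingredients named in the classification phase, a sigmoid kernel and an evaluation by sensitivity, specificity, and accuracy, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the kernel uses the standard form K(x, y) = tanh(gamma * <x, y> + coef0), the values of `gamma` and `coef0` are placeholders (the abstract gives no kernel parameters), and the label vectors are made up, with 1 assumed to mean single-domain and 0 multiple-domain.

```python
import math


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """Standard sigmoid kernel: tanh(gamma * <x, y> + coef0).

    gamma and coef0 are illustrative defaults, not values from the paper.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, and accuracy as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical labels: 1 = single-domain, 0 = multiple-domain (assumption).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(sigmoid_kernel([1.0, 2.0], [1.0, 2.0]))
print(evaluate(y_true, y_pred))
```

On these made-up labels there are 3 true positives, 1 false negative, 3 true negatives, and 1 false positive, so all three metrics come out to 0.75. In practice the kernel would be supplied to an SVM solver rather than evaluated by hand; the function here only shows what the kernel computes for a pair of feature vectors.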

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Hassan, U.K., Nawi, N.M., Kasim, S., Ramli, A.A., Fudzee, M.F.M., Salamat, M.A. (2014). Classify a Protein Domain Using SVM Sigmoid Kernel. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_14

  • DOI: https://doi.org/10.1007/978-3-319-07692-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07691-1

  • Online ISBN: 978-3-319-07692-8

  • eBook Packages: Engineering, Engineering (R0)
