Pairwise Protein Substring Alignment with Latent Semantic Analysis and Support Vector Machines to Detect Remote Protein Homology

  • Conference paper
Ubiquitous Computing and Multimedia Applications (UCMA 2011)

Abstract

Remote protein homology detection is widely used in the analysis of protein structure and function. In this study, the quality of the protein feature vectors is treated as the main factor in detecting remote protein homology, because good feature vectors help a discriminative classifier separate proteins precisely into homologous and non-homologous members. To be of good quality, the feature vectors must carry high protein structural similarity information and be represented in a low dimension that is free of contaminated data. This study investigates the contaminated data that originates from the protein dataset. Such data may prevent a remote protein homology detection framework from producing the best representation of high protein structural similarity information for detecting the homology of proteins. To reduce the contaminated data and extract high protein structural similarity information, research was carried out on the extraction of protein feature vectors and on protein similarity. Extracting good-quality protein feature vectors is expected to improve the results of remote protein homology detection: feature vectors that contain useful protein similarity information and are represented in low dimension are used by the discriminative classifier to identify protein families precisely. On this basis, a method is introduced that combines Protein Substring Scoring (PSS) and Pairwise Protein Substring Alignment (PPSA) from the sequence comparison model, chi-square and Singular Value Decomposition (SVD) from the generative model, and a Support Vector Machine (SVM) as the discriminative classifier model.
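
The abstract names chi-square as one stage of the pipeline but does not give details. As a rough illustration only, the sketch below scores fixed-length protein substrings with the standard chi-square statistic commonly used for feature selection in text categorization. The function names (`substrings`, `chi_square_scores`), the substring length `k`, and the toy sequences are all assumptions made for this example; this is not the authors' PSS or PPSA procedure.

```python
from collections import Counter

def substrings(seq, k=3):
    """Return the set of length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def chi_square_scores(sequences, labels, k=3):
    """Score each substring by its chi-square association with the
    homologue class (labels are True for homologue members).

    For a substring t:
      a = homologues containing t,     b = non-homologues containing t
      c = homologues without t,        d = non-homologues without t
      chi2 = N * (a*d - c*b)^2 / ((a+c) * (b+d) * (a+b) * (c+d))
    """
    n = len(sequences)
    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for seq, y in zip(sequences, labels):
        # Count each substring at most once per sequence (presence, not frequency).
        (pos_counts if y else neg_counts).update(substrings(seq, k))
    scores = {}
    for term in set(pos_counts) | set(neg_counts):
        a, b = pos_counts[term], neg_counts[term]
        c, d = n_pos - a, n_neg - b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the substring carries no class information.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores

# Toy data (hypothetical): two "homologue" and two "non-homologue" sequences.
toy_sequences = ["AAAC", "AAAG", "CCCT", "CCCG"]
toy_labels = [True, True, False, False]
scores = chi_square_scores(toy_sequences, toy_labels, k=3)
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

Substrings with low scores could then be dropped before a dimensionality-reduction step such as SVD, which is one plausible reading of how a chi-square stage reduces uninformative features ahead of the classifier.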

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ismail, S., Othman, R.M., Kasim, S. (2011). Pairwise Protein Substring Alignment with Latent Semantic Analysis and Support Vector Machines to Detect Remote Protein Homology. In: Kim, Th., Adeli, H., Robles, R.J., Balitanas, M. (eds) Ubiquitous Computing and Multimedia Applications. UCMA 2011. Communications in Computer and Information Science, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20998-7_60

  • DOI: https://doi.org/10.1007/978-3-642-20998-7_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20997-0

  • Online ISBN: 978-3-642-20998-7

  • eBook Packages: Computer Science, Computer Science (R0)
