Abstract
Remote protein homology detection is widely used in the analysis of protein structure and function. In this study, the quality of the protein feature vectors is treated as the key factor in detecting remote protein homology, because good feature vectors help a discriminative classifier separate proteins precisely into homologue and non-homologue members. To be of good quality, a feature vector must carry rich protein structural similarity information and be represented in a low dimension, free of contaminated data. This study investigates contaminated data originating from the protein dataset. Such data can prevent a remote protein homology detection framework from producing the best representation of structural similarity information, and therefore from detecting the homology of proteins. To reduce this contamination and extract high-quality structural similarity information, research has been carried out on the extraction of protein feature vectors and on protein similarity. Good-quality feature vectors, which contain useful similarity information in a low-dimensional representation, are expected to improve remote homology detection by allowing a discriminative classifier to identify protein families precisely. On this basis, a method is introduced that combines Protein Substring Scoring (PSS) and Pairwise Protein Substring Alignment (PPSA) from the sequence comparison model, chi-square and Singular Value Decomposition (SVD) from the generative model, and a Support Vector Machine (SVM) as the discriminative classifier.
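The feature-extraction and feature-selection stages described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, since the details of PSS and PPSA are not given here. As a hypothetical stand-in for the substring alignment, a plain Smith-Waterman local alignment score with toy parameters (match +2, mismatch -1, linear gap -2) is used, and each protein becomes a vector of scores against a reference set. Chi-square is applied, as an assumption, to the presence or absence of length-k substrings (k-mers) against the homologue class. The names `smith_waterman`, `pairwise_features`, `kmers` and `chi_square` are illustrative only. The SVD and SVM stages are left out to keep the sketch free of third-party libraries.

```python
"""Sketch: pairwise alignment features and chi-square scoring (assumed stand-ins)."""


def smith_waterman(s: str, t: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of s and t (linear gap penalty, toy scores).

    Hypothetical stand-in for PPSA; real work would use a substitution
    matrix such as BLOSUM62 and affine gap penalties.
    """
    prev = [0] * (len(t) + 1)  # DP row i-1
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)  # DP row i; column 0 stays 0
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Local alignment: clip at 0 so an alignment can start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def pairwise_features(query: str, reference: list[str]) -> list[int]:
    """Feature vector: alignment score of the query against each reference protein."""
    return [smith_waterman(query, r) for r in reference]


def kmers(seq: str, k: int = 3) -> set[str]:
    """Distinct length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def chi_square(term: str, seqs: list[str], labels: list[int], k: int = 3) -> float:
    """Chi-square statistic between presence of `term` and class label 1 (homologue)."""
    a = b = c = d = 0  # 2x2 contingency counts
    for seq, y in zip(seqs, labels):
        present = term in kmers(seq, k)
        if present and y == 1:
            a += 1  # term present, homologue
        elif present:
            b += 1  # term present, non-homologue
        elif y == 1:
            c += 1  # term absent, homologue
        else:
            d += 1  # term absent, non-homologue
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # Undefined when a row or column of the table is empty; report 0 in that case.
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


if __name__ == "__main__":
    reference = ["MKVLAAG", "GGPPGGP"]
    print(pairwise_features("MKVLA", reference))
    seqs = ["MKVLA", "MKVLG", "GGGGG", "PPPPP"]
    labels = [1, 1, 0, 0]
    print(chi_square("MKV", seqs, labels))
```

A high chi-square value marks a substring whose presence is strongly associated with the homologue class, which makes it a candidate to keep before the feature matrix is reduced with SVD and passed to an SVM.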
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ismail, S., Othman, R.M., Kasim, S. (2011). Pairwise Protein Substring Alignment with Latent Semantic Analysis and Support Vector Machines to Detect Remote Protein Homology. In: Kim, Th., Adeli, H., Robles, R.J., Balitanas, M. (eds) Ubiquitous Computing and Multimedia Applications. UCMA 2011. Communications in Computer and Information Science, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20998-7_60
DOI: https://doi.org/10.1007/978-3-642-20998-7_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20997-0
Online ISBN: 978-3-642-20998-7
eBook Packages: Computer Science, Computer Science (R0)