Relevant and Non-Redundant Amino Acid Sequence Selection for Protein Functional Site Identification

Chandra Das, Pradipta Maji
Copyright: © 2010 | Volume: 2 | Issue: 2 | Pages: 25
ISSN: 1942-9045 | EISSN: 1942-9037 | EISBN13: 9781609604196 | DOI: 10.4018/jssci.2010040102
MLA

Das, Chandra, and Pradipta Maji. "Relevant and Non-Redundant Amino Acid Sequence Selection for Protein Functional Site Identification." IJSSCI vol.2, no.2 2010: pp.19-43. http://doi.org/10.4018/jssci.2010040102

APA

Das, C. & Maji, P. (2010). Relevant and Non-Redundant Amino Acid Sequence Selection for Protein Functional Site Identification. International Journal of Software Science and Computational Intelligence (IJSSCI), 2(2), 19-43. http://doi.org/10.4018/jssci.2010040102

Chicago

Das, Chandra, and Pradipta Maji. "Relevant and Non-Redundant Amino Acid Sequence Selection for Protein Functional Site Identification," International Journal of Software Science and Computational Intelligence (IJSSCI) 2, no.2: 19-43. http://doi.org/10.4018/jssci.2010040102

Abstract

Pattern recognition algorithms cannot take amino acids directly as inputs when predicting functional sites in proteins, because amino acids are non-numerical variables; they must be encoded first. The bio-basis function performs this encoding by mapping a non-numerical sequence space to a numerical feature space. An important issue for the bio-basis function is how to select a minimum set of bio-basis strings with maximum information. This paper describes an efficient method for selecting bio-basis strings that integrates the Fisher ratio with the concept of “degree of resemblance”. The integration enables the method to select a minimum set of the most informative bio-basis strings. The “degree of resemblance” enables efficient selection of a set of distinct bio-basis strings and thereby reduces redundant features in the numerical feature space. Quantitative indices are proposed for evaluating the quality of the selected bio-basis strings. The effectiveness of the proposed method is demonstrated on different data sets and compared with that of existing methods.
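The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no formulas, so everything below is an assumption. The bio-basis function is taken in a commonly cited form, exp(gamma * (h(x, b) - h(b, b)) / h(b, b)); the Fisher ratio is one common two-class form; the "degree of resemblance" is assumed to be the normalized similarity h(a, b) / h(b, b); and `toy_score`, `max_resemblance`, and the greedy loop are hypothetical stand-ins. A real setup would take h from a mutation matrix rather than the toy scoring used here.

```python
import math
from statistics import mean, pvariance


def toy_score(a, b):
    """Hypothetical per-residue score; a real setup would use a mutation matrix."""
    return 2.0 if a == b else -1.0


def similarity(x, b):
    """h(x, b): sum of per-position scores for two equal-length strings."""
    return sum(toy_score(p, q) for p, q in zip(x, b))


def bio_basis(x, b, gamma=1.0):
    """Assumed bio-basis form: exp(gamma * (h(x, b) - h(b, b)) / h(b, b))."""
    hbb = similarity(b, b)
    return math.exp(gamma * (similarity(x, b) - hbb) / hbb)


def resemblance(a, b):
    """Assumed degree of resemblance: h(a, b) normalized by h(b, b)."""
    return similarity(a, b) / similarity(b, b)


def fisher_ratio(pos, neg):
    """Two-class Fisher ratio of one feature: (m1 - m2)^2 / (v1 + v2)."""
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread > 0:
        return gap / spread
    # Degenerate case: both classes have zero variance.
    return float("inf") if gap > 0 else 0.0


def select_bio_basis(candidates, positives, negatives, k, max_resemblance=0.5):
    """Greedy pick: highest Fisher ratio first, skipping near-duplicates."""
    ranked = sorted(
        candidates,
        key=lambda b: fisher_ratio(
            [bio_basis(x, b) for x in positives],
            [bio_basis(x, b) for x in negatives],
        ),
        reverse=True,
    )
    chosen = []
    for b in ranked:
        # Keep b only if it is sufficiently distinct from every chosen string.
        if all(resemblance(b, c) < max_resemblance for c in chosen):
            chosen.append(b)
        if len(chosen) == k:
            break
    return chosen


if __name__ == "__main__":
    picked = select_bio_basis(
        candidates=["AAAA", "AAAC", "CCCC"],
        positives=["AAAA", "AAAC"],
        negatives=["CCCC", "CCCA"],
        k=2,
    )
    print(picked)
```

In this toy run, "AAAA" and "AAAC" resemble each other too closely to be selected together, which mirrors the abstract's point that the resemblance check removes redundant features while the Fisher ratio ranks strings by how well they separate the two classes.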