Improving Prediction Accuracy via Subspace Modeling in a Statistical Geometry Based Computational Protein Mutagenesis

Majid Masso
Copyright: © 2010 | Volume: 1 | Issue: 4 | Pages: 15
ISSN: 1947-9115 | EISSN: 1947-9123 | EISBN13: 9781613502921 | DOI: 10.4018/jkdb.2010100103

MLA

Masso, Majid. "Improving Prediction Accuracy via Subspace Modeling in a Statistical Geometry Based Computational Protein Mutagenesis." IJKDB vol.1, no.4 2010: pp.54-68. http://doi.org/10.4018/jkdb.2010100103

APA

Masso, M. (2010). Improving Prediction Accuracy via Subspace Modeling in a Statistical Geometry Based Computational Protein Mutagenesis. International Journal of Knowledge Discovery in Bioinformatics (IJKDB), 1(4), 54-68. http://doi.org/10.4018/jkdb.2010100103

Chicago

Masso, Majid. "Improving Prediction Accuracy via Subspace Modeling in a Statistical Geometry Based Computational Protein Mutagenesis," International Journal of Knowledge Discovery in Bioinformatics (IJKDB) 1, no.4: 54-68. http://doi.org/10.4018/jkdb.2010100103

Abstract

A computational mutagenesis method is detailed in which each single residue substitution in a protein chain of primary sequence length N is represented as a sparse N-dimensional feature vector, whose M << N nonzero components locally quantify environmental perturbations occurring at the mutated position and its neighbors in the protein structure. The methodology uses both the Delaunay tessellation algorithm for representing protein structures and a four-body, knowledge-based statistical contact potential. Feature vectors for the subset of mutants arising from all possible residue substitutions at a particular position occupy the same M-dimensional subspace, where both the value of M and the identities of the M nonzero components depend on the position. The approach is used to characterize a large experimental dataset of single residue substitutions in bacteriophage T4 lysozyme, each categorized as either unaffected or affected based on the measured level of mutant activity relative to that of the native protein. Performance of a single classifier trained with the collective set of mutants in N-space is compared to that of an ensemble of position-specific classifiers, each trained on a disjoint mutant subset residing in a significantly smaller subspace. Results suggest that significant improvements in prediction accuracy can be achieved through subspace modeling.
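
The relationship between the sparse N-dimensional representation and the position-specific M-dimensional subspaces can be illustrated with a minimal Python sketch. It is not the author's implementation: the chain length, the neighbor indices, the perturbation scores, and every function name below are hypothetical placeholders, and the tessellation and four-body potential that would produce real scores are not modeled. The sketch only shows that mutants at the same position share the same nonzero components, so they can be grouped and projected into a common smaller subspace, one group per position-specific classifier.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy data, NOT from the paper:
# (mutated position, substituted residue) -> perturbation scores at the
# mutated position and its structural neighbors (0-based indices).
N = 10  # toy chain length, chosen only for illustration
MUTANTS: Dict[Tuple[int, str], Dict[int, float]] = {
    (4, "A"): {2: -0.3, 4: 1.2, 7: 0.5},
    (4, "G"): {2: 0.1, 4: -0.8, 7: -0.2},
    (6, "L"): {5: 0.4, 6: 0.9},
}


def full_vector(n: int, perturbations: Dict[int, float]) -> List[float]:
    """Build the sparse N-dimensional feature vector for one mutant."""
    vec = [0.0] * n
    for index, score in perturbations.items():
        vec[index] = score
    return vec


def support(perturbations: Dict[int, float]) -> List[int]:
    """Return the sorted indices of the M components tied to this position."""
    return sorted(perturbations)


def project(vec: List[float], indices: List[int]) -> List[float]:
    """Restrict an N-dimensional vector to the M-dimensional subspace."""
    return [vec[i] for i in indices]


def group_by_position(
    mutants: Dict[Tuple[int, str], Dict[int, float]], n: int
) -> Dict[int, List[Tuple[str, List[float]]]]:
    """Collect the subspace vectors of all mutants at each position."""
    groups: Dict[int, List[Tuple[str, List[float]]]] = defaultdict(list)
    for (position, residue), perturbations in mutants.items():
        vec = full_vector(n, perturbations)
        groups[position].append((residue, project(vec, support(perturbations))))
    return dict(groups)


if __name__ == "__main__":
    for position, members in group_by_position(MUTANTS, N).items():
        print(position, members)
```

In this toy setup, the two mutants at position 4 both reduce to three-component vectors over indices 2, 4, and 7, while the mutant at position 6 lives in a different two-dimensional subspace. A single classifier would see all three as 10-dimensional vectors, whereas an ensemble would train one model per group.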
