Authors:
Jing Hu and Yihang Du
Affiliation:
Department of Computer Science, Franklin & Marshall College, Lancaster, PA, U.S.A.
Keyword(s):
K-Nearest Neighbors Method, Bit-Score Weighted Euclidean Distance, Feature Selection, Moonlighting Proteins.
Abstract:
High-throughput proteomics projects have led to a rapid accumulation of protein sequences in public databases. For the majority of these proteins, little functional information is known so far. Moonlighting proteins (MPs) are a class of proteins that perform at least two physiologically relevant, distinct biochemical or biophysical functions. These proteins play important functional roles in enzymatic catalysis, signal transduction, cellular regulation, and biological pathways. However, identifying MPs experimentally has proven difficult, time-consuming, and expensive. Therefore, computational approaches that can predict MPs are needed. In this study, we present MPKNN, a K-nearest neighbors method that can identify MPs with high efficiency and accuracy. The method is based on the bit-score weighted Euclidean distance, which is calculated from selected features derived from protein sequences. On a benchmark dataset, our method achieved 83% overall accuracy, 0.64 MCC, 0.87 F-measure, and 0.86 AUC.
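
For readers less familiar with the reported measures, the following is a minimal sketch of how overall accuracy, MCC (Matthews correlation coefficient), and F-measure are conventionally computed from the confusion matrix of a binary classifier (here, MP versus non-MP). The function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, F-measure, and MCC from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives. The name and interface
    are hypothetical, chosen only for this illustration.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Precision and recall, guarding against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the paper.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Reporting MCC alongside accuracy and F-measure is a common choice because MCC penalizes a classifier that does well on only one of the two classes.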