Research article
Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins

https://doi.org/10.1016/j.compbiolchem.2013.05.001
Under a Creative Commons license
open access

Highlights

  • We defined the normalized amino acid composition and found that halophilic proteins have fewer small residues.

  • We introduced a novel kernel, the Pearson VII function kernel, and developed a support vector machine classifier to discriminate between halophilic and non-halophilic proteins.

  • The prediction accuracy was very encouraging and reached 91.7%.

  • The worse performance on small proteins may be related to the absence of some important residues.
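As a rough illustration of the composition features mentioned in the highlights, the sketch below computes the plain fractional amino acid composition of a sequence. It is an assumption-laden example: the paper's own normalization step is not defined in this section and is not reproduced here, and the function name `amino_acid_composition` is hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Assumption: this is the plain fractional composition, not the
    "normalized" composition defined in the paper.
    """
    # Count only standard residues; ambiguous codes such as X are ignored.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequence for illustration only (not taken from the paper's dataset).
composition = amino_acid_composition("DDEK")
```

Each sequence then maps to a fixed-length vector of 20 fractions, which is the kind of input a composition-based classifier would consume.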

Abstract

Understanding and identifying proteins adapted to hypersaline environments is a challenging task that would help in designing stable proteins. Here, we systematically analyzed the normalized amino acid compositions of 2121 halophilic and 2400 non-halophilic proteins. The results showed that halophilic proteins contained more Asp at the expense of Lys, Ile, Cys and Met, fewer small and hydrophobic residues, and a large excess of acidic over basic amino acids. We then introduced a support vector machine method to discriminate between halophilic and non-halophilic proteins, using a novel kernel based on the Pearson VII universal function. In the three validation methods, it achieved overall accuracies of 97.7%, 91.7% and 86.9%, respectively, and outperformed other machine learning algorithms. We also examined the influence of protein size on prediction accuracy and found that the worse performance on small proteins might arise because some significant residues (Cys and Lys) were missing from those proteins.
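The abstract does not state the kernel formula. A minimal sketch, assuming the commonly cited form of the Pearson VII universal kernel, K(x, y) = 1 / [1 + (2·‖x − y‖·sqrt(2^(1/ω) − 1) / σ)²]^ω, is given below. The parameter names `sigma` and `omega` and their default values are illustrative assumptions, not the settings used in the paper.

```python
import math


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel value for two equal-length vectors.

    Assumed form: 1 / (1 + (2 * d * sqrt(2**(1/omega) - 1) / sigma)**2)**omega,
    where d is the Euclidean distance between x and y.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    # With this factor the kernel falls to 0.5 at distance sigma / 2,
    # so sigma acts as the full width at half maximum of the peak.
    factor = 2.0 * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + factor ** 2 * sq_dist) ** omega


def gram_matrix(rows, sigma=1.0, omega=1.0):
    """Pairwise kernel matrix for a list of feature vectors."""
    return [[puk_kernel(a, b, sigma, omega) for b in rows] for a in rows]


# Toy feature vectors for illustration only (not data from the paper).
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
gram = gram_matrix(features, sigma=1.0, omega=2.0)
```

A Gram matrix of this kind could be supplied to any support vector machine implementation that accepts a precomputed kernel. In this form `omega` controls the tail shape of the peak and `sigma` its width, which is what lets the function mimic kernels ranging from Lorentzian-like to Gaussian-like.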

Abbreviations

Weka: Waikato environment for knowledge analysis
RBF neural network: radial basis function neural network
SVMs: support vector machines
PUFK: Pearson VII universal function kernel
SE: sensitivity
SP: specificity
ACC: accuracy
MCC: Matthews correlation coefficient
ROC: receiver operating characteristic
TP: true positives
FN: false negatives
TN: true negatives
FP: false positives
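The performance measures abbreviated above follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not results from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute SE, SP, ACC and MCC from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is empty.
        return numerator / denominator if denominator else 0.0

    se = ratio(tp, tp + fn)                    # sensitivity
    sp = ratio(tn, tn + fp)                    # specificity
    acc = ratio(tp + tn, tp + fn + tn + fp)    # accuracy
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```

MCC is reported alongside accuracy because it stays informative when the two classes differ in size, whereas accuracy alone can look high for a classifier that favors the larger class.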

Keywords

Halophile
Pearson VII function kernel
Support vector machine
Amino acid composition
Hypersaline adaptation
