Abstract
A huge number of protein sequences have been generated and collected; however, the functions of most of them are still unknown. Knowing a protein's subcellular location is important for elucidating its function, so it is worthwhile to develop a method that predicts the subcellular location of a protein when only its amino acid sequence is known. Although much effort has been devoted to this task, further research is needed to improve prediction accuracy. In this paper, we propose an ensemble classifier, with the K-local Hyperplane Distance Nearest Neighbor algorithm (HKNN) as its base classifier, to predict the subcellular locations of proteins in eukaryotic cells. Each base HKNN classifier is constructed from a separate feature set, and the base classifiers are combined by a majority voting scheme. Results obtained through a 5-fold cross-validation test on the same protein dataset showed an improvement in prediction accuracy over existing algorithms.
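The sketch below illustrates the two ideas named in the abstract. It is not the authors' implementation. It follows the HKNN idea of Vincent and Bengio: for each class, take the K training points nearest to the query, fit the local affine hyperplane through them, and assign the query to the class whose hyperplane is closest. A second function trains one HKNN per feature subset and takes a majority vote.

Several details are assumptions chosen for illustration:

* the values of `k` and the ridge penalty `lam`;
* Euclidean distance for picking neighbors;
* the helper names `hknn_distance`, `hknn_predict`, `ensemble_predict`;
* the `feature_sets` column indices.

The abstract does not say how the protein feature sets are computed.

```python
import numpy as np
from collections import Counter


def hknn_distance(x, neighbors, lam=1.0):
    """Distance from x to the local affine hyperplane through `neighbors` (K x d).

    The hyperplane is {c + V @ alpha}: c is the neighbors' centroid and the
    columns of V are the centered neighbors. The ridge penalty lam * ||alpha||^2
    (lam is an assumed value) keeps the distance informative even when the K
    neighbors span the whole feature space.
    """
    centroid = neighbors.mean(axis=0)
    V = (neighbors - centroid).T              # d x K spanning directions
    diff = x - centroid
    A = V.T @ V + lam * np.eye(V.shape[1])    # regularized normal equations (K x K)
    alpha = np.linalg.solve(A, V.T @ diff)    # best point on the hyperplane
    return float(np.linalg.norm(diff - V @ alpha))


def hknn_predict(X_train, y_train, x, k=3, lam=1.0):
    """Assign x to the class whose local hyperplane is closest."""
    best_label, best_dist = None, np.inf
    for label in np.unique(y_train):
        Xc = X_train[y_train == label]
        # K nearest training points of this class (Euclidean; an assumption)
        idx = np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]
        d = hknn_distance(x, Xc[idx], lam)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def ensemble_predict(X_train, y_train, x, feature_sets, k=3, lam=1.0):
    """Train one HKNN per feature subset (hypothetical column lists) and majority-vote."""
    votes = [hknn_predict(X_train[:, cols], y_train, x[cols], k, lam)
             for cols in feature_sets]
    # most_common breaks ties in favor of the label that received a vote first
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two well-separated classes described by four features.
    X_train = np.array([
        [0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.0, 0.2],
        [5.0, 5.0, 5.0, 5.0], [5.1, 4.9, 5.2, 5.0], [4.9, 5.2, 5.0, 4.8],
    ])
    y_train = np.array([0, 0, 0, 1, 1, 1])
    feature_sets = [[0, 1], [2, 3], [0, 3]]   # hypothetical feature subsets
    print(ensemble_predict(X_train, y_train, np.full(4, 0.1), feature_sets))
    print(ensemble_predict(X_train, y_train, np.full(4, 5.0), feature_sets))
```

Without the penalty, three non-collinear neighbors in two dimensions span the whole plane. The unpenalized distance would then be zero for every class, which is why some regularization of `alpha` is needed in this sketch. With more than two location classes, the vote can tie. Here such ties are broken by vote order, which is a design choice the abstract does not specify.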
Supported by National Science Foundation of China under grant No. 60603007 and Science and Technology Development Foundation of Shandong Province, China under grant No. 2006GG2201005.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Liu, H., Feng, H., Zhu, D. (2007). Prediction of Protein Subcellular Locations by Combining K-Local Hyperplane Distance Nearest Neighbor. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_32
DOI: https://doi.org/10.1007/978-3-540-73871-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)