Abstract
Feature selection is an important step in many pattern recognition problems. It reduces dimensionality, removes irrelevant data, and can increase the accuracy of a classifier. In our previous work we proposed a classifier that combines a support vector machine (SVM) classifier with a regularized discriminant analysis (RDA) classifier and applied it to the protein fold recognition problem. However, the high dimensionality of the feature vectors and the small number of samples in the training data set make the problem ill-posed for an RDA classifier, so feature selection is crucial for the accuracy of the classifier. In this paper we propose a simple and effective algorithm based on class similarity that solves this problem and helps us achieve very good accuracy on a real-world data set.
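The ill-posedness mentioned above can be illustrated with a minimal sketch. It is not the feature selection algorithm of the paper. It only shows, on synthetic data, that a sample covariance matrix estimated from fewer samples than features is singular, and that blending it with a scaled identity matrix (the identity-shrinkage term used in Friedman's regularized discriminant analysis) makes it invertible. The sample sizes, the value of `gamma`, and the function names are assumptions chosen for illustration.

```python
import numpy as np


def sample_covariance(X):
    """Unbiased sample covariance of the rows of X (n_samples x n_features)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)


def shrink_to_identity(S, gamma):
    """Return (1 - gamma) * S + gamma * (trace(S) / p) * I for a p x p matrix S."""
    p = S.shape[0]
    return (1.0 - gamma) * S + gamma * (np.trace(S) / p) * np.eye(p)


# Hypothetical sizes: far fewer samples than features.
n_samples, n_features = 10, 50
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_features))

S = sample_covariance(X)
rank_S = np.linalg.matrix_rank(S)          # at most n_samples - 1, so S is singular

S_reg = shrink_to_identity(S, gamma=0.1)   # gamma is an illustrative value
rank_reg = np.linalg.matrix_rank(S_reg)    # full rank after shrinkage
min_eig_reg = np.linalg.eigvalsh(S_reg).min()

print("rank of sample covariance:", rank_S, "of", n_features)
print("rank after shrinkage:", rank_reg)
print("smallest eigenvalue after shrinkage:", min_eig_reg)
```

Reducing the number of features, as feature selection does, attacks the same problem from the other side: with fewer dimensions, the available samples constrain the covariance estimate better.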
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chmielnicki, W., Sta̧por, K. (2011). An Effective Feature Selection Algorithm Based on the Class Similarity Used with a SVM-RDA Classifier to Protein Fold Recognition. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds) Hybrid Artificial Intelligent Systems. HAIS 2011. Lecture Notes in Computer Science, vol 6679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21222-2_25
DOI: https://doi.org/10.1007/978-3-642-21222-2_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21221-5
Online ISBN: 978-3-642-21222-2
eBook Packages: Computer Science, Computer Science (R0)