An Effective Feature Selection Algorithm Based on the Class Similarity Used with a SVM-RDA Classifier to Protein Fold Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6679)

Abstract

Feature selection is a very important procedure in many pattern recognition problems. It is effective in reducing dimensionality, removing irrelevant data, and increasing the accuracy of a classifier. In our previous work we proposed a classifier combining the support vector machine (SVM) classifier with the regularized discriminant analysis (RDA) classifier for the protein fold recognition problem. However, the high dimensionality of the feature vectors and the small number of samples in the training data set make the estimation of the class covariance matrices ill-posed for an RDA classifier, so feature selection is crucial for the accuracy of the classifier. In this paper we propose a simple and effective algorithm based on class similarity which solves this problem and helps us achieve very good accuracy on a real-world data set.
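
The ill-posedness described above can be illustrated numerically. The sketch below is not the authors' class-similarity algorithm, whose details are not given in this abstract. It assumes a hypothetical toy data set with far more features than samples per class and shows three things: the plain class covariance matrix is singular, a simplified covariance form of Friedman's regularized discriminant analysis (Friedman, 1989), with hypothetical parameter names `lam` and `gamma`, makes it positive definite, and keeping only a few features makes it estimable again. The feature ranking uses a generic Fisher-score filter purely as a stand-in; the function names `pooled_within_class_cov`, `rda_class_cov` and `fisher_scores` are illustrative only.

```python
import numpy as np


def pooled_within_class_cov(X, y):
    """Pooled within-class covariance: sum_k (n_k - 1) * S_k / (n - K)."""
    classes = np.unique(y)
    d = X.shape[1]
    S = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        S += (len(Xc) - 1) * np.cov(Xc, rowvar=False)
    return S / (len(X) - len(classes))


def rda_class_cov(Xk, S_pooled, lam, gamma):
    """Regularized covariance for one class (simplified covariance form of RDA).

    Assumption: a plain convex combination is used instead of Friedman's
    sample-size-weighted scatter matrices.
    lam shrinks the class covariance toward the pooled one (QDA -> LDA);
    gamma shrinks the result toward a scaled identity matrix.
    """
    d = Xk.shape[1]
    sigma = (1.0 - lam) * np.cov(Xk, rowvar=False) + lam * S_pooled
    return (1.0 - gamma) * sigma + gamma * (np.trace(sigma) / d) * np.eye(d)


def fisher_scores(X, y):
    """Per-feature between-class / within-class variance ratio (generic filter)."""
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)


# Hypothetical toy data: d features, only n_per_class samples per class.
rng = np.random.default_rng(0)
d, n_per_class = 100, 10
X = rng.normal(size=(2 * n_per_class, d))
y = np.repeat([0, 1], n_per_class)
X[y == 1, :5] += 2.0  # only the first 5 features carry class information

# 1) The plain class covariance is singular: rank <= n_per_class - 1 < d.
rank_raw = np.linalg.matrix_rank(np.cov(X[y == 0], rowvar=False))

# 2) RDA-style regularization makes it positive definite (invertible).
S_pooled = pooled_within_class_cov(X, y)
sigma_reg = rda_class_cov(X[y == 0], S_pooled, lam=0.5, gamma=0.1)
min_eig = np.linalg.eigvalsh(sigma_reg).min()

# 3) Alternatively, keep only the k highest-scoring features.
k = 5
selected = np.argsort(fisher_scores(X, y))[::-1][:k]
rank_reduced = np.linalg.matrix_rank(np.cov(X[y == 0][:, selected], rowvar=False))

print(rank_raw, min_eig > 0, rank_reduced)
```

Note that `lam` alone does not help here, because the pooled covariance of 20 samples in 100 dimensions is itself singular; any `gamma > 0` does, since it adds a positive multiple of the identity to a positive semidefinite matrix. Reducing the number of features attacks the same problem from the other side, which is the role feature selection plays for the RDA part of the combined classifier.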

References

  1. Baldi, P., Brunak, S., Chauvin, Y., Andersen, C., Nielsen, H.: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16, 412–424 (2000)

  2. Bologna, G., Appel, R.D.: A comparison study on protein fold recognition. In: Proceedings of the 9th ICONIP, Singapore, vol. 5, pp. 2492–2496 (2002)

  3. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. Software (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm

  4. Chmielnicki, W., Stąpor, K.: Protein Fold Recognition with Combined SVM-RDA Classifier. In: Graña Romay, M., Corchado, E., Garcia Sebastian, M.T. (eds.) HAIS 2010. LNCS, vol. 6076, pp. 162–169. Springer, Heidelberg (2010)

  5. Devijver, P.A., Kittler, J.: Pattern Recognition: A Statistical Approach. Prentice-Hall, Englewood Cliffs (1982)

  6. Ding, C.H., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17, 349–358 (2001)

  7. Dubchak, I., Muchnik, I., Kim, S.H.: Protein folding class predictor for SCOP: approach based on global descriptors. In: Proceedings ISMB (1997)

  8. Friedman, J.H.: Regularized Discriminant Analysis. Journal of the American Statistical Association 84(405), 165–175 (1989)

  9. Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press, New York (1990)

  10. Guyon, I., Elisseeff, A.: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3, 1157–1182 (2003)

  11. Haindl, M., Somol, P., Ververidis, D., Kotropoulos, C.: Feature Selection Based on Mutual Correlation. In: Proceedings of Progress in Pattern Recognition, Image Analysis and Application, vol. 4225, pp. 569–577 (2006)

  12. Hobohm, U., Sander, C.: Enlarged representative set of Proteins. Protein Sci. 3, 522–524 (1994)

  13. Hobohm, U., Scharf, M., Schneider, R., Sander, C.: Selection of a representative set of structures from the Brookhaven Protein Bank. Protein Sci. 1, 409–417 (1992)

  14. Lai, C., Reinders, M.J., Wessels, L.: Random subspace method for multivariate feature selection. Pattern Recognition Letters 27(10), 1067–1076 (2006)

  15. Liu, C.L., Fujisawa, H.: Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems. In: Proc. Int. Workshop on Neural Networks and Learning in Document Analysis and Recognition, Seoul, Korea (2005)

  16. Lo Conte, L., Ailey, B., Hubbard, T.J.P., Brenner, S.E., Murzin, A.G., Chothia, C.: SCOP: a Structural Classification of Proteins database. Nucleic Acids Res. 28, 257–259 (2000)

  17. Nanni, L.: A novel ensemble of classifiers for protein fold recognition. Neurocomputing 69, 2434–2437 (2006)

  18. Okun, O.: Protein fold recognition with k-local hyperplane distance nearest neighbor algorithm. In: Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics, Pisa, Italy, September 24, pp. 51–57 (2004)

  19. Pal, N.R., Chakraborty, D.: Some new features for protein fold recognition. In: Artificial Neural Networks and Neural Information Processing ICANN/ICONIP, Istanbul, Turkey, June 26–29, vol. 2714, pp. 1176–1183 (2003)

  20. Shen, H.B., Chou, K.C.: Ensemble classifier for protein fold pattern recognition. Bioinformatics 22, 1717–1722 (2006)

  21. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chmielnicki, W., Stąpor, K. (2011). An Effective Feature Selection Algorithm Based on the Class Similarity Used with a SVM-RDA Classifier to Protein Fold Recognition. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds) Hybrid Artificial Intelligent Systems. HAIS 2011. Lecture Notes in Computer Science, vol 6679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21222-2_25

  • DOI: https://doi.org/10.1007/978-3-642-21222-2_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21221-5

  • Online ISBN: 978-3-642-21222-2

  • eBook Packages: Computer Science, Computer Science (R0)