Feature Ranking for Protein Classification

  • Conference paper
Computer Recognition Systems

Part of the book series: Advances in Soft Computing (AINSC, volume 30)

Abstract

In this paper, a knowledge discovery framework is applied to protein classification. Processing proceeds in three steps: feature extraction, feature ranking, and feature selection. In the first step, inspired by results from text mining, we use n-gram descriptors. In the second step, the descriptors are ranked by their chi-squared (χ²) statistics. In the final step, we select the subset of descriptors that minimizes the prediction error rate of a k-nearest neighbor classifier. Experiments show that this framework gives good results: the dimensionality reduction is effective and improves classifier performance.
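The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the binary presence/absence encoding of n-grams, the Hamming distance, the leave-one-out error estimate, and the toy sequences are all assumptions made for illustration. Only the overall scheme (n-gram extraction, chi-squared ranking, and choosing the top-ranked subset that minimizes k-nearest neighbor error) comes from the abstract.

```python
from collections import Counter


def ngrams(seq, n=2):
    """Step 1: the set of n-grams (contiguous substrings of length n) in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def chi2_score(present, labels):
    """Chi-squared statistic of a binary presence feature against class labels."""
    total = len(labels)
    score = 0.0
    for value in (True, False):
        n_value = sum(1 for p in present if p == value)
        for c in sorted(set(labels)):
            n_class = sum(1 for y in labels if y == c)
            expected = n_value * n_class / total
            if expected == 0:
                continue  # constant feature: this cell contributes nothing
            observed = sum(
                1 for p, y in zip(present, labels) if p == value and y == c
            )
            score += (observed - expected) ** 2 / expected
    return score


def rank_features(seqs, labels, n=2):
    """Step 2: rank all observed n-grams by decreasing chi-squared score."""
    grams = [ngrams(s, n) for s in seqs]
    vocab = sorted(set().union(*grams))
    scores = {g: chi2_score([g in gs for gs in grams], labels) for g in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda g: (-scores[g], g)), scores


def loo_error(seqs, labels, features, n=2, k=1):
    """Leave-one-out error of k-NN (assumed: Hamming distance, majority vote)."""
    vectors = [[f in ngrams(s, n) for f in features] for s in seqs]
    errors = 0
    for i, v in enumerate(vectors):
        dists = sorted(
            (sum(a != b for a, b in zip(v, w)), j)
            for j, w in enumerate(vectors)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        predicted = max(sorted(votes), key=lambda c: votes[c])
        errors += predicted != labels[i]
    return errors / len(seqs)


def select_features(seqs, labels, n=2, k=1):
    """Step 3: keep the top-m ranked n-grams with the lowest k-NN error."""
    ranked, _ = rank_features(seqs, labels, n)
    best_m, best_err = 1, None
    for m in range(1, len(ranked) + 1):
        err = loo_error(seqs, labels, ranked[:m], n, k)
        if best_err is None or err < best_err:  # smallest m wins on ties
            best_m, best_err = m, err
    return ranked[:best_m], best_err


# Hypothetical toy data: four short sequences in two made-up classes.
toy_seqs = ["AAKAA", "AAKLA", "GGCGG", "GGCLG"]
toy_labels = ["a", "a", "b", "b"]
selected, error = select_features(toy_seqs, toy_labels, n=2, k=1)
print(selected, error)
```

Searching only over prefixes of the ranked list keeps the selection step linear in the number of descriptors, which is the practical appeal of ranking before selecting; a real experiment would use a held-out or cross-validated error estimate on actual protein families.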

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mhamdi, F., Rakotomalala, R., Elloumi, M. (2005). Feature Ranking for Protein Classification. In: Kurzyński, M., Puchała, E., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems. Advances in Soft Computing, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32390-2_72

  • DOI: https://doi.org/10.1007/3-540-32390-2_72

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25054-8

  • Online ISBN: 978-3-540-32390-7

  • eBook Packages: Engineering, Engineering (R0)
