Abstract
In this paper, a knowledge discovery framework is used for protein classification. The processing is carried out in three steps: feature extraction, feature ranking and feature selection. In the first step, inspired by results from text mining, we use n-gram descriptors; in the second step, the descriptors are ranked by their chi-square (χ²) statistic; and in the final step, we select the subset of descriptors that minimizes the prediction error rate of a k-nearest neighbor classifier. Experiments show that this framework gives good results: the dimensionality reduction is effective and improves the classifier's performance.
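The three-step pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: descriptors are taken to be binary presence/absence of overlapping n-grams, each descriptor is scored with the χ² statistic of its presence-by-class contingency table, and the selected subset is assumed to be the shortest prefix of the χ² ranking that minimizes the leave-one-out error of a k-NN classifier using squared Euclidean distance. The function names (`extract_features`, `rank_features`, `select_features`, etc.) and the toy sequences are hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """Distinct overlapping n-grams of a sequence (e.g. amino-acid letters)."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def extract_features(seqs, n=2):
    """Step 1: binary n-gram descriptors (assumption: presence/absence, not counts)."""
    grams = [ngrams(s, n) for s in seqs]
    vocab = sorted(set().union(*grams))
    X = [[1 if g in gs else 0 for g in vocab] for gs in grams]
    return vocab, X


def chi2_score(column, labels):
    """Chi-square statistic of the 2 x K table (descriptor absent/present vs. class)."""
    n = len(labels)
    class_totals = Counter(labels)
    score = 0.0
    for value in (0, 1):
        row_total = sum(1 for x in column if x == value)
        for c, col_total in class_totals.items():
            expected = row_total * col_total / n
            observed = sum(1 for x, y in zip(column, labels) if x == value and y == c)
            if expected > 0:  # skip an empty row (descriptor in all or no samples)
                score += (observed - expected) ** 2 / expected
    return score


def rank_features(X, labels):
    """Step 2: descriptor indices sorted by decreasing chi-square score."""
    scores = [chi2_score([row[j] for row in X], labels) for j in range(len(X[0]))]
    ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
    return ranking, scores


def knn_loo_error(X, labels, features, k=1):
    """Leave-one-out error rate of k-NN using only the given descriptor indices."""
    errors = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample (assumed metric);
        # ties are broken by sample index.
        dists = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in features), j)
            for j, xj in enumerate(X) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        predicted = votes.most_common(1)[0][0]
        errors += predicted != labels[i]
    return errors / len(X)


def select_features(X, labels, ranking, k=1):
    """Step 3: shortest prefix of the ranking with the lowest k-NN error."""
    best_m, best_err = 1, float("inf")
    for m in range(1, len(ranking) + 1):
        err = knn_loo_error(X, labels, ranking[:m], k)
        if err < best_err:  # strict "<" keeps the smallest subset on ties
            best_m, best_err = m, err
    return ranking[:best_m], best_err


# Hypothetical toy data: two "families" with disjoint residue patterns.
seqs = ["ACDAC", "ACDCA", "ACACD", "GHKGH", "GHGHK", "HKGHG"]
labels = ["A", "A", "A", "B", "B", "B"]

vocab, X = extract_features(seqs, n=2)
ranking, scores = rank_features(X, labels)
selected, err = select_features(X, labels, ranking, k=1)
print([vocab[j] for j in selected], err)  # ['AC'] 0.0
```

Scanning prefixes of the χ² ranking keeps the search linear in the number of descriptors rather than exponential, at the cost of one leave-one-out pass per prefix. On this toy data a single bigram already separates the two classes, so the smallest subset is kept.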
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mhamdi, F., Rakotomalala, R., Elloumi, M. (2005). Feature Ranking for Protein Classification. In: Kurzyński, M., Puchała, E., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems. Advances in Soft Computing, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32390-2_72
DOI: https://doi.org/10.1007/3-540-32390-2_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25054-8
Online ISBN: 978-3-540-32390-7
eBook Packages: Engineering, Engineering (R0)