Abstract
High-throughput data domains such as proteomics and expression mining frequently challenge data mining algorithms with unprecedented dimensionality and size, and high dimensionality degrades the performance of these algorithms. Many dimensionality reduction techniques, including feature ranking and selection, have been used to mitigate the curse of dimensionality. Protein classification and feature ranking are both classical problem domains that have been explored extensively. We propose a novel data shrinking-based method for ranking and selecting feature descriptors of the physicochemical properties of a protein, which are generally used in discriminative methods for protein classification. We apply the proposed methodology to demonstrate the discriminative power of the top-ranked features for protein structural classification. Our experimental study shows that the top-ranked feature descriptors produce competitive or superior classification results.
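The abstract does not spell out the data shrinking procedure, so the following is only a minimal sketch of the general rank-then-select workflow it refers to. As a plainly labeled stand-in for the shrinking-based score, it uses a Fisher-style ratio of between-class to within-class variance; the function names (`fisher_scores`, `rank_features`, `select_top_k`) and the toy data are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Return one between-class / within-class variance ratio per feature column.

    Assumption: this Fisher-style score stands in for the paper's
    shrinking-based ranking criterion, which the abstract does not define.
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(rows[0])):
        overall_mean = fmean(row[j] for row in rows)
        between = 0.0
        within = 0.0
        for members in by_class.values():
            column = [row[j] for row in members]
            between += len(column) * (fmean(column) - overall_mean) ** 2
            within += len(column) * pvariance(column)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separated or constant column.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(rows, labels):
    """Feature indices sorted from most to least discriminative."""
    scores = fisher_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_top_k(rows, ranking, k):
    """Keep only the k best-ranked columns of every row."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in rows]


# Hypothetical toy data: 4 samples, 3 descriptors, 2 classes.
toy_rows = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.9],
    [3.0, 4.0, 0.1],
    [3.2, 6.0, 0.8],
]
toy_labels = ["a", "a", "b", "b"]

toy_ranking = rank_features(toy_rows, toy_labels)
toy_reduced = select_top_k(toy_rows, toy_ranking, 1)
```

In this toy set only the first descriptor separates the two classes cleanly, so it is ranked first and is the single column kept. A downstream classifier would then be trained on the reduced rows instead of the full descriptor set.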
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dua, S., Saini, S. (2009). Data Shrinking Based Feature Ranking for Protein Classification. In: Prasad, S.K., Routray, S., Khurana, R., Sahni, S. (eds) Information Systems, Technology and Management. ICISTM 2009. Communications in Computer and Information Science, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00405-6_10
DOI: https://doi.org/10.1007/978-3-642-00405-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00404-9
Online ISBN: 978-3-642-00405-6
eBook Packages: Computer Science, Computer Science (R0)