Data Shrinking Based Feature Ranking for Protein Classification

Dua, Sumeet; Saini, Sheetal

doi:10.1007/978-3-642-00405-6_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 31)

Included in the following conference series: International Conference on Information Systems, Technology and Management

Abstract

High-throughput data domains such as proteomics and expression mining frequently challenge data mining algorithms with unprecedented dimensionality and size. High dimensionality degrades the performance of data mining algorithms. Many dimensionality reduction techniques, including feature ranking and selection, have been used to mitigate this curse of dimensionality. Protein classification and feature ranking are both classical problem domains that have been explored extensively. We propose a data mining based algorithm for ranking and selecting the feature descriptors of a protein's physicochemical properties, which are commonly used in discriminative protein classification methods. Specifically, we present a novel data shrinking-based method for ranking and selecting physicochemical feature descriptors. We use the proposed methodology to demonstrate the discriminative power of the top-ranked features for protein structural classification. Our experimental study shows that our top-ranked feature descriptors produce competitive and superior classification results.
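The abstract does not say how shrinking produces a feature ranking. The sketch below is one plausible reading and is not the authors' algorithm. It is loosely modelled on the general data-shrinking idea, in which each point repeatedly moves to the centroid of its neighbours so that dense groups contract toward their cores. Several details are assumptions: the fixed Euclidean radius, the synchronous updates, and the scoring rule, which ranks a feature by the fraction of its variance that survives shrinking. The names `shrink`, `feature_scores`, `rank_features`, `radius` and `iterations` are hypothetical.

```python
import math
from typing import List, Sequence

Point = List[float]


def shrink(points: Sequence[Sequence[float]], radius: float, iterations: int = 5) -> List[Point]:
    """Synchronously move each point to the centroid of its neighbours within `radius`.

    Simplified data shrinking (assumption: fixed radius, Euclidean distance,
    no grid): dense groups contract toward their centres, isolated points stay put.
    """
    current = [list(p) for p in points]
    for _ in range(iterations):
        updated = []
        for p in current:
            # p is at distance 0 from itself, so `neighbours` is never empty.
            neighbours = [q for q in current if math.dist(p, q) <= radius]
            # zip(*neighbours) yields one tuple of values per feature.
            updated.append([sum(col) / len(neighbours) for col in zip(*neighbours)])
        current = updated
    return current


def variance(values: Sequence[float]) -> float:
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def feature_scores(points, radius: float, iterations: int = 5) -> List[float]:
    """Hypothetical score: fraction of each feature's variance that survives shrinking.

    Shrinking removes within-group spread, so a feature whose variance
    survives is one along which the groups are separated.
    """
    shrunk = shrink(points, radius, iterations)
    scores = []
    for d in range(len(points[0])):
        before = variance([p[d] for p in points])
        after = variance([p[d] for p in shrunk])
        scores.append(0.0 if before == 0 else after / before)
    return scores


def rank_features(points, radius: float, iterations: int = 5) -> List[int]:
    """Feature indices ordered from highest to lowest score."""
    scores = feature_scores(points, radius, iterations)
    return sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
```

Consider a toy set with two groups that are separated along feature 0 and spread identically along feature 1. On this set, `rank_features(points, radius=3.0)` places feature 0 first. Shrinking collapses each group to its centroid, which leaves only the between-group spread. The neighbour search is quadratic in the number of points per iteration, so this is a sketch for illustration rather than an approach suited to large protein datasets.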

Author information

Authors and Affiliations

Department of Computer Science, Louisiana Tech University, Ruston, LA 71272, USA
Sumeet Dua & Sheetal Saini

Editor information

Editors and Affiliations

Dept. of Computer Science, Georgia State University, 34 Peachtree Street, P.O. Box, Atlanta, GA, USA
Sushil K. Prasad
Institute of Management Technology, Ghaziabad, India
Susmi Routray & Reema Khurana
Department of Computer and Information Science and Technology, University of Florida, USA
Sartaj Sahni

About this paper

Cite this paper

Dua, S., Saini, S. (2009). Data Shrinking Based Feature Ranking for Protein Classification. In: Prasad, S.K., Routray, S., Khurana, R., Sahni, S. (eds) Information Systems, Technology and Management. ICISTM 2009. Communications in Computer and Information Science, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00405-6_10

DOI: https://doi.org/10.1007/978-3-642-00405-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00404-9
Online ISBN: 978-3-642-00405-6
eBook Packages: Computer Science, Computer Science (R0)
