Data Shrinking Based Feature Ranking for Protein Classification

  • Conference paper
Information Systems, Technology and Management (ICISTM 2009)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 31)

Abstract

High-throughput data domains such as proteomics and expression mining frequently challenge data mining algorithms with unprecedented dimensionality and size. High dimensionality degrades the performance of these algorithms, and many dimensionality reduction techniques, including feature ranking and selection, have been used to mitigate the curse of dimensionality. Protein classification and feature ranking are both classical problem domains that have been explored extensively. We propose a data mining based algorithm for ranking and selecting feature descriptors of the physicochemical properties of a protein, which are generally used in discriminative methods for protein classification. The algorithm is a novel data shrinking-based method of ranking and feature descriptor selection. We use it to demonstrate the discriminative power of the top-ranked features for protein structural classification. Our experimental study shows that our top-ranked feature descriptors produce competitive and superior classification results.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dua, S., Saini, S. (2009). Data Shrinking Based Feature Ranking for Protein Classification. In: Prasad, S.K., Routray, S., Khurana, R., Sahni, S. (eds) Information Systems, Technology and Management. ICISTM 2009. Communications in Computer and Information Science, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00405-6_10

  • DOI: https://doi.org/10.1007/978-3-642-00405-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00404-9

  • Online ISBN: 978-3-642-00405-6

  • eBook Packages: Computer Science, Computer Science (R0)