
DRFE: Dynamic Recursive Feature Elimination for Gene Identification Based on Random Forest

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4234)

Abstract

Determining the relevant features is a combinatorial task in many fields of machine learning, such as text mining, bioinformatics, and pattern recognition. Numerous methods have been developed to extract relevant features, but no single method is clearly superior. Breiman proposed Random Forest, an ensemble classifier built from CART trees, which yields good results compared with other classifiers. Taking advantage of Random Forest and the wrapper approach first introduced by Kohavi et al., we propose an algorithm named Dynamic Recursive Feature Elimination (DRFE) to find an optimal subset of features, reducing noise in the data and increasing classifier performance. In our method, we use Random Forest as the induced classifier and define our own feature elimination function by adding extra terms to the feature score. We conducted experiments on two public datasets: colon cancer and leukemia. The experimental results on these real-world data show that the proposed method achieves a higher prediction rate than the baseline algorithm. The obtained results are comparable to, and sometimes better than, those of widely used classification methods in the feature selection literature.
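The abstract does not spell out the extra terms the authors add to the feature score, so the following is only a minimal sketch of the general wrapper-style recursive elimination loop it describes: a Random Forest is retrained on the surviving features, the lowest-ranked features are dropped, and the subset with the best cross-validated accuracy is kept. Impurity-based importances from scikit-learn stand in for the paper's own elimination function, and the function name and parameters (drop_frac, min_features) are illustrative, not taken from the paper.

# Sketch of wrapper-style recursive feature elimination with a Random Forest,
# in the spirit of DRFE. Plain impurity-based importances are used here as a
# stand-in for the authors' feature elimination function (not specified above).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def rf_recursive_elimination(X, y, drop_frac=0.2, min_features=4, seed=0):
    """Iteratively drop the lowest-ranked features; return the column
    indices of the subset with the best cross-validated accuracy."""
    remaining = np.arange(X.shape[1])
    best_idx, best_score = remaining.copy(), -np.inf

    while len(remaining) >= min_features:
        rf = RandomForestClassifier(n_estimators=500, random_state=seed)
        # Evaluate the current feature subset with 5-fold cross-validation.
        score = cross_val_score(rf, X[:, remaining], y, cv=5).mean()
        if score > best_score:
            best_score, best_idx = score, remaining.copy()

        # Rank the surviving features by Random Forest importance
        # and eliminate a fraction of the weakest ones.
        rf.fit(X[:, remaining], y)
        order = np.argsort(rf.feature_importances_)      # ascending
        n_drop = max(1, int(drop_frac * len(remaining)))
        remaining = remaining[np.sort(order[n_drop:])]   # keep the rest

    return best_idx, best_score

On gene expression data such as the colon or leukemia matrices used in the paper, X would be the samples-by-genes matrix and y the tissue labels; the returned indices identify the selected genes.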


References

  1. Kohavi, R., John, G.H.: Wrappers for Feature Subset Selection. Artificial Intelligence 97, 273–324 (1997)

  2. Blum, A.L., Langley, P.: Selection of Relevant Features and Examples in Machine Learning. Artificial Intelligence 97, 245–271 (1997)

  3. Breiman, L.: Random Forests. Machine Learning 45, 5–32 (2001)

  4. Torkkola, K., Venkatesan, S., Liu, H.: Sensor selection for maneuver classification. In: Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, pp. 636–641 (2004)

  5. Wu, Y., Zhang, A.: Feature selection for classifying high-dimensional numerical data. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 251–258 (2004)

  6. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. John Wiley, Chichester (2001)

  7. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman and Hall, New York (1984)

  8. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286, 531–537 (1999)

  9. Fröhlich, H., Chapelle, O., Schölkopf, B.: Feature Selection for Support Vector Machines by Means of Genetic Algorithms. In: Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, p. 142 (2003)

  10. Chen, X.-w.: Gene Selection for Cancer Classification Using Bootstrapped Genetic Algorithms and Support Vector Machines. In: IEEE Computer Society Bioinformatics Conference, p. 504 (2003)

  11. Zhang, H., Yu, C.-Y., Singer, B.: Cell and tumor classification using gene expression data: Construction of forests. Proceedings of the National Academy of Sciences of the United States of America 100, 4168–4172 (2003)

  12. Das, S.: Filters, wrappers and a boosting-based hybrid for feature selection. In: Proceedings of the 18th International Conference on Machine Learning (2001)

  13. Ng, A.Y.: On feature selection: learning with exponentially many irrelevant features as training examples. In: Proceedings of the 15th International Conference on Machine Learning (1998)

  14. Xing, E.P., Jordan, M.I., Karp, R.M.: Feature selection for high-dimensional genomic microarray data. In: Proceedings of the 18th International Conference on Machine Learning (2001)

  15. Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays. Proceedings of the National Academy of Sciences of the United States of America 96, 6745–6750 (1999)

  16. Nguyen, H.-N., Ohn, S.-Y., Park, J., Park, K.-S.: Combined Kernel Function Approach in SVM for Diagnosis of Cancer. In: Proceedings of the First International Conference on Natural Computation (2005)

  17. Su, T., Basu, M., Toure, A.: Multi-Domain Gating Network for Classification of Cancer Cells using Gene Expression Data. In: Proceedings of the International Joint Conference on Neural Networks, pp. 286–289 (2002)

  18. Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A Fast Scalable Classifier for Data Mining. In: Proceedings of the International Conference on Extending Database Technology, pp. 18–32 (1996)




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, HN., Ohn, SY. (2006). DRFE: Dynamic Recursive Feature Elimination for Gene Identification Based on Random Forest. In: King, I., Wang, J., Chan, LW., Wang, D. (eds) Neural Information Processing. ICONIP 2006. Lecture Notes in Computer Science, vol 4234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893295_1


  • DOI: https://doi.org/10.1007/11893295_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46484-6

  • Online ISBN: 978-3-540-46485-3

  • eBook Packages: Computer Science (R0)
