A Statistical Method for Determining Importance of Variables in an Information System

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4259)

Abstract

A new method for estimating the importance of attributes for supervised classification, based on the random forest approach, is presented. Essentially, an iterative scheme is applied, each step consisting of several runs of the random forest program. Each run is performed on a suitably modified data set: the values of every attribute found unimportant at earlier steps are randomly permuted between objects. At each step, the apparent importance of each attribute is calculated, and an attribute is declared unimportant if its importance is not uniformly better than that of the attributes found unimportant earlier. The procedure is repeated until only attributes scoring better than the randomized ones are retained. The statistical significance of the results so obtained is then verified. The method has been applied to 12 data sets of biological origin and was shown to be more reliable than the standard application of a random forest to assess attributes’ importance.
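The iterative scheme described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. First, the random forest program is replaced by a hypothetical stand-in importance measure, `class_mean_gap` (the absolute difference of per-class attribute means, for labels 0/1); any random forest importance function with the same signature could be passed instead. Second, a permuted copy of every attribute is appended as a probe, so that a randomized reference exists at the first step; this probe device is an assumption, not something the abstract states. Third, "uniformly better" is read as "better than the most important randomized attribute in every run". The names `select_attributes`, `permuted_column`, `n_runs` and `max_steps` are hypothetical, and the final statistical significance check is not covered.

```python
import random


def class_mean_gap(rows, labels):
    """Hypothetical stand-in for random-forest importance: for each column,
    the absolute difference between the column means of class 1 and class 0."""
    ones = [r for r, y in zip(rows, labels) if y == 1]
    zeros = [r for r, y in zip(rows, labels) if y == 0]
    return [
        abs(sum(r[j] for r in ones) / len(ones) - sum(r[j] for r in zeros) / len(zeros))
        for j in range(len(rows[0]))
    ]


def permuted_column(rows, j, rng):
    """Values of attribute j randomly permuted between objects (rows)."""
    col = [r[j] for r in rows]
    rng.shuffle(col)
    return col


def select_attributes(rows, labels, importance_fn, n_runs=5, max_steps=20, seed=0):
    """Hypothetical sketch: iteratively mark attributes unimportant until every
    retained attribute beats the randomized ones in every run."""
    rng = random.Random(seed)
    n_attr = len(rows[0])
    unimportant = set()
    for _ in range(max_steps):
        newly_unimportant = set()
        for _ in range(n_runs):
            # Permute, between objects, every attribute already found unimportant.
            cols = [
                permuted_column(rows, j, rng) if j in unimportant else [r[j] for r in rows]
                for j in range(n_attr)
            ]
            # Assumption: append a permuted probe copy of every attribute so a
            # randomized reference exists even before anything is marked.
            probes = [permuted_column(rows, j, rng) for j in range(n_attr)]
            data = [list(vals) for vals in zip(*(cols + probes))]
            scores = importance_fn(data, labels)
            # Reference = best score among all randomized attributes in this run.
            reference = max([scores[j] for j in unimportant] + scores[n_attr:])
            # Assumed reading of "uniformly better": an attribute that fails to
            # beat the reference in any single run is declared unimportant.
            for j in range(n_attr):
                if j not in unimportant and scores[j] <= reference:
                    newly_unimportant.add(j)
        if not newly_unimportant:
            break  # only attributes scoring better than the randomized ones remain
        unimportant |= newly_unimportant
    return sorted(set(range(n_attr)) - unimportant)


# Usage on synthetic data: attribute 0 carries the class, attributes 1 and 2 are noise.
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
rows = [[5 * y + data_rng.random(), data_rng.random(), data_rng.random()] for y in labels]
selected = select_attributes(rows, labels, class_mean_gap)
print(selected)
```

Passing the importance measure as a parameter keeps the iterative permute-and-compare loop separate from the classifier, so the stand-in can be swapped for a real random forest importance without changing the selection logic.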

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rudnicki, W.R., Kierczak, M., Koronacki, J., Komorowski, J. (2006). A Statistical Method for Determining Importance of Variables in an Information System. In: Greco, S., et al. Rough Sets and Current Trends in Computing. RSCTC 2006. Lecture Notes in Computer Science, vol 4259. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908029_58

  • DOI: https://doi.org/10.1007/11908029_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47693-1

  • Online ISBN: 978-3-540-49842-1

  • eBook Packages: Computer Science, Computer Science (R0)
