A Statistical Method for Determining Importance of Variables in an Information System

Rudnicki, Witold R.; Kierczak, Marcin; Koronacki, Jacek; Komorowski, Jan

doi:10.1007/11908029_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4259)

Included in the following conference series:

International Conference on Rough Sets and Current Trends in Computing

Abstract

A new method for estimation of attributes’ importance for supervised classification, based on the random forest approach, is presented. Essentially, an iterative scheme is applied, with each step consisting of several runs of the random forest program. Each run is performed on a suitably modified data set: values of each attribute found unimportant at earlier steps are randomly permuted between objects. At each step, apparent importance of an attribute is calculated and the attribute is declared unimportant if its importance is not uniformly better than that of the attributes earlier found unimportant. The procedure is repeated until only attributes scoring better than the randomized ones are retained. Statistical significance of the results so obtained is verified. This method has been applied to 12 data sets of biological origin. The method was shown to be more reliable than that based on standard application of a random forest to assess attributes’ importance.
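The iterative scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the function names (`permute_columns`, `select_attributes`, `mean_gap_importance`) are hypothetical; "uniformly better" is read as "scores higher than every permuted attribute in every run"; the first step, when nothing has been rejected yet, is bootstrapped by rejecting the single lowest-scoring attribute; and a toy class-mean gap stands in for the random forest importance measure so that the example needs only the standard library. The significance check mentioned in the abstract is not covered.

```python
import random


def permute_columns(rows, cols, rng):
    """Return a copy of rows in which each column in cols is shuffled between objects."""
    out = [list(r) for r in rows]
    for c in cols:
        values = [r[c] for r in out]
        rng.shuffle(values)
        for r, v in zip(out, values):
            r[c] = v
    return out


def mean_gap_importance(rows, labels):
    """Toy stand-in for random forest importance (assumption): absolute gap
    between the attribute's mean in class 1 and its mean in class 0."""
    scores = []
    for c in range(len(rows[0])):
        pos = [r[c] for r, y in zip(rows, labels) if y == 1]
        neg = [r[c] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def select_attributes(rows, labels, importance_fn, n_runs=5, seed=0):
    """Iteratively reject attributes that do not beat the permuted ones."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    rejected = set()
    while True:
        # Several runs per step, each on data with rejected attributes permuted.
        runs = [
            importance_fn(permute_columns(rows, rejected, rng), labels)
            for _ in range(n_runs)
        ]
        candidates = [a for a in range(n_attrs) if a not in rejected]
        if not candidates:
            return []
        if not rejected:
            # Assumption: bootstrap the first step with the lowest total score.
            newly = {min(candidates, key=lambda a: sum(run[a] for run in runs))}
        else:
            # Assumed reading of "uniformly better": the attribute must beat
            # the best permuted attribute in every single run.
            newly = {
                a for a in candidates
                if not all(run[a] > max(run[r] for r in rejected) for run in runs)
            }
        if not newly:
            return sorted(candidates)
        rejected |= newly
```

To try the sketch with an actual random forest, `importance_fn` could be replaced by any callable that takes the rows and labels and returns one importance score per attribute.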

Author information

Authors and Affiliations

ICM, Warsaw University, Pawinskiego 5a, Warsaw, Poland
Witold R. Rudnicki & Jan Komorowski
The Linnaeus Centre for Bioinformatics, Uppsala University, Husargatan 3, Uppsala, Sweden
Marcin Kierczak & Jan Komorowski
Institute of Computer Science, Polish Academy of Sciences, J.K. Ordona 21, Warsaw, Poland
Jacek Koronacki

Editor information

Editors and Affiliations

Faculty of Economics, University of Catania, Corso Italia, 55, 95129, Catania, Italy
Salvatore Greco
Graduate School of Engineering, Department of Electrical Engineering and Computer Sciences, University of Hyogo, 2167 Shosha, 671-2280, Himeji, Hyogo, Japan
Yutaka Hata
Department of Medical Informatics, Faculty of Medicine, Shimane University, 89-1 Enya-cho, Izumo, 693-8501, Shimane, Japan
Shoji Hirano
Department of Systems Innovation, Graduate School of Engineering Science, Osaka University, 1-3, Machikaneyama, Toyonaka, 560-8531, Osaka, Japan
Masahiro Inuiguchi
Department of Risk Engineering, School of Systems and Information Engineering, University of Tsukuba, 305-8573, Ibaraki, Japan
Sadaaki Miyamoto
Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Hung Son Nguyen
Systems Research Institute, Polish Academy of Sciences, 01-447, Warsaw, Poland
Roman Słowiński

About this paper

Cite this paper

Rudnicki, W.R., Kierczak, M., Koronacki, J., Komorowski, J. (2006). A Statistical Method for Determining Importance of Variables in an Information System. In: Greco, S., et al. Rough Sets and Current Trends in Computing. RSCTC 2006. Lecture Notes in Computer Science, vol 4259. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908029_58

DOI: https://doi.org/10.1007/11908029_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47693-1
Online ISBN: 978-3-540-49842-1
eBook Packages: Computer Science (R0)
