Abstract
In this paper we distinguish three types of missing attribute values: lost values (e.g., erased values), “do not care” conditions (attribute values that were irrelevant for classifying a case), and attribute-concept values (“do not care” conditions restricted to a specific concept). As is known, subset and concept approximations should be used for knowledge acquisition from incomplete data sets. We report results of experiments on seven well-known incomplete data sets using nine strategies: each of the three interpretations of missing attribute values combined with lower approximations, subset upper approximations, and concept upper approximations (subset lower approximations are identical to concept lower approximations, so the two lower approximations count as one). Additionally, cases with more than approximately 70% of their attribute values missing were removed from the original data sets, and all nine strategies were then applied to the reduced data sets. We conclude that any two of the nine strategies are incomparable in terms of error rates (5% significance level, two-tailed test). However, for some data sets, removing cases with an excessive number of missing attribute values improves the error rate.
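The three interpretations and the approximations named above can be illustrated with a minimal sketch. It assumes the characteristic-set definitions commonly used in the rough set literature on incomplete data; the abstract itself does not state them. The marker symbols `?`, `*`, and `-`, the function names, and the toy data are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Set

# Assumed marker symbols (not given in the abstract):
LOST, DONT_CARE, ATTR_CONCEPT = "?", "*", "-"
SPECIAL = {LOST, DONT_CARE, ATTR_CONCEPT}


def characteristic_sets(rows: List[Dict[str, str]], decisions: List[str]) -> List[Set[int]]:
    """Return K(x) for every case x, under the assumed definitions."""
    universe = set(range(len(rows)))
    attrs = list(rows[0].keys())
    # Specified values seen for each attribute.
    domain = {a: {r[a] for r in rows} - SPECIAL for a in attrs}

    def concept_values(a: str, d: str) -> Set[str]:
        # Specified values of attribute a among cases of concept d.
        return {rows[i][a] for i in universe if decisions[i] == d} - SPECIAL

    # Blocks of attribute-value pairs.
    blocks: Dict[tuple, Set[int]] = {(a, v): set() for a in attrs for v in domain[a]}
    for a in attrs:
        for i, r in enumerate(rows):
            v = r[a]
            if v == LOST:
                continue                       # lost: belongs to no block
            if v == DONT_CARE:
                targets = domain[a]            # "do not care": every block
            elif v == ATTR_CONCEPT:
                targets = concept_values(a, decisions[i])  # blocks typical for its concept
            else:
                targets = {v}
            for t in targets:
                blocks[(a, t)].add(i)

    result = []
    for i, r in enumerate(rows):
        k = set(universe)
        for a in attrs:
            v = r[a]
            if v in (LOST, DONT_CARE):
                continue                       # no restriction from this attribute
            if v == ATTR_CONCEPT:
                vals = concept_values(a, decisions[i])
                if vals:                       # empty set means no restriction
                    k &= set().union(*(blocks[(a, t)] for t in vals))
            else:
                k &= blocks[(a, v)]
        result.append(k)
    return result


def approximations(K: List[Set[int]], concept: Set[int]) -> Dict[str, Set[int]]:
    """Subset and concept approximations of a concept from characteristic sets."""
    everyone = range(len(K))

    def union(sets):
        return set().union(*sets)

    return {
        "subset_lower": union(K[x] for x in everyone if K[x] <= concept),
        "concept_lower": union(K[x] for x in concept if K[x] <= concept),
        "subset_upper": union(K[x] for x in everyone if K[x] & concept),
        "concept_upper": union(K[x] for x in concept if K[x] & concept),
    }


# Toy data, invented for illustration only.
rows = [
    {"temp": "high", "headache": "?"},
    {"temp": "*", "headache": "yes"},
    {"temp": "normal", "headache": "no"},
    {"temp": "-", "headache": "no"},
    {"temp": "normal", "headache": "*"},
]
decisions = ["yes", "yes", "no", "no", "no"]

K = characteristic_sets(rows, decisions)
approx = approximations(K, {i for i, d in enumerate(decisions) if d == "yes"})
```

On this toy table the two lower approximations coincide, while the subset upper approximation is strictly larger than the concept upper approximation. This is why three interpretations combined with three distinct approximations give nine strategies.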
© 2006 Springer
About this paper
Cite this paper
Grzymała-Busse, J.W., Santoso, S. (2006). Experiments on Data with Three Interpretations of Missing Attribute Values—A Rough Set Approach. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33521-8_14
DOI: https://doi.org/10.1007/3-540-33521-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33520-7
Online ISBN: 978-3-540-33521-4
eBook Packages: Engineering, Engineering (R0)