Abstract
In this paper, we review possible strategies for handling missing values in separate-and-conquer rule learning algorithms and compare them experimentally on a large number of datasets. In particular, a careful study on data with controlled levels of missing values provides additional insights into the strategies' different biases with respect to attributes with missing values. Somewhat surprisingly, a strategy that implements a strong bias against the use of attributes with missing values exhibits the best average performance on 24 datasets from the UCI repository.
Notes
The amputed attributes were checking_status, duration, and credit_history for credit-g; bxqsq, rimmx, and wkna8 for krkp; and intensity-mean, rawred-mean, and hue-mean for segment.
Acknowledgements
We would like to thank Nada Lavrač and Dragan Gamberger for interesting discussions on their pessimistic value strategy. This research was supported by the German Science Foundation (DFG) under grant FU 580/2.
Cite this article
Wohlrab, L., Fürnkranz, J. A review and comparison of strategies for handling missing values in separate-and-conquer rule learning. J Intell Inf Syst 36, 73–98 (2011). https://doi.org/10.1007/s10844-010-0121-8
DOI: https://doi.org/10.1007/s10844-010-0121-8