Abstract
When tackling real-life datasets, it is common to face missing values scattered throughout the data. Considered "dirty data", such values are usually removed during the pre-processing step of the KDD process. Starting from the premise that "making up this missing data is better than throwing it away", we present a new approach for completing missing data. Its main novelty is that it sheds light on a fruitful synergy between generic bases of association rules and the handling of missing values. Indeed, beyond their appealing compactness rate, such generic association rules considerably reduce the number of conflicts during the completion step. A new metric, called "Robustness", is also introduced; it aims to select the most robust association rule for completing a missing value whenever a conflict appears. Experiments carried out on benchmark datasets confirm the soundness of our approach: it reduces conflicts during the completion step while achieving a high percentage of correct completions.
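To make the completion step concrete, here is a minimal Python sketch of filling missing values from association rules and flagging conflicts. It assumes that records are dicts mapping attribute names to values, with None marking a missing value, and that every rule concludes a single (attribute, value) pair. The names Rule, complete_value and complete_record are hypothetical. The sketch does not build the generic basis itself. Because the abstract does not define the Robustness metric, conflicting rules are ranked here by confidence, then support, then shorter premise. That ranking is only a stand-in and is not the authors' measure.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Item = Tuple[str, str]  # an (attribute, value) pair
Record = Dict[str, Optional[str]]  # None marks a missing value


@dataclass(frozen=True)
class Rule:
    premise: FrozenSet[Item]
    conclusion: Item  # assumption: a single (attribute, value) conclusion
    support: float
    confidence: float


def applicable_rules(record: Record, rules: List[Rule], attribute: str) -> List[Rule]:
    """Rules predicting `attribute` whose premise holds on the record's known values."""
    known = {(a, v) for a, v in record.items() if v is not None}
    return [r for r in rules if r.conclusion[0] == attribute and r.premise <= known]


def complete_value(record: Record, rules: List[Rule], attribute: str) -> Tuple[Optional[str], bool]:
    """Return (proposed value or None, whether applicable rules disagree)."""
    candidates = applicable_rules(record, rules, attribute)
    if not candidates:
        return None, False
    conflict = len({r.conclusion[1] for r in candidates}) > 1
    # Stand-in for the paper's Robustness metric (NOT its definition):
    # highest confidence, then highest support, then the shortest premise.
    best = max(candidates, key=lambda r: (r.confidence, r.support, -len(r.premise)))
    return best.conclusion[1], conflict


def complete_record(record: Record, rules: List[Rule]) -> Tuple[Record, List[str]]:
    """Fill each missing attribute; premises are matched on the original known
    values only, so a filled-in value never triggers further completions."""
    completed = dict(record)
    conflicts: List[str] = []
    for attribute, value in record.items():
        if value is None:
            filled, conflict = complete_value(record, rules, attribute)
            if filled is not None:
                completed[attribute] = filled
            if conflict:
                conflicts.append(attribute)
    return completed, conflicts


if __name__ == "__main__":
    rules = [
        Rule(frozenset({("outlook", "sunny")}), ("play", "no"), 0.3, 0.6),
        Rule(frozenset({("outlook", "sunny"), ("humidity", "normal")}), ("play", "yes"), 0.2, 0.9),
        Rule(frozenset({("temp", "hot")}), ("humidity", "high"), 0.25, 0.75),
    ]
    record = {"outlook": "sunny", "humidity": "normal", "temp": "hot", "play": None}
    completed, conflicts = complete_record(record, rules)
    print(completed["play"], conflicts)  # → yes ['play']
```

In the usage example, two rules disagree on `play`, so a conflict is reported and the higher-confidence rule wins. A smaller rule set, such as one drawn from a generic basis, would typically offer fewer competing candidates for the same attribute.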
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ben Othman, L., Ben Yahia, S. (2008). Yet Another Approach for Completing Missing Values. In: Yahia, S.B., Nguifo, E.M., Belohlavek, R. (eds) Concept Lattices and Their Applications. CLA 2006. Lecture Notes in Computer Science, vol 4923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78921-5_10
DOI: https://doi.org/10.1007/978-3-540-78921-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78920-8
Online ISBN: 978-3-540-78921-5
eBook Packages: Computer Science, Computer Science (R0)