Yet Another Approach for Completing Missing Values

Ben Othman, Leila; Ben Yahia, Sadok

doi:10.1007/978-3-540-78921-5_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4923)

Included in the following conference series:

International Conference on Concept Lattices and Their Applications

Abstract

When tackling real-life datasets, it is common to encounter missing values scattered throughout the data. Considered "dirty data", such records are usually removed during the pre-processing step of the KDD process. Starting from the premise that "making up this missing data is better than throwing it away", we present a new approach for completing missing data. The main distinguishing feature of the approach is that it highlights a fruitful synergy between generic bases of association rules and the handling of missing values. Beyond their interesting compactness rate, such generic association rules make it possible to considerably reduce conflicts during the completion step. A new metric called "Robustness" is also introduced; it aims to select the most robust association rule for completing a missing value whenever a conflict appears. Experiments carried out on benchmark datasets confirm the soundness of our approach: it reduces conflicts during the completion step while offering a high percentage of correct completions.
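
The completion step outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rule` type, the `score` function, and the toy weather-style data are all hypothetical. In particular, the abstract does not define the Robustness metric, so conflicts are resolved here with a stand-in ranking (confidence first, then support).

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """An association rule 'premise => attribute = value' (hypothetical layout)."""
    premise: dict       # attribute -> value required by the rule
    attribute: str      # attribute the rule concludes on
    value: str          # value the rule predicts for that attribute
    support: float
    confidence: float


def score(rule: Rule) -> tuple:
    # Stand-in ranking: confidence, then support.
    # This is NOT the paper's Robustness metric, which the abstract does not define.
    return (rule.confidence, rule.support)


def applicable(rule: Rule, record: dict, target: str) -> bool:
    # A rule applies if it concludes on the target attribute and every
    # premise item matches a known (non-missing) value of the record.
    return rule.attribute == target and all(
        record.get(attr) == val for attr, val in rule.premise.items()
    )


def complete(record: dict, rules: list, target: str) -> Optional[str]:
    """Return a value for the missing target attribute, or None if no rule applies."""
    candidates = [r for r in rules if applicable(r, record, target)]
    if not candidates:
        return None
    # When several rules predict different values (a conflict),
    # keep the prediction of the best-ranked rule.
    return max(candidates, key=score).value


# Toy rules and record, invented purely for illustration.
rules = [
    Rule({"outlook": "sunny"}, "play", "no", 0.30, 0.60),
    Rule({"outlook": "sunny", "humidity": "normal"}, "play", "yes", 0.15, 0.90),
    Rule({"outlook": "rainy"}, "play", "yes", 0.20, 0.75),
]
record = {"outlook": "sunny", "humidity": "normal", "play": None}
completed = complete(record, rules, "play")
print(completed)
```

Here the first two rules both apply to the record but predict different values, which is the kind of conflict the abstract refers to; the ranking function is the single place where a different selection criterion would be plugged in.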

Author information

Authors and Affiliations

Computer Science Department, Faculty of Sciences of Tunis, Campus University, 1060, Tunis, Tunisia
Leila Ben Othman & Sadok Ben Yahia

Editor information

Sadok Ben Yahia, Engelbert Mephu Nguifo, Radim Belohlavek

Cite this paper

Ben Othman, L., Ben Yahia, S. (2008). Yet Another Approach for Completing Missing Values. In: Yahia, S.B., Nguifo, E.M., Belohlavek, R. (eds) Concept Lattices and Their Applications. CLA 2006. Lecture Notes in Computer Science, vol 4923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78921-5_10

DOI: https://doi.org/10.1007/978-3-540-78921-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78920-8
Online ISBN: 978-3-540-78921-5
eBook Packages: Computer Science, Computer Science (R0)
