Yet Another Approach for Completing Missing Values

  • Conference paper
Concept Lattices and Their Applications (CLA 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4923)

Abstract

When tackling real-life datasets, one commonly faces missing values scattered throughout the data. Regarded as "dirty data", such records are usually removed during the pre-processing step of the KDD process. Starting from the premise that "making up this missing data is better than throwing it away", we present a new approach for completing missing data. The distinguishing feature of the approach is that it highlights a fruitful synergy between generic bases of association rules and the handling of missing values. Beyond their interesting compactness rate, such generic association rules considerably reduce conflicts during the completion step. A new metric called "Robustness" is also introduced; it aims to select the robust association rule for completing a missing value whenever a conflict appears. Experiments carried out on benchmark datasets confirm the soundness of our approach: it reduces conflicts during the completion step while achieving a high percentage of correct completions.
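The completion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the Robustness metric, so the `score` field below is a placeholder for whatever ranking measure is used, and the names `Rule`, `applicable`, and `complete_row`, as well as the toy rules and attribute values, are hypothetical. The sketch only shows the general idea of filling a missing value from association rules and breaking a conflict (two applicable rules proposing different values) by keeping the highest-scoring rule.

```python
from typing import Dict, List, NamedTuple, Optional

# A row maps attribute names to values; None marks a missing value.
Row = Dict[str, Optional[str]]


class Rule(NamedTuple):
    premise: Dict[str, str]  # attribute -> value required by the rule
    attribute: str           # attribute the rule concludes on
    value: str               # value the rule proposes for that attribute
    score: float             # placeholder ranking score (NOT the paper's Robustness)


def applicable(rule: Rule, row: Row) -> bool:
    """A rule applies when every premise item matches a known value in the row."""
    return all(row.get(attr) == val for attr, val in rule.premise.items())


def complete_row(row: Row, rules: List[Rule]) -> Row:
    """Fill each missing attribute using the best-scoring applicable rule, if any."""
    completed = dict(row)
    for attr, val in row.items():
        if val is not None:
            continue
        candidates = [r for r in rules if r.attribute == attr and applicable(r, row)]
        if candidates:
            # Conflict resolution: keep the rule with the highest score.
            completed[attr] = max(candidates, key=lambda r: r.score).value
    return completed


# Hypothetical rules that conflict on the attribute "play".
rules = [
    Rule({"outlook": "sunny"}, "play", "no", 0.60),
    Rule({"outlook": "sunny", "humidity": "normal"}, "play", "yes", 0.90),
]
row = {"outlook": "sunny", "humidity": "normal", "play": None}
result = complete_row(row, rules)
print(result)  # {'outlook': 'sunny', 'humidity': 'normal', 'play': 'yes'}
```

When no rule applies, the value is left missing rather than guessed, which keeps the sketch from inventing completions the rule set does not support.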

Editor information

Sadok Ben Yahia, Engelbert Mephu Nguifo, Radim Belohlavek

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ben Othman, L., Ben Yahia, S. (2008). Yet Another Approach for Completing Missing Values. In: Yahia, S.B., Nguifo, E.M., Belohlavek, R. (eds) Concept Lattices and Their Applications. CLA 2006. Lecture Notes in Computer Science, vol 4923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78921-5_10

  • DOI: https://doi.org/10.1007/978-3-540-78921-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78920-8

  • Online ISBN: 978-3-540-78921-5

  • eBook Packages: Computer Science, Computer Science (R0)
