Analysis and Visualization of Missing Value Patterns

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 611)

Abstract

Missing values in datasets are a highly relevant and often overlooked problem in many fields. Most algorithms cannot handle missing values when training a predictive model or analyzing a dataset. For this reason, records with missing values are either rejected or repaired. However, both repairing and rejecting records affect the dataset and the final results, introducing bias and uncertainty. Knowledge about the nature of missing values and the mechanisms behind them is therefore of vital importance. To gain deeper insight into the underlying structures and patterns of missing values, the concept of Monotone Mixture Patterns is introduced and used to analyze the patterns of missing values in datasets. Several visualization methods are proposed to present these “patterns of missingness” in an informative way. Finally, an algorithm for generating missing values in datasets is provided as the basis of a benchmarking tool. This algorithm can generate a large variety of missing value patterns for testing and comparing different algorithms that handle missing values.
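To make the idea of a "pattern of missingness" concrete, the following minimal Python sketch tabulates the distinct missingness patterns in a small table and checks whether they are monotone in the classical sense (the sets of missing columns are nested). It also includes the simplest possible generator, which removes cells completely at random. The abstract does not define Monotone Mixture Patterns or describe the paper's generation algorithm, so none of this should be read as the authors' method: the function names `missing_patterns`, `is_monotone`, and `inject_mcar`, the use of `None` as the missing marker, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def missing_patterns(rows):
    """Count each distinct missingness pattern.

    A pattern is a tuple of booleans, one per column (True = missing).
    """
    return Counter(tuple(v is None for v in row) for row in rows)


def is_monotone(patterns):
    """Return True if the sets of missing columns are nested.

    Nested sets (a chain under inclusion) mean the columns can be ordered
    so that once a value is missing, all later values are missing as well.
    """
    sets = sorted(
        ({i for i, missing in enumerate(p) if missing} for p in patterns),
        key=len,
    )
    return all(a <= b for a, b in zip(sets, sets[1:]))


def inject_mcar(rows, rate, seed=0):
    """Return a copy in which each cell is set to None with probability `rate`.

    Illustrative only: this is missing-completely-at-random removal,
    not the generation algorithm proposed in the paper.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


# Toy data (hypothetical): three columns, missingness grows from the right.
rows = [
    [1, 2, 3],
    [4, 5, None],
    [7, None, None],
    [1, 2, 3],
]
patterns = missing_patterns(rows)
print(patterns)
print(is_monotone(patterns))
print(is_monotone(missing_patterns(inject_mcar(rows, rate=0.3))))
```

A table whose patterns fail the monotone check could still be split into groups of rows that are each monotone, which is one plausible reading of a "mixture" of monotone patterns; that reading is a guess from the name alone.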

Author information

Corresponding author

Correspondence to Bas van Stein.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

van Stein, B., Kowalczyk, W., Bäck, T. (2016). Analysis and Visualization of Missing Value Patterns. In: Carvalho, J., Lesot, MJ., Kaymak, U., Vieira, S., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2016. Communications in Computer and Information Science, vol 611. Springer, Cham. https://doi.org/10.1007/978-3-319-40581-0_16

  • DOI: https://doi.org/10.1007/978-3-319-40581-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40580-3

  • Online ISBN: 978-3-319-40581-0

  • eBook Packages: Computer Science (R0)
