Feature Based Multivariate Data Imputation

Petrozziello, Alessio; Jordanov, Ivan

doi:10.1007/978-3-030-13709-0_3


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11331)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science


Abstract

We investigate a new multivariate data imputation approach for dealing with various types of missingness. The proposed approach relies on aggregating the most suitable methods from a multitude of imputation techniques, tailored to each feature of the dataset. We compare it with two single imputation techniques (Random Guessing and Median Imputation) and four state-of-the-art multivariate methods (K-Nearest Neighbour Imputation, Bagged Tree Imputation, Multivariate Imputation by Chained Equations, and Bayesian Principal Component Analysis Imputation) on several public-domain datasets, demonstrating favorable performance for our model. The proposed method, Feature Guided Data Imputation, is compared with the other tested methods in three experimental settings: Missing Completely at Random, Missing at Random, and Missing Not at Random, with 25% missing data in the test set, using five-fold cross-validation. Furthermore, the proposed model has a straightforward implementation and can easily incorporate other imputation techniques.
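The core idea above, picking for every feature the imputation method that suits it best, can be illustrated with a short NumPy sketch. It is not the authors' implementation. The candidate pool (Random Guessing, Median Imputation and a simple k-nearest-neighbour imputer), the use of mean absolute error on extra, artificially hidden cells as the selection criterion, and all function names (`missing_mask`, `select_per_feature`, `impute`) are assumptions made for illustration. `missing_mask` shows one simple way to simulate the three missingness mechanisms at roughly 25% missing values in a column.

```python
import numpy as np

# Hypothetical candidate pool; every imputer maps (X, col, rng) -> completed column.
def random_guess(X, col, rng):
    """Random Guessing: draw replacements from the column's observed values."""
    out = X[:, col].copy()
    miss = np.isnan(out)
    out[miss] = rng.choice(out[~miss], size=int(miss.sum()))
    return out

def median_impute(X, col, rng):
    """Median Imputation: replace every missing cell with the observed median."""
    out = X[:, col].copy()
    out[np.isnan(out)] = np.nanmedian(out)
    return out

def knn_impute(X, col, rng, k=5):
    """Simple k-NN imputation: distances use the other columns, whose own gaps
    are temporarily median-filled (rng is unused, kept for a uniform signature)."""
    filled = np.where(np.isnan(X), np.nanmedian(X, axis=0), X)
    others = np.delete(filled, col, axis=1)
    out = X[:, col].copy()
    donors = np.flatnonzero(~np.isnan(out))
    for i in np.flatnonzero(np.isnan(out)):
        dist = np.linalg.norm(others[donors] - others[i], axis=1)
        out[i] = X[donors[np.argsort(dist)[:k]], col].mean()
    return out

METHODS = {"random": random_guess, "median": median_impute, "knn": knn_impute}

def missing_mask(X, col, rate, mechanism, rng, driver=0):
    """Rows of column `col` to delete; about `rate` of them for continuous data
    and rate <= 0.5 (half the rows lie above the median, each kept with 2*rate)."""
    coin = rng.random(X.shape[0])
    if mechanism == "MCAR":  # unrelated to any value in the data
        return coin < rate
    if mechanism == "MAR":   # depends on another, fully observed column
        return (X[:, driver] > np.median(X[:, driver])) & (coin < 2 * rate)
    if mechanism == "MNAR":  # depends on the value that goes missing
        return (X[:, col] > np.median(X[:, col])) & (coin < 2 * rate)
    raise ValueError(f"unknown mechanism: {mechanism}")

def select_per_feature(X, rate, rng):
    """For each column, hide a further share of observed cells and keep the
    method with the lowest mean absolute error on those hidden cells."""
    best = {}
    for col in range(X.shape[1]):
        observed = np.flatnonzero(~np.isnan(X[:, col]))
        hide = observed[rng.random(observed.size) < rate]
        if hide.size == 0:
            hide = observed[:1]
        X_val = X.copy()
        X_val[hide, col] = np.nan
        errors = {name: np.abs(f(X_val, col, rng)[hide] - X[hide, col]).mean()
                  for name, f in METHODS.items()}
        best[col] = min(errors, key=errors.get)
    return best

def impute(X, best, rng):
    """Complete each column with the method selected for it."""
    out = X.copy()
    for col, name in best.items():
        out[:, col] = METHODS[name](X, col, rng)
    return out

# Toy data: column 1 depends on column 0, column 2 is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X_full = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=200),
                          rng.normal(size=200)])
X = X_full.copy()
for col in (1, 2):  # column 0 stays complete and acts as the MAR driver
    X[missing_mask(X_full, col, 0.25, "MAR", rng), col] = np.nan
best = select_per_feature(X, 0.25, rng)  # dict {column index: method name}
X_hat = impute(X, best, rng)
```

Because each imputer takes the same `(X, col, rng)` arguments, adding another technique to the pool in this sketch is a one-line change to `METHODS`, in the spirit of the claim that the proposed model can easily incorporate other imputation techniques.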

Author information

Authors and Affiliations

School of Computing, University of Portsmouth, Portsmouth, UK
Alessio Petrozziello & Ivan Jordanov

Corresponding author

Correspondence to Alessio Petrozziello.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy and University of Reading, Reading, UK
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Harvard University, Cambridge, MA, USA
Renato Umeton
IBM, Tivoli Research Lab, Rome, Italy
Vincenzo Sciacca

About this paper

Cite this paper

Petrozziello, A., Jordanov, I. (2019). Feature Based Multivariate Data Imputation. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2018. Lecture Notes in Computer Science, vol 11331. Springer, Cham. https://doi.org/10.1007/978-3-030-13709-0_3

DOI: https://doi.org/10.1007/978-3-030-13709-0_3
Published: 14 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13708-3
Online ISBN: 978-3-030-13709-0
eBook Packages: Computer Science, Computer Science (R0)
