Abstract
Statistical matching aims to combine information available in distinct sample surveys that refer to the same target population. The matching is usually based on a set of common variables shared by the available data sources. For matching purposes, only a subset of the common variables should be used: the so-called matching variables. This paper presents a novel method for selecting the matching variables, based on an analysis of the uncertainty that characterizes the matching framework. This uncertainty arises because no data are available to estimate the parameters describing the association or correlation between variables that are not jointly observed in a single data source. The paper focuses on the case of categorical variables and presents a sequential procedure for identifying the subset of common variables that is most effective in reducing the overall uncertainty.
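The abstract does not spell out how uncertainty is measured or how the sequential procedure stops, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one sample observes (X, Y) and the other (X, Z), measures uncertainty as the average width of the Fréchet bounds on the cell probabilities P(Y = j, Z = k) after conditioning on the chosen common variables, and uses plain greedy forward selection. The names `frechet_width`, `select_matching_variables`, and `tol` are hypothetical, and no correction for sparse tables is applied.

```python
from collections import Counter
from itertools import product


def frechet_width(a_rows, b_rows, x_vars, y="Y", z="Z"):
    """Average width of the Frechet bounds on P(Y=j, Z=k), conditioning on x_vars.

    a_rows observe (X, Y) and b_rows observe (X, Z); each row is a dict.
    Assumption: P(X) is estimated from the pooled rows, P(Y|X) from a_rows,
    and P(Z|X) from b_rows, using simple relative frequencies.
    """
    key = lambda r: tuple(r[v] for v in x_vars)
    n_x = Counter(key(r) for r in a_rows + b_rows)
    n_xa = Counter(key(r) for r in a_rows)
    n_xb = Counter(key(r) for r in b_rows)
    n_xy = Counter((key(r), r[y]) for r in a_rows)
    n_xz = Counter((key(r), r[z]) for r in b_rows)
    y_cats = sorted({r[y] for r in a_rows})
    z_cats = sorted({r[z] for r in b_rows})
    total = sum(n_x.values())

    width_sum = 0.0
    for j, k in product(y_cats, z_cats):
        for x, n in n_x.items():
            p_x = n / total
            p_y = n_xy[(x, j)] / n_xa[x] if n_xa[x] else None
            p_z = n_xz[(x, k)] / n_xb[x] if n_xb[x] else None
            if p_y is None or p_z is None:
                # X cell unseen in one sample: fall back to conservative bounds.
                low = 0.0
                up = min(p for p in (p_y, p_z, 1.0) if p is not None)
            else:
                low = max(0.0, p_y + p_z - 1.0)  # Frechet lower bound
                up = min(p_y, p_z)               # Frechet upper bound
            width_sum += p_x * (up - low)
    return width_sum / (len(y_cats) * len(z_cats))


def select_matching_variables(a_rows, b_rows, candidates, tol=1e-9):
    """Greedy forward selection: add the variable that most reduces the width."""
    chosen, remaining = [], list(candidates)
    best = frechet_width(a_rows, b_rows, chosen)
    while remaining:
        scores = {v: frechet_width(a_rows, b_rows, chosen + [v]) for v in remaining}
        v = min(scores, key=scores.get)
        if best - scores[v] <= tol:  # stop when no candidate reduces uncertainty
            break
        chosen.append(v)
        remaining.remove(v)
        best = scores[v]
    return chosen, best


# Toy data: X1 fully determines Y and Z, while X2 is unrelated to both.
sample_a = [{"X1": a, "X2": b, "Y": a} for a in (0, 1) for b in (0, 1)]
sample_b = [{"X1": a, "X2": b, "Z": a} for a in (0, 1) for b in (0, 1)]
print(select_matching_variables(sample_a, sample_b, ["X1", "X2"]))
```

In this toy case the procedure keeps only `X1`, because conditioning on it collapses the bounds, whereas adding `X2` would only split the data into smaller cells without reducing the width. With real survey data, finer cross-classifications produce sparse tables and unreliable frequency estimates, which an unpenalized width criterion like this one does not account for.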
© 2017 Springer International Publishing Switzerland
Cite this paper
D’Orazio, M., Di Zio, M., Scanu, M. (2017). The Use of Uncertainty to Choose Matching Variables in Statistical Matching. In: Ferraro, M., et al. Soft Methods for Data Science. SMPS 2016. Advances in Intelligent Systems and Computing, vol 456. Springer, Cham. https://doi.org/10.1007/978-3-319-42972-4_19
DOI: https://doi.org/10.1007/978-3-319-42972-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42971-7
Online ISBN: 978-3-319-42972-4
eBook Packages: Engineering, Engineering (R0)