Simulation of close-to-reality population data for household surveys with application to EU-SILC

Statistical Methods & Applications

Abstract

Statistical simulation in survey statistics is usually based on repeatedly drawing samples from population data. Furthermore, population data may be used in courses on survey statistics to explain issues regarding, e.g., sampling designs. Since the availability of real population data is in general very limited, it is necessary to generate synthetic data for such applications. The simulated data need to be as realistic as possible, while at the same time ensuring data confidentiality. This paper proposes a method for generating close-to-reality population data for complex household surveys. The procedure consists of four steps for setting up the household structure, simulating categorical variables, simulating continuous variables and splitting continuous variables into different components. It is not required to perform all four steps so that the framework is applicable to a broad class of surveys. In addition, the proposed method is evaluated in an application to the European Union Statistics on Income and Living Conditions (EU-SILC).
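The four steps named in the abstract can be illustrated with a toy sketch. This is not the authors' procedure, whose models the abstract does not describe. It assumes simple resampling from empirical conditional distributions as a stand-in. The sample records, the variable names (`status`, `income`, `parts`) and the function `simulate_population` are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy sample (not EU-SILC data): one record per person.
SAMPLE = [
    {"hid": 1, "age": 41, "sex": "f", "status": "employed", "income": 30000.0,
     "parts": {"wages": 29000.0, "benefits": 1000.0}},
    {"hid": 1, "age": 43, "sex": "m", "status": "employed", "income": 26000.0,
     "parts": {"wages": 26000.0, "benefits": 0.0}},
    {"hid": 2, "age": 70, "sex": "f", "status": "retired", "income": 15000.0,
     "parts": {"wages": 0.0, "benefits": 15000.0}},
    {"hid": 3, "age": 29, "sex": "m", "status": "unemployed", "income": 8000.0,
     "parts": {"wages": 1000.0, "benefits": 7000.0}},
    {"hid": 3, "age": 27, "sex": "f", "status": "employed", "income": 22000.0,
     "parts": {"wages": 21000.0, "benefits": 1000.0}},
]


def simulate_population(sample, n_households, steps=(1, 2, 3, 4), seed=0):
    """Toy stand-in for the four steps; steps 2 to 4 are optional."""
    rng = random.Random(seed)
    households = defaultdict(list)
    for person in sample:
        households[person["hid"]].append(person)
    donors = list(households.values())

    # Step 1: household structure. Whole households are resampled so that
    # within-household combinations of age and sex stay plausible.
    population = []
    for new_hid in range(1, n_households + 1):
        for member in rng.choice(donors):
            population.append(
                {"hid": new_hid, "age": member["age"], "sex": member["sex"]}
            )

    for person in population:
        # Step 2: categorical variable, drawn conditional on sex.
        if 2 in steps:
            pool = [p for p in sample if p["sex"] == person["sex"]]
            person["status"] = rng.choice(pool)["status"]
        # Step 3: continuous variable, drawn conditional on status and
        # jittered so that sample values are not reproduced exactly.
        if 3 in steps and "status" in person:
            pool = [p for p in sample if p["status"] == person["status"]]
            person["income"] = rng.choice(pool)["income"] * rng.uniform(0.9, 1.1)
        # Step 4: split the total using the component shares of a donor.
        if 4 in steps and "income" in person:
            pool = [p for p in sample if p["status"] == person["status"]]
            donor = rng.choice(pool)
            person["parts"] = {
                name: person["income"] * value / donor["income"]
                for name, value in donor["parts"].items()
            }
    return population
```

Calling `simulate_population(SAMPLE, 1000)` returns a list of synthetic person records. Passing `steps=(1, 2)` stops after the categorical variable, which mirrors the abstract's remark that not all four steps have to be performed.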

Author information

Corresponding author

Correspondence to Andreas Alfons.

Additional information

This work was partly funded by the European Union (represented by the European Commission) within the 7th framework programme for research (Theme 8, Socio-Economic Sciences and Humanities, Project AMELI (Advanced Methodology for European Laeken Indicators), Grant Agreement No. 217322). Visit http://ameli.surveystatistics.net for more information on the project.

About this article

Cite this article

Alfons, A., Kraft, S., Templ, M. et al. Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat Methods Appl 20, 383–407 (2011). https://doi.org/10.1007/s10260-011-0163-2

  • DOI: https://doi.org/10.1007/s10260-011-0163-2
