
Data Augmentation


Definition

Data augmentation is a Markov chain Monte Carlo (MCMC) algorithm for sampling from a Bayesian posterior distribution.

Background

Data augmentation was originally developed by Tanner and Wong [10] as a stochastic counterpart of the EM algorithm [1], and it is closely related to the Gibbs sampler [2]. Accordingly, the basic setup of data augmentation parallels that of the EM algorithm.

Theory

Let y be the observed data and z be the missing data or latent variable. Let p(y, z | θ) be the probability distribution of the complete data (y, z), with θ being the unknown parameter. The marginal distribution of the observed data y is \(p(y \vert \theta) = \int p(y, z \vert \theta)\,dz\). Let p(θ) be the prior distribution of θ. The goal is to draw Monte Carlo samples from the posterior distribution \(p(\theta \vert y) \propto p(\theta)\, p(y \vert \theta)\).
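As a concrete illustration (a hypothetical example, not from the original entry), take a two-component Gaussian mixture: each observation \(y_i\) is drawn from \(N(\mu_{z_i}, 1)\), where \(z_i \in \{0, 1\}\) is a latent component label and \(\theta = (\mu_0, \mu_1)\). Since z is discrete here, the marginal likelihood becomes the sum \(p(y \vert \theta) = \sum_z p(y, z \vert \theta)\); the posterior \(p(\theta \vert y)\) is awkward to sample directly, but the complete-data posterior \(p(\theta \vert y, z)\) is a simple normal distribution. This gap between the two posteriors is exactly what data augmentation exploits.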

The data augmentation algorithm is iterative. It starts from an initial value \(\theta_0\). Let \((\theta_t, z_t)\) be the values of \(\theta\) and \(z\) at iteration \(t\). Each iteration consists of two steps: an imputation step, which draws \(z_{t+1}\) from the conditional distribution \(p(z \vert y, \theta_t)\), and a posterior step, which draws \(\theta_{t+1}\) from \(p(\theta \vert y, z_{t+1})\). The resulting sequence \(\{\theta_t\}\) forms a Markov chain whose stationary distribution is the target posterior \(p(\theta \vert y)\).
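Below is a minimal sketch of a DA sampler in Python for the hypothetical mixture example above. The model, priors, and all variable names are illustrative assumptions rather than part of the original entry; the I-step (imputation) draws the labels given the current means, and the P-step (posterior) draws the means from their conjugate normal conditionals given the labels.

# Minimal sketch (hypothetical example, not from the entry): a DA
# sampler for a two-component Gaussian mixture with unit variances,
# equal mixing weights, and independent N(0, tau2) priors on the means.
import numpy as np

rng = np.random.default_rng(0)

# Simulate observed data y; the true means are -2 and 2.
n = 200
true_z = rng.integers(0, 2, size=n)
y = rng.normal(np.where(true_z == 1, 2.0, -2.0), 1.0)

tau2 = 10.0  # prior variance of each component mean

def da_sampler(y, n_iter=2000):
    mu = np.array([-1.0, 1.0])  # initial value theta_0
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        # I-step: draw z_{t+1} ~ p(z | y, theta_t); for each i,
        # P(z_i = 1) is proportional to the N(mu_1, 1) density at y_i.
        log_p0 = -0.5 * (y - mu[0]) ** 2
        log_p1 = -0.5 * (y - mu[1]) ** 2
        p1 = 1.0 / (1.0 + np.exp(log_p0 - log_p1))
        z = rng.random(len(y)) < p1
        # P-step: draw theta_{t+1} ~ p(theta | y, z_{t+1}) via the
        # conjugate normal update for each component mean.
        for k, mask in enumerate((~z, z)):
            var_k = 1.0 / (mask.sum() + 1.0 / tau2)
            mu[k] = rng.normal(var_k * y[mask].sum(), np.sqrt(var_k))
        samples[t] = mu
    return samples

samples = da_sampler(y)
print("posterior means of (mu_0, mu_1):", samples[500:].mean(axis=0))

With well-separated components this chain mixes quickly; when the augmented data carry a large fraction of the information about θ, the chain can mix slowly, which motivates the parameter-expanded augmentation schemes in [4, 6, 8].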


References

  1. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39:1–38

  2. Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741

  3. Higdon DM (1998) Auxiliary variable methods for Markov chain Monte Carlo with applications. J Am Stat Assoc 93:585–595

  4. Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94(448):1264–1274

  5. Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81:27–40

  6. Liu C, Rubin DB, Wu YN (1998) Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika 85(4):755–770

  7. Meng XL, van Dyk D (1997) The EM algorithm – an old folk-song sung to a fast new tune. J R Stat Soc B 59:511–567

  8. Meng XL, van Dyk D (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86:301–320

  9. Swendsen RH, Wang J (1987) Nonuniversal critical dynamics in Monte Carlo simulations. Phys Rev Lett 58:86–88

  10. Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82:528–540


Author information


Correspondence to Ying Nian Wu.


Copyright information

© 2014 Springer Science+Business Media New York


Cite this entry

Wu, Y.N. (2014). Data Augmentation. In: Ikeuchi, K. (eds) Computer Vision. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-31439-6_741

