Open access
Date: 2020
Type: Doctoral Thesis
ETH Bibliography: yes
Abstract
This thesis addresses two central tasks in image processing: single-image super-resolution and image compression with generative models.
While super-resolution has traditionally been viewed as a restoration task, it can also be seen as a form of lossy image compression in which the encoder downscales the image and the decoder reconstructs it.
Generative models, typically studied as a tool for unsupervised learning, can also be used (as we will show) for image compression: the decoder synthesizes details that cannot be stored at a given bitrate.
In the first part of this thesis, we study the problem of single-image super-resolution from the perspective of memory efficiency.
We investigate Adjusted Anchored Neighborhood Regression (A+), a clustered regression model over a low-resolution dictionary, and propose Regressor Basis Learning (RB), which restricts the regressor set to a learned low-dimensional subspace.
We show that RB achieves performance comparable to A+ while using orders of magnitude fewer basis regressors, which makes it well suited to memory-constrained applications.
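The memory saving behind RB can be illustrated with a minimal numpy sketch: instead of storing one full regressor matrix per anchor, we store a small shared basis plus per-anchor mixing coefficients. All sizes and names here are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_regressors, n_basis = 1024, 32   # illustrative: many anchors, small basis
d_out, d_in = 27, 28               # illustrative patch/feature dimensions

# shared learned basis of regressors plus per-anchor mixing coefficients
basis = rng.normal(size=(n_basis, d_out, d_in))
coeffs = rng.normal(size=(n_regressors, n_basis))

def regressor(i):
    # each anchored regressor is a linear combination of the shared basis
    return np.tensordot(coeffs[i], basis, axes=1)   # shape (d_out, d_in)

# stored parameters: basis + coefficients vs. all full regressors
mem_rb = basis.size + coeffs.size       # 32*27*28 + 1024*32 = 56,960
mem_full = n_regressors * d_out * d_in  # 1024*27*28 = 774,144
```

With these toy sizes the basis representation stores over 13x fewer parameters, and the gap grows as the number of anchors increases while the basis size stays fixed.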
In the second part of this thesis, we look at super-resolution as a generic regression task.
We propose the Anchored Regression Network (ARN), a generalization of A+ that smooths piecewise linear regression by combining multiple linear regressors through soft assignments to anchor points.
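The soft-assignment idea can be sketched in a few lines of numpy: a softmax over anchor similarities weights the outputs of per-anchor linear regressors. The function name, the softmax form, and the sharpness parameter `alpha` are illustrative assumptions, not the exact ARN formulation.

```python
import numpy as np

def arn_predict(x, anchors, regressors, biases, alpha=10.0):
    # soft assignment: softmax over similarities to the anchor points
    sims = anchors @ x                       # assumes unit-norm anchors and input
    w = np.exp(alpha * (sims - sims.max()))  # numerically stable softmax
    w /= w.sum()
    # smoothed piecewise-linear prediction: weighted sum of linear regressors
    return sum(wi * (A @ x + b) for wi, A, b in zip(w, regressors, biases))
```

As `alpha` grows, the soft assignment sharpens toward a hard nearest-anchor choice, recovering an A+-style piecewise linear predictor.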
We demonstrate the power of the ARN by applying it to two very different and challenging tasks, age prediction from face images and image super-resolution, and obtain strong results in both cases.
In the third part of this thesis, we present a learned image compression system based on Generative Adversarial Networks (GANs), operating at extremely low bitrates.
Our proposed model synthesizes details it cannot afford to store, obtaining visually pleasing results at bitrates where previous methods fail and exhibit strong artifacts.
A user study confirms that for low bitrates, our approach is preferred to state-of-the-art methods, even when they use more than double the bits.
In the fourth and final part of this thesis, we study how to inspect the latent space of generative models such as GANs and Variational Autoencoders (VAEs), which are trained with a fixed prior distribution.
We show that popular latent space operations (such as interpolating between two samples) can induce a distribution mismatch between the resulting outputs and the prior distribution.
To address this, we propose distribution-matching transport maps that ensure such latent space operations preserve the prior distribution, and we experimentally validate that the proposed operations yield higher-quality samples than the original operations.
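The mismatch induced by linear interpolation is easy to verify numerically for a standard Gaussian prior: the midpoint of two prior samples is distributed as N(0, 0.5 I), so its norm concentrates around sqrt(d/2) rather than the prior's sqrt(d). The norm-rescaling correction below is only a crude stand-in for the transport maps proposed in the thesis; the dimensionality is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                                  # illustrative latent dimensionality
z0, z1 = rng.standard_normal(d), rng.standard_normal(d)

# linear midpoint: distributed as N(0, 0.5*I), so its norm concentrates
# around sqrt(d/2) instead of the prior's sqrt(d) -- a distribution mismatch
mid = 0.5 * (z0 + z1)

# a simple norm-matching correction (a crude stand-in for the
# distribution-matching transport maps proposed in the thesis)
mid_matched = mid * np.sqrt(d) / np.linalg.norm(mid)
```

In high dimensions almost all prior mass lies near the sphere of radius sqrt(d), so samples decoded from the unmatched midpoint come from a region the model rarely saw during training.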
Permanent link: https://doi.org/10.3929/ethz-b-000487536
Publication status: published
Publisher: ETH Zurich
Subject: Computer vision; Super-resolution; Compression; Generative models
Organisational unit: 03514 - Van Gool, Luc