Foundations and Trends® in Machine Learning > Vol 11 > Issue 5-6

Computational Optimal Transport: With Applications to Data Science

By Gabriel Peyré, CNRS and DMA, ENS, France, gabriel.peyre@ens.fr | Marco Cuturi, Google and CREST/ENSAE, France, cuturi@google.com

 
Suggested Citation
Gabriel Peyré and Marco Cuturi (2019), "Computational Optimal Transport: With Applications to Data Science", Foundations and Trends® in Machine Learning: Vol. 11: No. 5-6, pp 355-607. http://dx.doi.org/10.1561/2200000073

Publication Date: 12 Feb 2019
© 2019 G. Peyré and M. Cuturi
 
Subjects
Optimization,  Clustering,  Classification and prediction
 

Free Preview:

Download extract

Share

Download article
In this article:
1. Introduction
2. Theoretical Foundations
3. Algorithmic Foundations
4. Entropic Regularization of Optimal Transport
5. Semidiscrete Optimal Transport
6. W1 Optimal Transport
7. Dynamic Formulations
8. Statistical Divergences
9. Variational Wasserstein Problems
10. Extensions of Optimal Transport
Acknowledgements
References

Abstract

Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746–1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions—two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a “global” cost to every such transport, using the “local” consideration of how much it costs to move a grain of sand from one place to another. Mathematicians are interested in the properties of that least costly transport, as well as in its efficient computation. That smallest cost not only defines a distance between distributions, but it also entails a rich geometric structure on the space of probability distributions. That structure is canonical in the sense that it borrows key geometric properties of the underlying “ground” space on which these distributions are defined. For instance, when the underlying space is Euclidean, key concepts such as interpolation, barycenters, convexity or gradients of functions extend naturally to the space of distributions endowed with an OT geometry. OT has been (re)discovered in many settings and under different forms, giving it a rich history. While Monge’s seminal work was motivated by an engineering problem, Tolstoi in the 1920s and Hitchcock, Kantorovich and Koopmans in the 1940s established its significance to logistics and economics. Dantzig solved it numerically in 1949 within the framework of linear programming, giving OT a firm footing in optimization. OT was later revisited by analysts in the 1990s, notably Brenier, while also gaining fame in computer vision under the name of earth mover’s distances. Recent years have witnessed yet another revolution in the spread of OT, thanks to the emergence of approximate solvers that can scale to large problem dimensions. As a consequence, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), graphics (for shape manipulation) or machine learning (for regression, classification and generative modeling). This paper reviews OT with a bias toward numerical methods, and covers the theoretical properties of OT that can guide the design of new algorithms.We focus in particular on the recent wave of efficient algorithms that have helped OT find relevance in data sciences. We give a prominent place to the many generalizations of OT that have been proposed in but a few years, and connect them with related approaches originating from statistical inference, kernel methods and information theory. All of the figures can be reproduced using code made available on a companion website. This website hosts the book project Computational Optimal Transport. You will also find slides and computational resources.

DOI:10.1561/2200000073
ISBN: 978-1-68083-550-2
272 pp. $99.00
Buy book (pb)
 
ISBN: 978-1-68083-551-9
272 pp. $280.00
Buy E-book (.pdf)
Table of contents:
1. Introduction
2. Theoretical Foundations
3. Algorithmic Foundations
4. Entropic Regularization of Optimal Transport
5. Semidiscrete Optimal Transport
6. W1 Optimal Transport
7. Dynamic Formulations
8. Statistical Divergences
9. Variational Wasserstein Problems
10. Extensions of Optimal Transport
Acknowledgements
References

Computational Optimal Transport: With Applications to Data Science

The goal of Optimal Transport (OT) is to define geometric tools that are useful to compare probability distributions. Their use dates back to 1781.

Recent years have witnessed a new revolution in the spread of OT, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), computer vision and graphics (for shape manipulation) or machine learning (for regression, classification and density fitting). This monograph reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds lights on the theoretical properties of OT that make it particularly useful for some of these applications.

Computational Optimal Transport presents an overview of the main theoretical insights that support the practical effectiveness of OT before explaining how to turn these insights into fast computational schemes. Written for readers at all levels, the authors provide descriptions of foundational theory at two-levels. Generally accessible to all readers, more advanced readers can read the specially identified more general mathematical expositions of optimal transport tailored for discrete measures. Furthermore, several chapters deal with the interplay between continuous and discrete measures, and are thus targeting a more mathematically-inclined audience.

This monograph will be a valuable reference for researchers and students wishing to get a thorough understanding of Computational Optimal Transport, a mathematical gem at the interface of probability, analysis and optimization.

 
MAL-073