
Open access
Author
Date
2023
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
The success of modern machine learning algorithms in extracting global information from data crucially relies on strong distributional assumptions on the input datasets. However, real-world datasets may contain outliers, fake or malicious data, or measurement errors, all of which are known to substantially degrade the performance of many of these algorithms. The design of robust algorithms, which succeed even when the input dataset satisfies the distributional assumptions only approximately, has thus become a fundamental topic across statistics, mathematics and computer science.
In this thesis we resolve several open questions central to this broad research agenda. Our focus is twofold: on the one hand, we establish the statistical and computational tractability of adversarial models; on the other, we introduce novel, efficient and robust algorithms that provide provably optimal guarantees.
With respect to the first goal, we unveil surprising information-computation gaps that show how the computational landscape of semi-random problems may differ from that of their average-case or worst-case counterparts, for example in the context of sparse principal component analysis or constraint satisfaction problems.
With respect to the second goal, we design new algorithms that achieve provably optimal guarantees in these general semi-random models. When there is a computational price to pay for robustness, our efficient algorithms match the new computational limits we established. When there is no price for robustness, as is the case for stochastic block models, our algorithms match the guarantees of their fragile counterparts and, in some cases, even improve on them.
By-products of our results are novel techniques to speed up robust algorithms, and new state-of-the-art algorithms satisfying other important related requirements, such as differential privacy.
Permanent link
https://doi.org/10.3929/ethz-b-000642721
Publication status
published
Publisher
ETH Zurich
Subject
Machine Learning (stat.ML); Computational Complexity (cs.CC); Algorithms (ALG); Sum-of-squares
Organisational unit
09622 - Steurer, David / Steurer, David