Abstract

Artificial intelligence has been a long-standing goal since the inception of computers decades ago. Among the many attempts towards general artificial intelligence, modern machine learning successfully tackles many complex problems thanks to progress in deep learning, and in particular in convolutional neural networks (CNNs). To design a CNN for a specific task, one common approach consists of adapting heuristics from the pre-deep-learning era to the CNN domain. In the first part of this thesis, we introduce two methods that follow this approach: (i) we build a covariance descriptor, i.e., a local descriptor suitable for texture recognition, to replace the first-order fully connected layers in an ordinary CNN, showing that such a descriptor yields state-of-the-art performance on many fine-grained image classification tasks with orders of magnitude fewer feature dimensions; (ii) we develop a lightweight recurrent U-Net for semantic image segmentation, inspired by the saccadic movements of the biological eye, that yields real-time predictions on devices with limited computational resources.

Like most methods pre-dating automated machine learning (AutoML), the two CNNs mentioned above were designed by hand. In the past few years, however, neural architecture search (NAS), which aims to facilitate the design of deep networks for new tasks, has drawn increasing attention. In this context, the weight-sharing approach, which uses a single super-net to encompass all possible architectures within a search space, has become a de facto standard in NAS because it enables the search to run on commodity hardware. In the second part of this thesis, we provide an in-depth study of recent weight-sharing NAS algorithms. First, we discover a phenomenon in the weight-sharing NAS training pipeline, which we dub multi-model forgetting, that negatively impacts super-net quality, and we propose a statistically motivated approach to address it. Subsequently, we find that (i) on average, many popular NAS algorithms perform similarly to a random architecture sampling policy; and (ii) the widely adopted weight-sharing strategy degrades the ranking of the NAS candidates to the point of not reflecting their true performance, thus reducing the effectiveness of the search process. We then further decouple weight sharing from the NAS sampling policy and isolate 14 factors that play a key role in the success of super-net training. Finally, to improve super-net quality, we propose a regularization term that uses a small set of landmark architectures to maximize the correlation between the performance rankings produced by the super-net and those of the corresponding stand-alone architectures.
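
To make the second-order pooling idea from the first part more concrete, the sketch below replaces a CNN's usual first-order global pooling with a channel-wise covariance matrix computed over spatial locations. It is a minimal illustration under assumed conventions, not the exact descriptor developed in the thesis: the function name covariance_descriptor, the tensor shapes, and the eps stabilizer are all choices made for this example, and the thesis further reduces the descriptor's dimensionality, which this sketch does not attempt.

import torch

def covariance_descriptor(features, eps=1e-5):
    # features: (B, C, H, W) activations from the last convolutional layer.
    # Returns one vector per image: the upper triangle of the channel-wise
    # covariance matrix, used as a second-order (covariance) descriptor.
    b, c, h, w = features.shape
    x = features.reshape(b, c, h * w)                        # each spatial location is a sample
    x = x - x.mean(dim=2, keepdim=True)                      # center each channel
    cov = x @ x.transpose(1, 2) / (h * w - 1)                # (B, C, C) covariance matrices
    cov = cov + eps * torch.eye(c, device=features.device)   # assumed numerical stabilizer
    iu = torch.triu_indices(c, c)                            # covariance is symmetric
    return cov[:, iu[0], iu[1]]                              # (B, C * (C + 1) / 2) descriptor

# Example usage: pool a batch of feature maps into second-order descriptors.
feats = torch.randn(8, 256, 14, 14)
desc = covariance_descriptor(feats)                          # shape: (8, 32896)

Unlike first-order pooling, which keeps only per-channel averages, this descriptor captures pairwise channel interactions; the compact dimensionality claimed in the abstract comes from additional processing in the thesis that is omitted here.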
