Abstract:
Deep learning is a modern subfield of machine learning, itself a subfield of artificial intelligence.
The key distinguishing feature of modern deep learning is the use of many layers of computation, which
can be trained with relatively simple and scalable optimization methods such as stochastic gradient
descent. By stacking many such layers, deep learning models can represent complex functions
that were previously difficult to approximate. This capability has led to significant advances
in a variety of fields, including computer vision, natural language processing, game playing, and protein
folding, among others.
Despite this success, key pieces of deep learning remain poorly understood, hampering the field’s
ability to advance. This thesis aims to shed light on a variety of fundamental components of deep
learning and to use that understanding to improve them. These components are
varied: We develop an understanding of mixed-example data augmentation, disproving the hypothesis
that linearity is responsible for its efficacy, and introduce new methods that outperform linear
approaches. We show that logit pairing methods aimed at improving adversarial robustness derive much
of their benefit from logit regularization, and then show how they can be extended with other logit
regularization techniques. We analyze multiple facets of batch normalization, a critical component of many
neural networks, and demonstrate four distinct changes that improve it across a variety of settings. We
study the effects of nondeterminism on model training, show that they are largely attributable to
training instability, and propose two methods for reducing this instability.