Abstract:
Neural networks have attracted significant attention from both academia and industry owing to their success on challenging tasks such as computer vision, speech recognition and natural language processing. However, to achieve good performance on supervised learning tasks, neural networks must be trained on large amounts of labelled data, which often requires human experts to painstakingly label every example. This labelling process can be costly and time-consuming. Yet while labelled data are often scarce and expensive to collect in practice, unlabelled data are usually available in abundance. The goal of this thesis is to improve the generalisation performance of neural networks on classification tasks by utilising unlabelled data. We study three strategies to this end: pretraining, semi-supervised learning and active learning.
First, we propose a self-supervised pretraining method for tabular data that learns to distinguish real data from randomly shuffled data; the weights learned during pretraining are then reused as the initial weights for the original task on the labelled training set. Second, we break the common assumption in semi-supervised learning that the labelled and unlabelled data come from the same distribution. We show empirically that novel classes in the unlabelled data can degrade the generalisation performance of semi-supervised algorithms, and we propose a 1-nearest-neighbour-based method that assigns a weight to each unlabelled example to reduce this negative effect. Lastly, we propose a new uncertainty-based active learning method, designed specifically for neural networks trained with stochastic gradient descent, that queries the examples whose predictions change the most during training. Experimental results show that the proposed method is most effective when a large labelled training set is already available. We also show that different types of active learning methods perform differently under different settings, which suggests that fully evaluating an active learning algorithm requires experiments under a wide range of settings.
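To make the pretext task concrete, the following is a minimal sketch in PyTorch. It assumes the shuffled data are produced by permuting each column of the table independently, which destroys cross-feature structure while preserving the marginal distributions; the column-wise shuffling, the network shapes and names such as `make_shuffled` are illustrative assumptions rather than the exact design used in the thesis.

```python
# Minimal sketch of real-vs-shuffled pretraining on tabular data (assumptions noted above).
import torch
import torch.nn as nn

def make_shuffled(x: torch.Tensor) -> torch.Tensor:
    """Shuffle each column independently, breaking dependencies between features."""
    perm = torch.stack([torch.randperm(x.size(0)) for _ in range(x.size(1))], dim=1)
    return torch.gather(x, 0, perm)

backbone = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
pretext_head = nn.Linear(64, 1)  # real-vs-shuffled discriminator
opt = torch.optim.Adam(list(backbone.parameters()) + list(pretext_head.parameters()))
bce = nn.BCEWithLogitsLoss()

x_unlabelled = torch.randn(256, 16)  # stand-in for the unlabelled table
for _ in range(100):
    x_fake = make_shuffled(x_unlabelled)
    x = torch.cat([x_unlabelled, x_fake])
    y = torch.cat([torch.ones(256, 1), torch.zeros(256, 1)])  # 1 = real, 0 = shuffled
    loss = bce(pretext_head(backbone(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The pretrained backbone is then reused, with a fresh head, for the labelled task.
task_head = nn.Linear(64, 10)
model = nn.Sequential(backbone, task_head)
```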
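The weighting idea in the second contribution can likewise be sketched. Here each unlabelled example is down-weighted according to the distance to its single nearest labelled neighbour, so examples far from every labelled example (and hence plausibly from a novel class) contribute little; the exponential decay below is an illustrative choice, not necessarily the weighting used in the thesis.

```python
# Sketch of 1-NN-based weighting of unlabelled examples (illustrative decay, see above).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_weights(x_labelled: np.ndarray, x_unlabelled: np.ndarray) -> np.ndarray:
    """Weight each unlabelled example by its distance to the nearest labelled example."""
    nn = NearestNeighbors(n_neighbors=1).fit(x_labelled)
    dist, _ = nn.kneighbors(x_unlabelled)
    # Far from all labelled data -> likely a novel class -> small weight.
    return np.exp(-dist.ravel())
```

In a semi-supervised objective, these weights would multiply the per-example loss on the unlabelled data, shrinking the influence of suspected novel-class examples.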
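Finally, the prediction-change criterion for active learning admits a very small sketch. Assuming the predicted labels of the unlabelled pool are recorded after each SGD epoch, the query score below simply counts label flips per example; this bookkeeping is hypothetical, but the idea of ranking pool examples by how unstable their predictions are during training is the one described above.

```python
# Sketch of a prediction-change query score for active learning (assumed bookkeeping).
import numpy as np

def prediction_change_score(snapshots: np.ndarray) -> np.ndarray:
    """Count label flips per pool example; snapshots has shape (num_epochs, pool_size)."""
    return (snapshots[1:] != snapshots[:-1]).sum(axis=0)

snapshots = np.array([[0, 1, 2],
                      [0, 2, 2],
                      [1, 2, 2],
                      [1, 0, 2]])
scores = prediction_change_score(snapshots)  # -> [1, 2, 0]
query = np.argsort(scores)[::-1][:2]         # query the most unstable examples
```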