There has been growing recent interest in the field of active learning for binary classification. This thesis develops a Bayesian approach to active learning which aims to minimise the objective function on which the learner is evaluated, namely the expected misclassification cost. We call this approach the expected cost reduction approach to active learning. In this form of active learning queries are selected by performing a `lookahead' to evaluate the associated expected misclassification cost.
\paragraph{}
Firstly, we introduce the concept of a \textit{query density} to explicitly model how new data is sampled. An expected cost reduction framework for active learning is then developed which allows the learner to sample data according to arbitrary query densities. The model makes no assumption of independence between queries, instead updating model parameters on the basis of both which observations were made \textsl{and} how they were sampled. This approach is demonstrated on the probabilistic high-low game which is a non-separable extension of the high-low game presented by \cite{Seung_etal1993}. The results indicate that the Bayes expected cost reduction approach performs significantly better than passive learning even when there is considerable overlap between the class distributions, covering $30\%$ of input space. For the probabilistic high-low game however narrow queries appear to consistently outperform wide queries. We therefore conclude the first part of the thesis by investigating whether or not this is always the case, demonstrating examples where sampling broadly is favourable to a single input query.
\paragraph{}
Secondly, we explore the Bayesian expected cost reduction approach to active learning within the pool-based setting. This is where learning is limited to a finite pool of unlabelled observations from which the learner may select observations to be queried for class-labels. Our implementation of this approach uses Gaussian process classification with the expectation propagation approximation to make the necessary inferences. The implementation is demonstrated on six benchmark data sets and again demonstrates superior performance to passive learning.
EPSRC