Active Learning and Competitive Coevolution:
GP requires that multiple candidate solutions (individuals) be
evaluated before credit assignment can take place, and the stochastic nature of
the search makes the process iterative. Together these properties render the
training of GP models computationally expensive, precluding the scaling of the
GP paradigm to the large datasets exemplified by the annual KDD Cup competition.
Hardware speedups have therefore been widely utilized. An alternative approach
was proposed in the mid-1990s in the
form of ``Dynamic Subset Selection'' (Gathercole and Ross). In effect, a subset
of the training data is stochastically sampled such that exemplars that are
sampled least frequently or that incur the most error are given priority. This
suffices as long as the dataset is sufficiently ``well behaved'', i.e., balanced
and small enough to reside in RAM alone; a minimal sketch of the selection
heuristic follows.
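To make the heuristic concrete, the following is a minimal sketch assuming NumPy; the exponent values, array names, and subset size are illustrative assumptions, not parameters prescribed by Gathercole and Ross:

```python
import numpy as np

rng = np.random.default_rng(0)

def dss_sample(difficulty, age, subset_size, d_exp=1.0, a_exp=3.5):
    """Dynamic Subset Selection sketch: weight each exemplar by how often
    it has been misclassified (difficulty) and how long since it was last
    sampled (age), then draw the next training subset in proportion to the
    weights. The exponents are illustrative, not values fixed by the
    original paper."""
    weights = difficulty.astype(float) ** d_exp + age.astype(float) ** a_exp
    return rng.choice(len(weights), size=subset_size, replace=False,
                      p=weights / weights.sum())

# Hypothetical usage: 1000 exemplars, subset of 50 per generation.
difficulty = rng.integers(0, 10, size=1000)   # running error counts
age = rng.integers(1, 20, size=1000)          # generations since last pick
subset = dss_sample(difficulty, age, subset_size=50)
```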
A series of works addressed these drawbacks by introducing a hierarchical model
of sampling and support for protecting the representation of the minority
class(es):
- Hierarchical Active Learning algorithm:
This was the original work, in which
a hierarchical model for applying Gathercole's `age' and `difficulty' exemplar
selection heuristics was demonstrated on a classification problem with half a
million training exemplars. Training times were on the order of fifteen minutes
on a laptop computer, and classification performance was competitive with that
of the original KDD Cup competition winners. A sketch of the two-level sampling
loop follows.
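As a rough illustration of the hierarchy, the sketch below layers a block-level selection step over the `dss_sample` routine (and `rng`) from the sketch above; the block-ageing policy shown is an assumption for illustration, not the exact scheme of the paper:

```python
def hierarchical_sample(blocks, block_age, subset_size):
    """Two-level active learning sketch: level 1 picks a dataset partition
    (block), biased toward blocks not visited recently, so only one block
    need be resident in RAM at a time; level 2 applies DSS within that
    block.

    blocks    : list of dicts holding per-block 'difficulty'/'age' arrays,
                standing in for partitions of a disk-resident dataset.
    block_age : float array, generations since each block was selected."""
    b = rng.choice(len(blocks), p=block_age / block_age.sum())
    block_age += 1.0        # every block ages ...
    block_age[b] = 1.0      # ... except the one just selected
    idx = dss_sample(blocks[b]["difficulty"], blocks[b]["age"], subset_size)
    return b, idx

# Hypothetical setup: four disk blocks of 250 exemplars each.
blocks = [{"difficulty": rng.integers(0, 10, 250),
           "age": rng.integers(1, 20, 250)} for _ in range(4)]
block_age = np.ones(4)
b, idx = hierarchical_sample(blocks, block_age, subset_size=50)
```

Each generation thus touches a single block, bounding the RAM footprint regardless of overall dataset size.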
- Balanced Block algorithm:
The original hierarchical active learning model
described above made no specific attempt to address the problem of unbalanced
datasets, typically resulting in degenerate solutions in such cases. The work
was extended to explicitly address this problem without compromising the
computational efficiency of the approach; a sketch of the block construction
follows the reference below.
- Curry R., Lichodzijewski P., Heywood M.I. (2007) Scaling Genetic Programming
to Large Datasets using Hierarchical Dynamic Subset Selection.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 37(4): 1065-1073.
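One way to build class-balanced blocks is sketched below; the wrap-around recycling of minority-class indexes is an illustrative reconstruction, not the exact partitioning scheme of the cited paper:

```python
from collections import defaultdict

def balanced_blocks(labels, block_size):
    """Partition exemplar indexes into blocks with equal per-class
    representation, recycling minority-class indexes so that every block
    retains them regardless of the overall class skew."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    per_class = block_size // len(by_class)
    n_blocks = max(1, max(len(v) for v in by_class.values()) // per_class)
    blocks = []
    for b in range(n_blocks):
        block = []
        for idxs in by_class.values():
            # wrap around so small classes appear in every block
            block += [idxs[(b * per_class + k) % len(idxs)]
                      for k in range(per_class)]
        blocks.append(block)
    return blocks

# Hypothetical usage: 90/10 class skew, blocks of 20 (10 per class).
labels = [0] * 900 + [1] * 100
blocks = balanced_blocks(labels, block_size=20)
```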
- Pareto Competitive Coevolution:
An alternative approach for establishing the
subsets on which active learning is based is to utilize a competitive model of
coevolution, thus taking the perspective of a game (or host-parasite model).
The gaming model of interest to this work was first developed under the
auspices of Genetic Algorithms and typically utilizes a pairwise competition,
possibly between individuals from two different populations (e.g., de Jong and
Pollack). In the latter case, the basic goal is to encourage the two populations
to engage in a mutually beneficial competition resulting in the improvement of
both populations. For the case of GP, one population would represent a subset
of indexes into the dataset and the other the GP classifiers. Needless to say,
there are several technical challenges in carrying these concepts over to the
paradigm of GP classifiers, and multiple approaches to realizing practical
competitive coevolutionary models. Porting the GA Pareto competitive model of
coevolution into a GP classification setting requires that a scheme be adopted
for defining how individuals from the Pareto front of GP classifiers are
combined post training, i.e., a voting policy. In addition, we adopt pruning
heuristics designed to complement the post-training voting policy, and make use
of a balanced point-population heuristic, motivated by recent work on sampling
training data under class imbalance (Weiss and Provost). The resulting model
provides a significant speedup relative to GP without coevolution while
bettering classification accuracy, particularly on unbalanced datasets. A
sketch of the Pareto-front and voting bookkeeping follows.
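The sketch below illustrates the bookkeeping implied above, assuming a 0/1 outcome matrix of GP classifiers (rows) against coevolved points (columns); the distinction counting and the simple majority vote are illustrative stand-ins for the policies actually studied:

```python
import numpy as np

def pareto_front(outcomes):
    """Indexes of non-dominated learners: learner k dominates learner i if
    it does at least as well on every point and strictly better on one."""
    n = outcomes.shape[0]
    return [i for i in range(n)
            if not any(np.all(outcomes[k] >= outcomes[i]) and
                       np.any(outcomes[k] > outcomes[i])
                       for k in range(n) if k != i)]

def distinctions(outcomes):
    """Score each point by the number of learner pairs it separates; a
    column with c correct and (n - c) incorrect learners distinguishes
    c * (n - c) pairs. High-distinction points make useful 'parasites'."""
    ones = outcomes.sum(axis=0)
    return ones * (outcomes.shape[0] - ones)

def majority_vote(front_predictions):
    """Combine the Pareto-front classifiers post training by simple
    majority vote over their binary test-set predictions."""
    votes = np.asarray(front_predictions)        # (front_size, n_test)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Under this framing, the point population is selected toward high-distinction points while the learner population is selected through the front, giving the mutually beneficial competition described above.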