Active Learning and Competitive Coevolution:

GP requires that multiple candidate solutions (individuals) be evaluated before credit assignment can be performed, and the stochastic nature of the search implies that the process is also iterative. Together these properties make training GP models computationally expensive, precluding the scaling of the GP paradigm to problems described by large datasets, such as those posed by the annual KDD Cup competition. Hardware speedups have therefore been widely utilized. An alternative approach was proposed in the mid-1990s in the form of ``Dynamic Subset Selection'' (Gathercole and Ross): a subset of the training data is stochastically sampled such that exemplars that are least frequently sampled (`age') or most frequently misclassified (`difficulty') are given priority; a sketch of this sampling heuristic appears after the list below. This works well provided the dataset is ``well behaved,'' i.e., it is balanced and concise enough to fit in RAM. A series of works addressed these drawbacks through a hierarchical model of sampling and support for protecting the representation of the minor class(es):
  1. Hierarchical Active Learning algorithm: The original work, in which a hierarchical model for applying Gathercole's `age' and `difficulty' exemplar selection heuristics was demonstrated on a classification problem with half a million training exemplars. Training times were on the order of fifteen minutes on a laptop computer, and classification performance was competitive with that of the original KDD Cup winners.
  2. Balanced Block algorithm: The original hierarchical active learning model of item 1 made no specific attempt to address the problem of unbalanced datasets, typically resulting in degenerate solutions in such cases. The work was extended to address this problem explicitly, without compromising the computational efficiency of the approach; the balancing principle is sketched after this list.
    • Curry R., Lichodzijewski P., Heywood M.I. (2007) Scaling Genetic Programming to Large Datasets using Hierarchical Dynamic Subset Selection. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 37(4): 1065-1073.
  3. Pareto Competitive Coevolution: An alternative approach for establishing the subsets on which active learning is based is to utilize a competitive model of coevolution, thus taking the perspective of a game (or host-parasite model). The gaming model of interest to this work was first developed under the auspices of Genetic Algorithms and typically utilizes a pairwise competition, possibly between individuals from two different populations (e.g., de Jong and Pollack). In the latter case, the basic goal is to encourage the two populations to engage in a mutually beneficial competition resulting in the improvement of both. For the case of GP, one population would represent a subset of indices into the dataset and the other the GP classifiers. Needless to say, there are several technical challenges to carrying these concepts over to the paradigm of GP classifiers, and multiple approaches to realizing practical competitive coevolutionary models. Porting the GA Pareto competitive model of coevolution into a GP classification setting requires a scheme for defining how individuals from the Pareto front of GP classifiers are combined post training, i.e., a voting policy (sketched after this list). In addition, we adopt pruning heuristics designed to complement the post-training voting policy, and make use of a balanced point-population heuristic motivated by recent work on sampling algorithms for semi-supervised learning models (Weiss and Provost). The resulting model provides a significant speedup relative to GP without coevolution, while improving classification accuracy, particularly on unbalanced datasets.
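
To make the Dynamic Subset Selection heuristic concrete, the following Python sketch draws a training subset with selection probability biased by each exemplar's difficulty (how often it is misclassified) and age (how long since it was last selected). The function name, the additive weighting, and the exponents d_exp and a_exp are illustrative assumptions rather than the published parameterization.

    import random

    def dss_sample(difficulty, age, subset_size, d_exp=1.0, a_exp=1.0):
        # Weight each exemplar by misclassification count (difficulty) and
        # time since last selection (age); the constant 1 keeps all weights
        # positive before any statistics have accumulated.
        n = len(difficulty)
        weights = [1 + difficulty[i] ** d_exp + age[i] ** a_exp for i in range(n)]
        subset = random.choices(range(n), weights=weights, k=subset_size)
        chosen = set(subset)
        for i in range(n):
            age[i] = 0 if i in chosen else age[i] + 1  # selected exemplars reset age
        return subset

    # difficulty[i] would be incremented whenever exemplar i is misclassified
    # during fitness evaluation of the current population.
    age, difficulty = [0] * 1000, [0] * 1000
    subset = dss_sample(difficulty, age, subset_size=50)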
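
The balancing principle behind the Balanced Block algorithm of item 2 can be sketched as follows: each block reserves an equal quota of exemplars per class, sampling with replacement whenever a minor class is smaller than its quota, so minor classes are never swamped by the major class. This is a minimal sketch of the balancing idea only, assuming in-memory index lists; the published algorithm's hierarchical block management is more involved.

    import random
    from collections import defaultdict

    def balanced_block(labels, block_size):
        by_class = defaultdict(list)          # exemplar indices, keyed by class
        for i, y in enumerate(labels):
            by_class[y].append(i)
        quota = block_size // len(by_class)   # equal share per class
        block = []
        for members in by_class.values():
            if len(members) < quota:
                # Minor class smaller than its quota: sample with replacement.
                block.extend(random.choices(members, k=quota))
            else:
                block.extend(random.sample(members, quota))
        random.shuffle(block)
        return block

    labels = [0] * 990 + [1] * 10        # a 99:1 unbalanced dataset
    block = balanced_block(labels, 100)  # roughly 50 exemplars per class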
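
The Pareto mechanics of item 3 can likewise be illustrated. Classifiers are compared by their outcome vectors over the point population; the non-dominated set forms the front, and a post-training voting policy combines the front's predictions. Simple majority voting is assumed here purely for illustration; the voting and pruning heuristics adopted in the work itself are more refined.

    from collections import Counter

    def pareto_front(outcomes):
        # outcomes[i][j] = 1 if classifier i is correct on point j, else 0.
        # Classifier a dominates b if it is at least as good on every point
        # and strictly better on at least one.
        def dominates(a, b):
            return (all(x >= y for x, y in zip(a, b))
                    and any(x > y for x, y in zip(a, b)))
        return [i for i in range(len(outcomes))
                if not any(dominates(outcomes[j], outcomes[i])
                           for j in range(len(outcomes)) if j != i)]

    def vote(front, classifiers, x):
        # Majority vote across front members: one plausible voting policy.
        return Counter(classifiers[i](x) for i in front).most_common(1)[0][0]

    # Three toy classifiers scored on four points:
    outcomes = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
    front = pareto_front(outcomes)  # -> [0, 1]; the third is dominated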