Selected Completed Master's Theses
Caleidgh Bayer
Evaluating simple reactive agents in visual reinforcement learning tasks: August 2023
Visual formulations of reinforcement learning tasks are potentially challenging because
(1) the state space is large and composed from pixels (so unlikely to be directly
correlated with actions), (2) the underlying task might be partially observable despite
the high dimensionality, and (3) rewards can be sparse, so they do not necessarily discriminate
between useful and unhelpful decisions. In this thesis we compare the classic
deep Q-network (a temporal difference reinforcement learning approach) with tangled
program graphs (TPG) (a genetic programming approach) under fully and partially
observable visual reinforcement learning tasks from ViZDoom. We demonstrate
that TPG is particularly effective at imparting structure on the partially observable
task (resulting in a general policy for navigating a labyrinth), but is relatively poor at
solving a fully observable (aiming) task. Conversely, DQN is very effective when presented
with the complete information aiming task, but is unable to discover general
solutions to the partially observable navigation task. We attribute these preferences
to the different approaches TPG and DQN assume for addressing representation/feature
construction versus credit assignment.
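As a point of reference for the credit-assignment contrast drawn above, the following is a minimal sketch of the one-step temporal-difference target that DQN-style learners bootstrap from. It is illustrative only and not the thesis implementation; the function name and toy values are assumptions.

```python
import numpy as np

# Minimal sketch of the one-step temporal-difference target bootstrapped by
# DQN-style learners (illustrative only; the thesis trains a full deep
# Q-network on ViZDoom frames).
def td_target(reward, next_q_values, gamma=0.99, terminal=False):
    """y = r + gamma * max_a' Q(s', a') for non-terminal transitions, else y = r."""
    if terminal:
        return reward
    return reward + gamma * float(np.max(next_q_values))

# Toy example: reward 1.0 and estimated action values for three discrete actions.
print(td_target(1.0, np.array([0.2, 0.5, 0.1])))  # 1.0 + 0.99 * 0.5 = 1.495
```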
Noah Sealy
QTRB: Team-based region building using Q-learning to derive policy on programs parameterized by local reward signal: April 2023
While attempting to solve 2-dimensional grid-world maze tasks, it was
observed that genetic programming is limited by its random initialization
and its lack of use of local reward. This thesis proposes a hybrid algorithm
called QTRB, team-based region building with Q-learning, which attempts to
integrate genetic programming and reinforcement learning so that local
reward is used during evolution. During evolution, QTRB constructs programs
based directly on local environmental reward; the programs are then used as
a model from which a reinforcement learning agent learns. QTRB was tested
on 2-dimensional maze tasks of various sizes, under the hypothesis that a
policy can be derived by an agent learning from this model. The results
suggest that QTRB can derive a policy for the given tasks, requiring fewer
direct environment queries than traditional Q-learning as the task size scales.
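For context on the reinforcement learning half of the hybrid, below is a minimal tabular Q-learning backup of the kind such an agent could apply to the evolved model. The grid encoding, action count and parameter values are illustrative assumptions, not details from the thesis.

```python
import numpy as np

# Minimal tabular Q-learning backup; the 4x4 grid encoding and parameters
# are assumptions made for illustration only.
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

Q = np.zeros((16, 4))          # 16 grid cells, 4 movement actions
q_update(Q, s=0, a=1, r=-1.0, s_next=1)
print(Q[0, 1])                 # -0.1
```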
Alexandru Ianta
A comparison of traversal strategies for Tangled Program Graphs under the Arcade Learning Environment: April 2021
Tangled Program Graphs (TPG) provide a framework for constructing modular genetic programming solutions to visual reinforcement learning tasks. In order to guard against the development of cycles within the resulting graph, and with them the halting problem, a traversal strategy forbidding the revisiting of vertices was originally assumed. This thesis explores an alternative traversal strategy in which vertex revisits are allowed but edge revisits are not. An empirical study is performed using 20 game titles from the Arcade Learning Environment in order to assess the relative impact of the different traversal strategies on the resulting agent behaviours and underlying graph characteristics. Ultimately, both strategies appear to result in behaviours that are statistically very similar. The most notable differences appear in the distributions of actions used to reach the same performance.
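The two traversal rules differ only in what a traversal is forbidden to revisit. A minimal sketch, assuming a toy graph and a stand-in scoring function rather than actual TPG teams and their bidding programs:

```python
# Toy illustration of the two traversal rules: forbid revisiting *vertices*
# versus forbid revisiting *edges*. The graph layout and scoring function are
# hypothetical stand-ins for TPG teams and programs.
def traverse(graph, start, score, forbid="vertices"):
    """Follow the highest-scoring permitted edge until a leaf (atomic action) is reached."""
    node, visited = start, set()
    while node in graph:                      # internal vertex: keep routing
        options = [
            (dst, edge_id) for dst, edge_id in graph[node]
            if (dst if forbid == "vertices" else edge_id) not in visited
        ]
        if not options:                       # nothing left to follow
            break
        dst, edge_id = max(options, key=lambda o: score(node, o[0]))
        visited.add(dst if forbid == "vertices" else edge_id)
        node = dst
    return node

# Internal vertices map to (destination, edge_id) pairs; names not in the
# graph ("left", "right") act as atomic actions.
graph = {"t0": [("t1", 0), ("left", 1)], "t1": [("t0", 2), ("right", 3)]}
print(traverse(graph, "t0", score=lambda src, dst: len(str(dst)), forbid="edges"))
```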
Matthew Wright
Providing real-valued actions for Tangled Program Graphs under the Cartpole Benchmark: August 2020
The Tangled Program Graph framework (TPG) is a genetic programming approach to reinforcement learning. Canonical TPG is limited to performing discrete actions. This thesis investigates mechanisms by which TPG might perform real-valued actions. Two approaches are proposed. In the first, a decision-making network extracts state from TPG’s internal structure. A gradient-based learning method tailors the network to this representation. In the second, TPG is modified to generate a state representation in an external matrix visible to the decision-making network. No additional learning algorithm is used to configure the ‘decision-making network’. Instead, TPG adapts to use the default configuration. This thesis applies these approaches to a modified version of the classic CartPole environment that accepts real-valued actions. This enables the comparison between discrete action configurations of the task and the real-valued formulation. Results indicate that there is no additional complexity in TPG solutions under real-valued action versus discrete action configurations.
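To illustrate the second approach in spirit, the sketch below has stand-in "programs" write features into an external matrix that a fixed, untrained linear readout maps to a bounded real-valued action. The matrix size, random weights and tanh squashing are assumptions for illustration only, not the thesis design.

```python
import numpy as np

# Stand-in "programs" deposit a state representation into an external matrix;
# a fixed (never-retrained) linear readout maps it to a real-valued action.
rng = np.random.default_rng(0)
MEMORY = np.zeros((4, 4))                   # matrix visible to both sides
W = rng.standard_normal(MEMORY.size)        # default decision-network weights

def program_writes(observation):
    """Stand-in for evolved programs writing features into the matrix."""
    MEMORY.flat[: len(observation)] = observation

def real_valued_action(low=-10.0, high=10.0):
    """Squash the linear readout of the matrix into the action range [low, high]."""
    raw = np.tanh(W @ MEMORY.ravel())
    return low + (raw + 1.0) * (high - low) / 2.0

program_writes([0.01, -0.02, 0.03, 0.0])    # e.g. a CartPole observation
print(real_valued_action())                 # a continuous force in [-10, 10]
```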
Robert J. Smith
Evolving policies to solve the Rubik's Cube: Experiments with
ideal and approximate performance functions: August 2016
Abstract: This work reports on an approach to direct policy discovery (a form of
reinforcement learning) using genetic programming (GP) for the 3 by 3 by 3 Rubik's Cube. Specifically, a synthesis of
two approaches is proposed: 1) a previous group theoretic formulation is used to suggest
a sequence of objectives for developing solutions to different stages of the overall task; and
2) a hierarchical formulation of GP policy search is utilized in which policies adapted for an
earlier objective are explicitly transferred to aid the construction of policies for the next
objective. The resulting hierarchical organization of policies into a policy tree explicitly
demonstrates task decomposition and policy reuse. Algorithmically, the process makes use of a
recursive call to a common approach for maintaining a diverse population of GP individuals and
then learns how to reuse subsets of programs (policies) developed against the earlier
objective. Other than the two objectives, we do not explicitly identify how to decompose the
task or mark specific policies for transfer. Moreover, at the end of evolution we return a
population solving 100 percent of 17,675,698 different initial Cubes for the two objectives
currently in use.
Jessica Pauli de Castro Bonson
Diversity and novelty as objectives in Poker: August 2016
Abstract: Evolutionary algorithms are capable of finding efficient solutions without a predefined
design and with little human bias. However, they can be prone to early convergence and may be
deceived by a non-informative or deceptive fitness function, and thus the agents may end up as
suboptimal solutions or fail to solve the task. Diversity maintenance and novelty search are
methods developed to deal with these drawbacks. The first modifies evolution so that agents are
selected based on both their fitness and their diversity. Novelty search builds on top of this,
and evolves individuals that are not only diverse but also possess novel behaviors. Currently
there are no studies on diversity and novelty for tasks that possess both deceptive properties
and large amounts of ambiguity. In
this work Heads-up Texas Hold'em Poker is used to provide a domain exhibiting both properties
simultaneously. Specifically, Poker contains ambiguity due to imperfect information,
stochasticity, and intransitivity. It is also deceptive, due to the complex strategies
necessary to perform well in the game, such as bluffing. Finally, this poker variant also
contains a behavior space that is extremely large, due to its many game states and decision
points. This thesis investigates whether diversity maintenance and novelty search are still
beneficial under a task that possesses these features. These techniques are compared with one
another and with a classic evolutionary method. The goal is to analyze whether these methods
improve the diversity of the population, and whether this leads to improved performance.
This work does not aim to develop a world-class poker player, but to assess the significance
of diversity and novelty search. The results show that diversity maintenance methods were not
only able to produce a diverse range of strategies for poker, but also to produce
statistically better strategies than in a scenario with no diversity.
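For reference, novelty search typically scores an individual by its mean distance to the k nearest behaviour descriptors in the population and archive. A minimal sketch with hypothetical poker-style behaviour vectors, not the descriptors defined in the thesis:

```python
import numpy as np

# Minimal novelty score: mean distance to the k nearest behaviour descriptors.
# The three-element vectors are hypothetical poker statistics (e.g. fold/call/
# raise rates) used purely for illustration.
def novelty(candidate, others, k=3):
    dists = np.sort(np.linalg.norm(np.asarray(others) - candidate, axis=1))
    return float(dists[:k].mean())

population = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
print(novelty(np.array([0.55, 0.35, 0.10]), population, k=2))
```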
Erinn Atwater
Towards coevolutionary
genetic programming with Pareto archiving under streaming data: August 2013
Abstract: Classification under streaming data constraints implies that training must be
performed continuously, can only access individual exemplars for a short time after they
arrive, must adapt to dynamic behaviour over time, and must be able to retrieve a current
classifier at any time. A coevolutionary genetic programming framework is adapted to operate in
non-stationary streaming data environments. Methods to generate synthetic datasets for
benchmarking streaming classification algorithms are introduced, and the proposed framework is
evaluated against them. The use of Pareto archiving is evaluated as a mechanism for retaining
access to a limited number of useful exemplars throughout training, and several fitness sharing
heuristics for archiving are evaluated. The approach proves effective under streams with continuous (incremental) changes, while the
addition of an aging heuristic is preferred when the stream has stepwise changes. Tapped delay
lines are explored as a method for explicitly incorporating sequence context in cyclical data
streams, and their use in combination with the aging heuristic suggests a promising route
forward.
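A minimal sketch of Pareto-archive maintenance in the spirit of the above, assuming maximised objective vectors and omitting the aging heuristic and archive size limits discussed in the thesis:

```python
# Pareto-archive maintenance: a candidate enters the archive only if no member
# dominates it, and any members it dominates are dropped. Objectives are
# assumed to be maximised.
def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def archive_insert(archive, candidate):
    if any(dominates(member, candidate) for member in archive):
        return archive                              # candidate is dominated
    return [m for m in archive if not dominates(candidate, m)] + [candidate]

archive = [(0.9, 0.2), (0.5, 0.6)]
print(archive_insert(archive, (0.6, 0.7)))          # (0.5, 0.6) is displaced
```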
Alexander Loginov
On the utility of evolving FOREX market trading agents with criteria based
retraining: March 2013
Abstract: This research investigates the ability of genetic programming to build profitable trading
strategies for the Foreign Exchange Market (FX) of one major currency pair
(EURUSD) using one hour prices from July 1, 2009 to November 30, 2012. We recognize
that such environments are likely to be non-stationary and we do not expect
that a single training partition, used to train a trading agent, represents all likely
future behaviours. The proposed adaptive retraining algorithm (hereafter FXGP)
detects poor trading behaviours and trains a new trading agent. This represents a
significant departure from current practice, which assumes some form of continuous
evolution. Extensive benchmarking is performed against the widely used EURUSD
currency pair. The non-stationary nature of the task is shown to result in a preference
for exploration over exploitation. Moreover, adopting a behavioural approach
to triggering retraining proves more effective than assuming incremental adaptation
on a continuous basis. From the application perspective, we demonstrate that use of
a validation partition and Stop-Loss (S/L) orders significantly improves the performance
of a trading agent. In addition, the task of co-evolving technical indicators
(TI) and decision trees (DT) for deploying the trading agent is explicitly addressed.
The results of 27 experiments of 100 simulations each demonstrate that FXGP significantly
outperforms existing approaches and generates profitable solutions with a
high probability.
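To make the "detect poor behaviour, then retrain" idea concrete, here is a toy trigger based on drawdown from peak equity; both the drawdown criterion and the 5% threshold are illustrative assumptions, not the detection rules used by FXGP:

```python
# Toy retraining trigger: retrain once equity falls a fixed fraction below its
# running peak. Criterion and threshold are assumptions for illustration.
def needs_retraining(equity_curve, max_drawdown=0.05):
    peak = equity_curve[0]
    for equity in equity_curve:
        peak = max(peak, equity)
        if equity < peak * (1.0 - max_drawdown):
            return True
    return False

print(needs_retraining([100.0, 103.0, 104.0, 98.0]))   # True: more than 5% below the peak
```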
Stephen Kelly
On developmental variation in hierarchical symbiotic policy
search: August 2012
A hierarchical symbiotic framework for policy search with genetic programming (GP) is
evaluated in two control-style temporal sequence learning domains. The symbiotic formulation
assumes each policy takes the form of a cooperative team between multiple symbiont programs.
An initial cycle of evolution establishes a diverse range of host behaviours with limited
capability. The second cycle uses these initial policies as meta actions for reuse by symbiont
programs. The relationship between development and ecology is explored by explicitly altering
the interaction between learning agent and environment at fixed points throughout evolution.
In both task domains, this developmental diversity significantly improves performance.
Specifically, ecologies designed to promote good specialists in the first developmental phase
and then good generalists result in much stronger organisms from the perspective of
generalization ability and efficiency. Conversely, when there is
no diversity in the interaction between task environment and
policy learner, the resulting hierarchy is not as robust or
general.
The relative contribution from each cycle of evolution in the
resulting hierarchical policies is measured from the perspective
of multi-level selection. These multi-level policies are shown to
be significantly better than the sum of contributing meta
actions.
Michal Lemczyk
Pareto-coevolutionary Genetic Programming Classifier: May 2006
Abstract: The conversion and extension of the Incremental Pareto-Coevolution Archive algorithm
(IPCA) into the domain of Genetic Programming classifier evolution is presented.
The coevolutionary aspect of the IPCA algorithm is utilized to simultaneously
evolve a subset of the training data that provides distinctions between candidate
classifiers. To remain practical, a method for limiting the sizes of the IPCA archives is required.
Several archive pruning bases are proposed and evaluated. Furthermore, methods for
consolidating the resultant Pareto front of classifiers into one label per testing data
point are presented and evaluated. The algorithm is compared against a traditional GP classifier using
the entirety of
the training data for evaluation, in addition to GP classifiers which perform Stochastic,
Cycling, and Dynamic subset selection. Results indicate that the PGPC algorithm often outperforms the
subset selection
algorithms in terms of classification performance, and often outperforms the traditional
classifier while requiring roughly 1/128 of the wall-clock time.
Ashley George
Local versus Global Wrapper Functions in Genetic Programming:
April 2006
Abstract: Genetic Programming offers freedom in the definition of the cost function
that is unparalleled in the realm of supervised learning algorithms. However, this freedom
goes largely unexploited in previous work. Here, we revisit the design of fitness functions
for genetic programming by explicitly considering the contribution of the wrapper and cost
function. Within the context of supervised learning, as applied to classification problems,
a clustering methodology is introduced using cost functions which encourage maximization of
the separation between in-class and out-of-class exemplars. Through a series of empirical
investigations of the nature of these functions, we demonstrate that classifier performance
is much more dependable than was previously the case under the genetic programming paradigm. In
addition, we also observe solutions with lower complexity than typically returned by the
classically employed hits (or even sum square error) based cost functions.
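As a concrete (but hypothetical) example of a cost function that rewards separation between in-class and out-of-class exemplars, one could score a program's raw outputs as follows; this particular formulation is an illustrative stand-in, not one of the cost functions evaluated in the thesis:

```python
import numpy as np

# Hypothetical separation-style cost: reward programs whose outputs for
# in-class exemplars sit far from those for out-of-class exemplars, relative
# to the spread of each group.
def separation_cost(in_class_outputs, out_class_outputs):
    gap = abs(np.mean(in_class_outputs) - np.mean(out_class_outputs))
    spread = np.std(in_class_outputs) + np.std(out_class_outputs)
    return -(gap / (spread + 1e-9))           # lower (more negative) is better

print(separation_cost([2.1, 2.3, 1.9], [-1.0, -0.8, -1.2]))
```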
Robert Redhead
Evaluation of Cluster Combination Functions for Mixture of
Experts: March 2005
Abstract: The Mixtures of Experts (MoE) model provides the basis for building modular
neural network solutions. In this work we are interested in methods for decomposing
the input before forwarding to the MoE architecture. By doing so we are able to
define the number of experts from the data itself. Specific schemes are shown to
be appropriate for regression and classification problems, where each appears to have
different preferences.
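One way to "define the number of experts from the data itself" is to cluster the inputs first and size the mixture accordingly. A minimal sketch using k-means with a silhouette criterion, which is an assumption for illustration rather than one of the combination functions evaluated in the thesis:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Cluster the inputs and keep the clustering that scores best, with one expert
# per cluster. The silhouette criterion is an illustrative assumption.
def choose_num_experts(X, k_range=range(2, 6)):
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in (0.0, 3.0, 6.0)])
print(choose_num_experts(X))        # three well-separated clusters -> 3 experts
```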
Robert Curry
Towards Efficient Training on Large Datasets for Genetic Programming:
September 2004
Abstract:
Genetic Programming (GP) has the potential to provide unique solutions to a wide range
of supervised learning problems. However, the technique suffers from a widely
acknowledged computational overhead. As a consequence, applications of GP are often
confined to datasets consisting of hundreds of training exemplars as opposed to tens of
thousands of exemplars, thus limiting the applicability of the approach. This work
proposes and thoroughly investigates three data sub-sampling algorithms that filter the
initial training dataset in parallel with the learning process. The motivation is to focus
GP training on the most difficult or least recently visited exemplars. To do so, we
build a hierarchy of subset selections, thus matching the concept of a memory hierarchy.
Such an approach provides for the training of GP solutions to data sets with hundreds of
thousands of exemplars in tens of minutes whilst matching the classification accuracies of
more classical approaches.
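The flavour of such filtering can be illustrated with a toy subset selector that biases sampling toward difficult and least recently visited exemplars; the additive weighting shown is a stand-in, not the hierarchical RSS/DSS schemes investigated in the thesis:

```python
import numpy as np

# Toy subset selector biased toward difficult and least recently visited
# exemplars; counts below are made up for illustration.
def select_subset(difficulty, age, subset_size, rng=np.random.default_rng(0)):
    weights = difficulty + age
    probs = weights / weights.sum()
    return rng.choice(len(weights), size=subset_size, replace=False, p=probs)

difficulty = np.array([5.0, 1.0, 0.5, 8.0, 2.0])   # e.g. times misclassified
age = np.array([1.0, 4.0, 9.0, 0.0, 2.0])          # generations since last selected
print(select_subset(difficulty, age, subset_size=3))
```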
Peter Lichodzijewski
Cascaded GP Models for Data Mining: September 2004
Abstract: The cascading CasGP architecture for incrementally building increasingly sophisticated
classifiers is presented. Concise solutions are evolved by using fixed-length
genetic programming, and training times remain acceptable through the use of the
RSS-DSS data filtering algorithm. The architecture provides an attractive alternative
approach to building arbitrarily complex solutions via genetic programming. The
effectiveness of the multi-layered, cascaded approach versus single classifiers is demonstrated.
Several fitness functions are evaluated, and the sum-squared error function
is recommended. To guide evolution at higher layers toward finding unique solutions,
fitness sharing is introduced. Despite mixed results, the approach is recommended
provided that a more robust sharing function is found. To further reduce computational
overhead, two mechanisms were investigated; however, both were found to
require further work. Additional analysis showed that CasGP produces more accurate
models than other GP-based approaches and that it remains competitive with
other machine learning algorithms.
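To illustrate the cascading idea, the sketch below feeds each layer the original features plus the outputs of all previous layers; the linear per-layer models are placeholders for the evolved fixed-length GP programs and are not the thesis implementation:

```python
import numpy as np

# Toy cascade: each layer sees the original features plus earlier layers'
# outputs, so later classifiers can refine earlier ones.
def cascade_predict(layers, x):
    features = np.asarray(x, dtype=float)
    for weights in layers:                        # one weight vector per layer
        output = float(weights @ features)
        features = np.append(features, output)    # expose this output to later layers
    return output

layers = [np.array([0.5, -0.2]), np.array([0.1, 0.1, 1.0])]
print(cascade_predict(layers, [1.0, 2.0]))        # second layer refines the first
```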
Leigh Wetmore
Dynamic Subset Selection Applied to Self-Organizing Maps: April 2004
Abstract: The self-organizing map (SOM) is an unsupervised learning algorithm that attempts
to form a compact representation of a data set via a set of prototype vectors that
exist in the same space. Dynamic subset selection (DSS) is a genetic programming
(GP) based method of selecting the particularly difficult-to-learn patterns from a
data set, where difficulty is a GP-specific measure. In this work, the dynamic subset
selection self-organizing map (DSSSOM) is presented. It is the application of DSS to
the SOM, with modifications to both. It is able to handle very large data sets, and
has a DSS-based built-in stopping mechanism. The performance of the new algorithm
is measured over five data sets via an original implementation, and compared to the
SOM and other learning algorithms. Results show that the DSSSOM achieves a
performance on par with the SOM with training time reduced by a factor of up to
nearly five hundred.
H. Gunes Kayacik
Hierarchical Self Organizing Map Based IDS on KDD Benchmark: May 2003
Abstract: In this work, an architecture consisting entirely of Self-Organizing Feature Maps is developed
for network-based intrusion detection. The principal interest is to analyze just how far such
an approach can be taken in practice. To do so, the KDD benchmark dataset from the
International Knowledge Discovery and Data Mining Tools Competition is employed. In
this work, no content based feature is utilized. Experiments are performed on two-level
and three-level hierarchies, training set biases and the contribution of features to intrusion
detection. Results show that a hierarchical SOM intrusion detection system is as good as
the other learning based approaches that use content based features.
Dong Song
A Linear Genetic Programming Approach to Intrusion Detection: March 2003
Abstract: Page-based Linear Genetic Programming (GP) is implemented with a new
two-layer Subset Selection scheme to address the two-class intrusion detection
classification problem on the KDD-99 benchmark dataset. By careful adjustment
of the relationship between subset layers, overfitting by individuals to specific
subsets is avoided. Unlike current approaches to this benchmark, the learning
algorithm is also responsible for deriving useful temporal features. Following
evolution, decoding of a GP individual demonstrates that the solution is unique
and comparable to hand-coded solutions found by experts. Standard, dynamic
and lexicographic fitness are implemented and compared.
In summary, this work represents a significant advance over previous
applications of GP in which evaluation is limited to relatively concise
benchmark datasets (typically less than a thousand patterns). This work details a
hierarchical extension to the Subset Selection scheme resulting in the successful
application of GP to a training dataset consisting of over 494,021 patterns.
Moreover, the computational requirements are less than decision tree solutions
applied to the same dataset.
Suihong Liang
Intelligent Packets for Dynamic Network Routing: April 2002
Abstract: A distributed GA (Genetic Algorithm) is designed for the packet-switched network routing
problem under minimal information, i.e., without information exchange; every node only knows the
existence of its neighboring nodes. The requirements of such a problem mean that packets must
possess more intelligence than has been the norm. To this end a distributed GA approach is
developed and benchmarked against the AntNet algorithm under similar information constraints. A
profile of AntNet under local and global information is developed, with the proposed distributed
GA clearly improving on the AntNet algorithm under local information constraints.