Selected Completed Master's Theses
Caleidgh Bayer
Evaluating simple reactive agents in visual reinforcement learning tasks: August 2023
Visual formulations of reinforcement learning tasks are potentially challenging because
(1) the state space is large and composed from pixels (so unlikely to be directly
correlated with actions), (2) the underlying task might be partially observable despite
the high dimensionality, and (3) rewards can be sparse, so they do not necessarily discriminate
between useful and unhelpful decisions. In this thesis we compare the classic
deep Q-network (a temporal difference reinforcement learning approach) with tangled
program graphs (TPG) (a genetic programming approach) under fully and partially
observable visual reinforcement learning tasks from ViZDoom. We demonstrate
that TPG is particularly effective at imparting structure on the partially observable
task (resulting in a general policy for navigating a labyrinth), but is relatively poor at
solving a fully observable (aiming) task. Conversely, DQN is very effective when presented
with the complete information aiming task, but is unable to discover general
solutions to the partially observable navigation task. We attribute these preferences
to the different approaches TPG and DQN assume for addressing representation/feature
construction versus credit assignment.
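As a point of reference for the credit-assignment contrast drawn above, the following is a minimal sketch of the one-step temporal-difference target that DQN-style learners bootstrap from. It is illustrative only and not the thesis implementation; the function name and toy values are assumptions.

```python
import numpy as np

# Minimal sketch of the one-step temporal-difference target bootstrapped by
# DQN-style learners (illustrative only; the thesis trains a full deep
# Q-network on ViZDoom frames).
def td_target(reward, next_q_values, gamma=0.99, terminal=False):
    """y = r + gamma * max_a' Q(s', a') for non-terminal transitions, else y = r."""
    if terminal:
        return reward
    return reward + gamma * float(np.max(next_q_values))

# Toy example: reward 1.0 and estimated action values for three discrete actions.
print(td_target(1.0, np.array([0.2, 0.5, 0.1])))  # 1.0 + 0.99 * 0.5 = 1.495
```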
Noah Sealy
QTRB: Team-based region building using Q-learning to derive policy on programs parameterized by local reward signal: April 2023
While attempting to solve 2-dimensional grid-world maze tasks, it was
observed that genetic programming is limited by its random initialization
and its lack of use of local reward. This thesis proposes a hybrid algorithm
called QTRB, team-based region building with Q-learning, which attempts to
integrate genetic programming and reinforcement learning so that local
reward is used during evolution. During evolution, QTRB constructs programs
based directly on local environmental reward; the programs are then used as
a model from which a reinforcement learning agent learns. QTRB was tested
on 2-dimensional maze tasks of various sizes, under the hypothesis that a
policy can be derived by an agent learning from this model. The results
suggest that QTRB can derive a policy for the given tasks, requiring fewer
direct environment queries than traditional Q-learning as the task size scales.
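For context on the reinforcement learning half of the hybrid, below is a minimal tabular Q-learning backup of the kind such an agent could apply to the evolved model. The grid encoding, action count and parameter values are illustrative assumptions, not details from the thesis.

```python
import numpy as np

# Minimal tabular Q-learning backup; the 4x4 grid encoding and parameters
# are assumptions made for illustration only.
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

Q = np.zeros((16, 4))          # 16 grid cells, 4 movement actions
q_update(Q, s=0, a=1, r=-1.0, s_next=1)
print(Q[0, 1])                 # -0.1
```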
Alexandru Ianta
A comparison of traversal strategies for Tangled Program Graphs under the Arcade Learning Environment: April 2021
Tangled Program Graphs (TPG) provide a framework for constructing modular genetic programming solutions to visual reinforcement learning tasks. In order to guard against the development of cycles within the resulting graph, and with them the halting problem, a traversal strategy forbidding the revisiting of vertices was originally assumed. This thesis explores an alternative traversal strategy in which vertex revisits are allowed but edge revisits are not. An empirical study is performed using 20 game titles from the Arcade Learning Environment in order to assess the relative impact of the different traversal strategies on the resulting agent behaviours and underlying graph characteristics. Ultimately, both strategies appear to result in behaviours that are statistically very similar. The most notable differences appear in the distributions of actions used to reach the same performance.
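The two traversal rules differ only in what a traversal is forbidden to revisit. A minimal sketch, assuming a toy graph and a stand-in scoring function rather than actual TPG teams and their bidding programs:

```python
# Toy illustration of the two traversal rules: forbid revisiting *vertices*
# versus forbid revisiting *edges*. The graph layout and scoring function are
# hypothetical stand-ins for TPG teams and programs.
def traverse(graph, start, score, forbid="vertices"):
    """Follow the highest-scoring permitted edge until a leaf (atomic action) is reached."""
    node, visited = start, set()
    while node in graph:                      # internal vertex: keep routing
        options = [
            (dst, edge_id) for dst, edge_id in graph[node]
            if (dst if forbid == "vertices" else edge_id) not in visited
        ]
        if not options:                       # nothing left to follow
            break
        dst, edge_id = max(options, key=lambda o: score(node, o[0]))
        visited.add(dst if forbid == "vertices" else edge_id)
        node = dst
    return node

# Internal vertices map to (destination, edge_id) pairs; names not in the
# graph ("left", "right") act as atomic actions.
graph = {"t0": [("t1", 0), ("left", 1)], "t1": [("t0", 2), ("right", 3)]}
print(traverse(graph, "t0", score=lambda src, dst: len(str(dst)), forbid="edges"))
```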
Matthew Wright
Providing real-valued actions for Tangled Program Graphs under the Cartpole Benchmark: August 2020
The Tangled Program Graph framework (TPG) is a genetic programming approach to reinforcement learning. Canonical TPG is limited to performing discrete actions. This thesis investigates mechanisms by which TPG might perform real-valued actions. Two approaches are proposed. In the first, a decision-making network extracts state from TPG’s internal structure. A gradient-based learning method tailors the network to this representation. In the second, TPG is modified to generate a state representation in an external matrix visible to the decision-making network. No additional learning algorithm is used to configure the ‘decision-making network’. Instead, TPG adapts to use the default configuration. This thesis applies these approaches to a modified version of the classic CartPole environment that accepts real-valued actions. This enables the comparison between discrete action configurations of the task and the real-valued formulation. Results indicate that there is no additional complexity in TPG solutions under real-valued action versus discrete action configurations.
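To illustrate the second approach in spirit, the sketch below has stand-in "programs" write features into an external matrix that a fixed, untrained linear readout maps to a bounded real-valued action. The matrix size, random weights and tanh squashing are assumptions for illustration only, not the thesis design.

```python
import numpy as np

# Stand-in "programs" deposit a state representation into an external matrix;
# a fixed (never-retrained) linear readout maps it to a real-valued action.
rng = np.random.default_rng(0)
MEMORY = np.zeros((4, 4))                   # matrix visible to both sides
W = rng.standard_normal(MEMORY.size)        # default decision-network weights

def program_writes(observation):
    """Stand-in for evolved programs writing features into the matrix."""
    MEMORY.flat[: len(observation)] = observation

def real_valued_action(low=-10.0, high=10.0):
    """Squash the linear readout of the matrix into the action range [low, high]."""
    raw = np.tanh(W @ MEMORY.ravel())
    return low + (raw + 1.0) * (high - low) / 2.0

program_writes([0.01, -0.02, 0.03, 0.0])    # e.g. a CartPole observation
print(real_valued_action())                 # a continuous force in [-10, 10]
```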
Robert J. Smith
Evolving policies to solve the Rubik's Cube: Experiments with
ideal and approximate performance functions: August 2016
Abstract: This work reports on an approach to direct policy discovery (a form of
reinforcement learning) using genetic programming (GP) for the 3 by 3 by 3 Rubik's Cube. Specifically, a synthesis of
two approaches is proposed: 1) a previous group theoretic formulation is used to suggest
a sequence of objectives for developing solutions to different stages of the overall task; and
2) a hierarchical formulation of GP policy search is utilized in which policies adapted for an
earlier objective are explicitly transferred to aid the construction of policies for the next
objective. The resulting hierarchical organization of policies into a policy tree explicitly
demonstrates task decomposition and policy reuse. Algorithmically, the process makes use of a
recursive call to a common approach for maintaining a diverse population of GP individuals and
then learns how to reuse subsets of programs (policies) developed against the earlier
objective. Other than the two objectives, we do not explicitly identify how to decompose the
task or mark specific policies for transfer. Moreover, at the end of evolution we return a
population solving 100 percent of 17,675,698 different initial Cubes for the two objectives
currently in use.
Jessica Pauli de Castro Bonson
Diversity and novelty as objectives in Poker: August 2016
Abstract: Evolutionary algorithms are capable of finding efficient solutions without a predefined
design and with little human bias. However, they can be prone to early convergence and may be
deceived by a non-informative or deceptive fitness function, and thus the agents may end up as
suboptimal solutions or fail to solve the task. Diversity maintenance and novelty search are
methods developed to deal with these drawbacks. The first modifies evolution so that agents are
selected based on both their fitness and their diversity. Novelty search builds on top of this,
and evolves individuals that are not only diverse but also possess novel behaviors. Currently
there are no studies on diversity and novelty for tasks that possess both deceptive properties
and large amounts of ambiguity. In
this work Heads-up Texas Hold'em Poker is used to provide a domain exhibiting both properties
simultaneously. Specifically, Poker contains ambiguity due to imperfect information,
stochasticity, and intransitivity. It is also deceptive, due to the complex strategies
necessary to perform well in the game, such as bluffing. Finally, this poker variant also
contains a behavior space that is extremely large, due to its many game states and decision
points. This thesis investigates whether diversity maintenance and novelty search are still
beneficial under a task that possesses these features. These techniques are compared with one
another and with a classic evolutionary method. The goal is to analyze whether these methods
improve the diversity of the population, and whether this leads to improved performance.
This work does not aim to develop a world-class poker player, but to assess the significance
of diversity and novelty search. The results show that diversity maintenance methods were not
only able to produce a diverse range of strategies for poker, but also to produce
statistically better strategies than in a scenario with no diversity.
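For reference, novelty search typically scores an individual by its mean distance to the k nearest behaviour descriptors in the population and archive. A minimal sketch with hypothetical poker-style behaviour vectors, not the descriptors defined in the thesis:

```python
import numpy as np

# Minimal novelty score: mean distance to the k nearest behaviour descriptors.
# The three-element vectors are hypothetical poker statistics (e.g. fold/call/
# raise rates) used purely for illustration.
def novelty(candidate, others, k=3):
    dists = np.sort(np.linalg.norm(np.asarray(others) - candidate, axis=1))
    return float(dists[:k].mean())

population = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
print(novelty(np.array([0.55, 0.35, 0.10]), population, k=2))
```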
Erinn Atwater
Towards coevolutionary
genetic programming with Pareto archiving under streaming data: August 2013
Abstract: Classification under streaming data constraints implies that training must be
performed continuously, can only access individual exemplars for a short time after they
arrive, must adapt to dynamic behaviour over time, and must be able to retrieve a current
classifier at any time. A coevolutionary genetic programming framework is adapted to operate in
non-stationary streaming data environments. Methods to generate synthetic datasets for
benchmarking streaming classification algorithms are introduced, and the proposed framework is
evaluated against them. The use of Pareto archiving is evaluated as a mechanism for retaining
access to a limited number of useful exemplars throughout training, and several fitness sharing
heuristics for archiving are evaluated. The approach proves effective under streams with continuous (incremental) changes, while the
addition of an aging heuristic is preferred when the stream has stepwise changes. Tapped delay
lines are explored as a method for explicitly incorporating sequence context in cyclical data
streams, and their use in combination with the aging heuristic suggests a promising route
forward.
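A minimal sketch of Pareto-archive maintenance in the spirit of the above, assuming maximised objective vectors and omitting the aging heuristic and archive size limits discussed in the thesis:

```python
# Pareto-archive maintenance: a candidate enters the archive only if no member
# dominates it, and any members it dominates are dropped. Objectives are
# assumed to be maximised.
def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def archive_insert(archive, candidate):
    if any(dominates(member, candidate) for member in archive):
        return archive                              # candidate is dominated
    return [m for m in archive if not dominates(candidate, m)] + [candidate]

archive = [(0.9, 0.2), (0.5, 0.6)]
print(archive_insert(archive, (0.6, 0.7)))          # (0.5, 0.6) is displaced
```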
Alexander Loginov
On the utility of evolving FOREX market trading agents with criteria based
retraining: March 2013
Abstract: This research investigates the ability of genetic programming to build profitable trading
strategies for the Foreign Exchange Market (FX) of one major currency pair
(EURUSD) using one hour prices from July 1, 2009 to November 30, 2012. We recognize
that such environments are likely to be non-stationary and we do not expect
that a single training partition, used to train a trading agent, represents all likely
future behaviours. The proposed adaptive retraining algorithm (hereafter FXGP)
detects poor trading behaviours and trains a new trading agent. This represents a
significant departure from current practice, which assumes some form of continuous
evolution. Extensive benchmarking is performed against the widely used EURUSD
currency pair. The non-stationary nature of the task is shown to result in a preference
for exploration over exploitation. Moreover, adopting a behavioural approach
to triggering retraining proves more effective than assuming incremental adaptation
on a continuous basis. From the application perspective, we demonstrate that use of
a validation partition and Stop-Loss (S/L) orders significantly improves the performance
of a trading agent. In addition, the task of co-evolving technical indicators
(TI) and decision trees (DT) for deploying the trading agent is explicitly addressed.
The results of 27 experiments of 100 simulations each demonstrate that FXGP significantly
outperforms existing approaches and generates profitable solutions with a
high probability.
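To make the "detect poor behaviour, then retrain" idea concrete, here is a toy trigger based on drawdown from peak equity; both the drawdown criterion and the 5% threshold are illustrative assumptions, not the detection rules used by FXGP:

```python
# Toy retraining trigger: retrain once equity falls a fixed fraction below its
# running peak. Criterion and threshold are assumptions for illustration.
def needs_retraining(equity_curve, max_drawdown=0.05):
    peak = equity_curve[0]
    for equity in equity_curve:
        peak = max(peak, equity)
        if equity < peak * (1.0 - max_drawdown):
            return True
    return False

print(needs_retraining([100.0, 103.0, 104.0, 98.0]))   # True: more than 5% below the peak
```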
Stephen Kelly
On developmental variation in hierarchical symbiotic policy
search: August 2012
A hierarchical symbiotic framework for policy search with genetic programming (GP) is
evaluated in two control-style temporal sequence learning domains. The symbiotic formulation
assumes each policy takes the form of a cooperative team between multiple symbiont programs.
An initial cycle of evolution establishes a diverse range of host behaviours with limited
capability. The second cycle uses these initial policies as meta actions for reuse by symbiont
programs. The relationship between development and ecology is explored by explicitly altering
the interaction between learning agent and environment at fixed points throughout evolution.
In both task domains, this developmental diversity significantly improves performance.
Specifically, ecologies designed to promote good specialists in the first developmental phase
and then good generalists result in much stronger organisms from the perspective of
generalization ability and efficiency. Conversely, when there is
no diversity in the interaction between task environment and
policy learner, the resulting hierarchy is not as robust or
general.
The relative contribution from each cycle of evolution in the
resulting hierarchical policies is measured from the perspective
of multi-level selection. These multi-level policies are shown to
be significantly better than the sum of contributing meta
actions.
Michal Lemczyk
Pareto-coevolutionary Genetic Programming Classifier: May 2006
Abstract: The conversion and extension of the Incremental Pareto-Coevolution Archive algorithm
(IPCA) into the domain of Genetic Programming classifier evolution is presented.
The coevolutionary aspect of the IPCA algorithm is utilized to simultaneously
evolve a subset of the training data that provides distinctions between candidate
classifiers. To remain practical, a method for limiting the sizes of the IPCA archives is required.
Several archive pruning bases are proposed and evaluated. Furthermore, methods for
consolidating the resultant Pareto front of classifiers into one label per testing data
point are presented and evaluated. The algorithm is compared against a traditional GP classifier using
the entirety of
the training data for evaluation, in addition to GP classifiers which perform Stochastic,
Cycling, and Dynamic subset selection. Results indicate that the PGPC algorithm often outperforms the
subset selection
algorithms in terms of classification performance, and often outperforms the traditional
classifier while requiring roughly 1/128 of the wall-clock time.
Ashley George
Local versus Global Wrapper Functions in Genetic Programming:
April 2006
Abstract: Genetic Programming offers freedom in the definition of the cost function
that is unparalleled in the realm of supervised learning algorithms. However, this freedom
goes largely unexploited in previous work. Here, we revisit the design of fitness functions
for genetic programming by explicitly considering the contribution of the wrapper and cost
function. Within the context of supervised learning, as applied to classification problems,
a clustering methodology is introduced using cost functions which encourage maximization of
the separation between in-class and out-of-class exemplars. Through a series of empirical
investigations of the nature of these functions, we demonstrate that classifier performance
is much more dependable than was previously the case under the genetic programming paradigm. In
addition, we also observe solutions with lower complexity than typically returned by the
classically employed hits (or even sum square error) based cost functions.
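As a concrete (but hypothetical) example of a cost function that rewards separation between in-class and out-of-class exemplars, one could score a program's raw outputs as follows; this particular formulation is an illustrative stand-in, not one of the cost functions evaluated in the thesis:

```python
import numpy as np

# Hypothetical separation-style cost: reward programs whose outputs for
# in-class exemplars sit far from those for out-of-class exemplars, relative
# to the spread of each group.
def separation_cost(in_class_outputs, out_class_outputs):
    gap = abs(np.mean(in_class_outputs) - np.mean(out_class_outputs))
    spread = np.std(in_class_outputs) + np.std(out_class_outputs)
    return -(gap / (spread + 1e-9))           # lower (more negative) is better

print(separation_cost([2.1, 2.3, 1.9], [-1.0, -0.8, -1.2]))
```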
Robert Redhead
Evaluation of Cluster Combination Functions for Mixture of
Experts: March 2005
Abstract: The Mixtures of Experts (MoE) model provides the basis for building modular
neural network solutions. In this work we are interested in methods for decomposing
the input before forwarding to the MoE architecture. By doing so we are able to
define the number of experts from the data itself. Specific schemes are shown to
be appropriate for regression and classification problems, where each appears to have
different preferences.
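One way to "define the number of experts from the data itself" is to cluster the inputs first and size the mixture accordingly. A minimal sketch using k-means with a silhouette criterion, which is an assumption for illustration rather than one of the combination functions evaluated in the thesis:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Cluster the inputs and keep the clustering that scores best, with one expert
# per cluster. The silhouette criterion is an illustrative assumption.
def choose_num_experts(X, k_range=range(2, 6)):
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in (0.0, 3.0, 6.0)])
print(choose_num_experts(X))        # three well-separated clusters -> 3 experts
```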
Robert Curry
Towards Efficient Training on Large Datasets for Genetic Programming:
September 2004
Abstract:
Genetic Programming (GP) has the potential to provide unique solutions to a wide range
of supervised learning problems. However, the technique suffers from a widely
acknowledged computational overhead. As a consequence, applications of GP are often
confined to datasets consisting of hundreds of training exemplars as opposed to tens of
thousands of exemplars, thus limiting the applicability of the approach. This work
proposes and thoroughly investigates three data sub-sampling algorithms that filter the
initial training dataset in parallel with the learning process. The motivation is to focus
GP training on the most difficult or least recently visited exemplars. To do so, we
build a hierarchy of subset selections, thus matching the concept of a memory hierarchy.
Such an approach provides for the training of GP solutions to data sets with hundreds of
thousands of exemplars in tens of minutes whilst matching the classification accuracies of
more classical approaches.
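The flavour of such filtering can be illustrated with a toy subset selector that biases sampling toward difficult and least recently visited exemplars; the additive weighting shown is a stand-in, not the hierarchical RSS/DSS schemes investigated in the thesis:

```python
import numpy as np

# Toy subset selector biased toward difficult and least recently visited
# exemplars; counts below are made up for illustration.
def select_subset(difficulty, age, subset_size, rng=np.random.default_rng(0)):
    weights = difficulty + age
    probs = weights / weights.sum()
    return rng.choice(len(weights), size=subset_size, replace=False, p=probs)

difficulty = np.array([5.0, 1.0, 0.5, 8.0, 2.0])   # e.g. times misclassified
age = np.array([1.0, 4.0, 9.0, 0.0, 2.0])          # generations since last selected
print(select_subset(difficulty, age, subset_size=3))
```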
Peter Lichodzijewski
Cascaded GP Models for Data Mining: September 2004
Abstract: The cascading CasGP architecture for incrementally building increasingly sophisticated
classifiers is presented. Concise solutions are evolved by using fixed-length
genetic programming, and training times remain acceptable through the use of the
RSS-DSS data filtering algorithm. The architecture provides an attractive alternative
approach to building arbitrarily complex solutions via genetic programming. The
effectiveness of the multi-layered, cascaded approach versus single classifiers is demonstrated.
Several fitness functions are evaluated, and the sum-squared error function
is recommended. To guide evolution at higher layers toward finding unique solutions,
fitness sharing is introduced. Despite mixed results, the approach is recommended
provided that a more robust sharing function is found. To further reduce computational
overhead, two mechanisms were investigated; however, both were found to
require further work. Additional analysis showed that CasGP produces more accurate
models than other GP-based approaches and that it remains competitive with
other machine learning algorithms.
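To illustrate the cascading idea, the sketch below feeds each layer the original features plus the outputs of all previous layers; the linear per-layer models are placeholders for the evolved fixed-length GP programs and are not the thesis implementation:

```python
import numpy as np

# Toy cascade: each layer sees the original features plus earlier layers'
# outputs, so later classifiers can refine earlier ones.
def cascade_predict(layers, x):
    features = np.asarray(x, dtype=float)
    for weights in layers:                        # one weight vector per layer
        output = float(weights @ features)
        features = np.append(features, output)    # expose this output to later layers
    return output

layers = [np.array([0.5, -0.2]), np.array([0.1, 0.1, 1.0])]
print(cascade_predict(layers, [1.0, 2.0]))        # second layer refines the first
```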
Leigh Wetmore
Dynamic Subset Selection Applied to Self-Organizing Maps: April 2004
Abstract: The self-organizing map (SOM) is an unsupervised learning algorithm that attempts
to form a compact representation of a data set via a set of prototype vectors that
exist in the same space. Dynamic subset selection (DSS) is a genetic programming
(GP) based method of selecting the particularly difficult-to-learn patterns from a
data set, where difficulty is a GP-specific measure. In this work, the dynamic subset
selection self-organizing map (DSSSOM) is presented. It is the application of DSS to
the SOM, with modifications to both. It is able to handle very large data sets, and
has a DSS-based built-in stopping mechanism. The performance of the new algorithm
is measured over five data sets via an original implementation, and compared to the
SOM and other learning algorithms. Results show that the DSSSOM achieves a
performance on par with the SOM with training time reduced by a factor of up to
nearly five hundred.
H. Gunes Kayacik
Hierarchical Self Organizing Map Based IDS on KDD Benchmark: May 2003
Abstract: In this work, an architecture consisting entirely of Self-Organizing Feature Maps is developed
for network-based intrusion detection. The principal interest is to analyze just how far such
an approach can be taken in practice. To do so, the KDD benchmark dataset from the
International Knowledge Discovery and Data Mining Tools Competition is employed. In
this work, no content based feature is utilized. Experiments are performed on two-level
and three-level hierarchies, training set biases and the contribution of features to intrusion
detection. Results show that a hierarchical SOM intrusion detection system is as good as
the other learning based approaches that use content based features.
Dong Song
A Linear Genetic Programming Approach to Intrusion Detection: March 2003
Abstract: Page-based Linear Genetic Programming (GP) is implemented with a new
two-layer Subset Selection scheme to address the two-class intrusion detection
classification problem on the KDD-99 benchmark dataset. By careful adjustment
of the relationship between subset layers, overfitting by individuals to specific
subsets is avoided. Unlike current approaches to this benchmark, the learning
algorithm is also responsible for deriving useful temporal features. Following
evolution, decoding of a GP individual demonstrates that the solution is unique
and comparable to hand-coded solutions found by experts. Standard, dynamic
and lexicographic fitness are implemented and compared.
In summary, this work represents a significant advance over previous
applications of GP in which evaluation is limited to relatively concise
benchmark datasets (typically less than a thousand patterns). This work details a
hierarchical extension to the Subset Selection scheme resulting in the successful
application of GP to a training dataset consisting of over 494,021 patterns.
Moreover, the computational requirements are less than decision tree solutions
applied to the same dataset.
Suihong Liang
Intelligent Packets for Dynamic Network Routing: April 2002
Abstract: A distributed GA (Genetic Algorithm) is designed for the packet-switched network routing
problem under minimal information, i.e., without information exchange; every node only knows the
existence of its neighboring nodes. The requirements of such a problem mean that packets must
possess more intelligence than has been the norm. To this end a distributed GA approach is
developed and benchmarked against the AntNet algorithm under similar information constraints. A
profile of AntNet under local and global information is developed, with the proposed distributed
GA clearly improving on the AntNet algorithm under local information constraints.