Selected Completed Doctoral Theses

Ali Vahdat

Symbiotic Evolutionary Subspace Clustering: Nov 2012
Abstract: Application domains with large attribute spaces, such as genomics and text analysis, necessitate clustering algorithms more sophisticated than traditional approaches in order to cope with the large dimensionality and cardinality of these data sets. Subspace clustering, a generalization of traditional clustering, identifies the attribute support for each cluster as well as the location and number of clusters. In the most general case, the attributes associated with each cluster could be unique. The proposed algorithm, Symbiotic Evolutionary Subspace Clustering (S-ESC), borrows from 'symbiosis' in the sense that each clustering solution is defined in terms of a host (a single member of the host population) and a number of coevolved cluster centroids (symbionts in an independent symbiont population). Symbionts define clusters, and therefore attribute subspaces, whereas hosts define sets of clusters that constitute a non-degenerate clustering solution. The symbiotic representation of S-ESC is the key to making it scalable to high-dimensional data sets, while an integrated subsampling process makes it scalable to tasks with a large number of data items. A bi-objective evolutionary method is proposed to identify the unique attribute support of each cluster while detecting its data instances. Benchmarking is performed against a well-known test suite of subspace clustering data sets with four well-known comparator algorithms from both the full-dimensional and subspace clustering literature (EM, MINECLUS, PROCLUS and STATPC) as well as a generic genetic algorithm-based subspace clustering approach. Performance of the S-ESC algorithm was found to be robust across a wide cross-section of properties with a common parameterization used throughout; this was not the case for the comparator algorithms, whose performance could be sensitive to a particular data distribution or require parameter sweeps to reach comparable performance. A comparison is also made relative to a non-symbiotic genetic algorithm in which each individual represents the complete set of clusters comprising a subspace clustering solution; benchmarking again indicates that the proposed symbiotic framework is superior. The S-ESC code and data sets are publicly available.
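The host-symbiont representation described above can be illustrated with a short sketch: symbionts are centroids defined only over their own attribute subspace, while a host is simply an index set over the symbiont population. This is a minimal illustration under assumed names and details, not the published S-ESC code.

```python
# Illustrative sketch of the symbiotic representation described above.
# All names and details are assumptions for exposition, not the S-ESC source code.
import random

class Symbiont:
    """A single cluster: a centroid defined only over a subset of attributes."""
    def __init__(self, num_attributes, subspace_size):
        self.attributes = random.sample(range(num_attributes), subspace_size)
        self.centroid = {a: random.random() for a in self.attributes}

    def distance(self, x):
        # Distance is measured only in the cluster's own attribute subspace.
        return sum((x[a] - self.centroid[a]) ** 2 for a in self.attributes) ** 0.5

class Host:
    """A candidate clustering solution: an index set over the symbiont population."""
    def __init__(self, symbiont_indices):
        self.symbiont_indices = list(symbiont_indices)

    def assign(self, x, symbionts):
        # Each data item is assigned to the nearest of the host's coevolved clusters.
        members = [symbionts[i] for i in self.symbiont_indices]
        return min(range(len(members)), key=lambda i: members[i].distance(x))

# Example: three symbionts over a 10-attribute space; one host using symbionts 0 and 2.
symbionts = [Symbiont(num_attributes=10, subspace_size=3) for _ in range(3)]
host = Host([0, 2])
point = [random.random() for _ in range(10)]
print("point assigned to host cluster", host.assign(point, symbionts))
```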

Peter Lichodzijewski

A Symbiotic Bid-Based Framework for Problem Decomposition using Genetic Programming: Feb 2011
Abstract: (para 1 of 3) This thesis investigates the use of symbiosis as an evolutionary metaphor for problem decomposition using Genetic Programming. It begins by drawing a connection between lateral problem decomposition, in which peers with similar capabilities coordinate their actions, and vertical problem decomposition, whereby solution subcomponents are organized into increasingly complex units. The two types of problem decomposition are associated with context learning and layered learning, respectively. The thesis then proposes the Symbiotic Bid-Based framework, modeled after a three-stage process of symbiosis abstracted from biological evolution. As such, it is argued, the approach has the capacity for both types of problem decomposition.
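The bid-based coordination underlying the framework can be sketched as follows: each symbiont pairs a bid function with an action, and a host's output on an exemplar is the action of its highest bidder. The linear bid below stands in for an evolved GP bid program; all details are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of bid-based coordination: the winning bidder speaks for the team.
import math

class Symbiont:
    def __init__(self, weights, action):
        self.weights = weights      # stands in for an evolved bid program
        self.action = action        # e.g. a class label

    def bid(self, x):
        # A linear surrogate for the GP bid program, squashed to (0, 1).
        return 1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(self.weights, x))))

class Host:
    def __init__(self, symbionts):
        self.symbionts = symbionts

    def output(self, x):
        # Lateral decomposition: each symbiont handles the exemplars it outbids the others on.
        return max(self.symbionts, key=lambda s: s.bid(x)).action

host = Host([Symbiont([1.0, -0.5], action=0), Symbiont([-0.8, 1.2], action=1)])
print(host.output([0.3, 0.9]))   # -> 1 in this toy setting
```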

Gunes Kayacik

Can the best defense be a good offense? Evolving (mimicry) attacks for detector vulnerability testing under a 'black-box' assumption: 2009 (co-supervised with Prof. Nur Zincir-Heywood)
Abstract: (para 1 of 3) This thesis proposes a 'black-box' approach for automating attack generation by way of Evolutionary Computation. The proposed 'black-box' approach employs only the anomaly rate or detection feedback from the detector. Assuming 'black-box' access in vulnerability testing presents a scenario different from a 'white-box' assumption, since the attacker does not possess sufficient knowledge to constrain the scope of the attack. As such, this thesis contributes a 'black-box' vulnerability testing tool for identifying detector weaknesses and for aiding detector research in designing detectors that are robust against evasion attacks.
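The feedback loop implied above can be sketched as a simple evolutionary search in which the only fitness signal is the detector's anomaly rate. The detector stub and mutation operator below are placeholders for exposition, not the thesis tooling.

```python
# Schematic 'black-box' loop: candidate attacks are evolved using only
# the detector's anomaly-rate feedback as fitness.
import random

def detector_anomaly_rate(attack):
    # Stand-in for the opaque detector: returns the fraction of the attack
    # flagged as anomalous (lower is better for the attacker).
    return sum(attack) / len(attack)

def mutate(attack):
    child = attack[:]
    i = random.randrange(len(child))
    child[i] = 1 - child[i]        # flip one "suspicious" feature of the attack
    return child

# Simple (1 + 1) evolutionary search for an attack variant that evades detection.
attack = [random.randint(0, 1) for _ in range(16)]
for _ in range(200):
    child = mutate(attack)
    if detector_anomaly_rate(child) <= detector_anomaly_rate(attack):
        attack = child

print("final anomaly rate:", detector_anomaly_rate(attack))
```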

Andrew R. McIntyre

Novelty Detection + Coevolution = Automatic Problem Decomposition: A framework for scalable Genetic Programming Classifiers: Nov 2007
Abstract: A novel approach to the classification of large and unbalanced multi-class data sets is presented where the widely acknowledged issues of scalability, solution transparency, and problem decomposition are addressed simultaneously within the context of the Genetic Programming (GP) paradigm. A cooperative coevolutionary training environment that employs multi-objective evaluation provides the basis for problem decomposition and reduced solution complexity. Scalability is achieved through a Pareto competitive coevolutionary framework, allowing the system to be readily applied to large data sets without recourse to hardware-specific speedups. A key departure from the canonical GP approach to classification involves expressing the output of GP in terms of a non-binary, local membership function (Gaussian), where it is no longer necessary for an expression to represent an entire class. Decomposition is then achieved through reformulating the classification problem as one of cluster consistency, where individuals learn to associate subsets of training exemplars with each cluster. Classification problems are now solved by several specialist classifiers as opposed to a single 'super' individual. Finally, although multi-objective methods have been reported previously for GP classification domains, we explicitly formulate the objectives for cooperative behavior. Without this, the user is left to choose a single individual as the overall solution from a front of solutions. This work is able to utilize the entire front of solutions without recourse to heuristics for selecting one individual over another or duplicating behaviors between different classifiers. Extensive benchmarking is performed against related frameworks for classification including Genetic Programming, Neural Networks, and deterministic methods. In contrast to classifiers evolved using competitive coevolution alone, we demonstrate the ability of the proposed coevolutionary model to provide a non-overlapping decomposition or association between learners and exemplars, while returning statistically significant improvements in classifier performance. In the case of the Neural Network methods, benchmarking is conducted against the more challenging second-order neural learning algorithm of conjugate gradient optimization (previous comparisons limit Neural Network training to first-order methods). The ensuing comparison indicated that most data sets benefit from the proposed algorithm, which remains competitive even against well-established deterministic algorithms such as C4.5.
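The local (Gaussian) membership idea can be illustrated briefly: rather than forcing a GP expression to produce a global binary class decision, its raw output is passed through a Gaussian, so an individual only claims exemplars near its own cluster. The expression, mean and spread below are illustrative placeholders rather than values from the thesis.

```python
# Sketch of mapping raw GP output through a Gaussian local membership function.
import math

def gaussian_membership(gp_output, mu, sigma):
    # Membership is high only when the GP output falls near the cluster centre mu.
    return math.exp(-((gp_output - mu) ** 2) / (2 * sigma ** 2))

def gp_individual(x):
    # Placeholder for an evolved GP expression.
    return 0.7 * x[0] - 0.2 * x[1]

x = [1.0, 0.5]
print(gaussian_membership(gp_individual(x), mu=0.6, sigma=0.25))
```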

Garnett C. Wilson

Probabilistic Adaptive Mapping Developmental Genetic Programming: March 2007
Abstract: Developmental Genetic Programming (DGP) algorithms explicitly enable the search space for a problem to be divided into genotypes and corresponding phenotypes. The two search spaces are typically connected by a genotype-phenotype mapping (GPM) intended to model the biological genetic code, with current implementations evolving the mappings along with the genotype solutions. This work presents the Probabilistic Adaptive Mapping DGP (PAM DGP) algorithm, a new developmental implementation that provides research contributions in the areas of GPMs and coevolution. The algorithmic component of PAM DGP is demonstrated to overcome identified coevolutionary performance problems, empirically benchmarked against the latest competing Adaptive Mapping algorithm with both algorithms using the same (non-redundant) mapping encoding process. Having established that PAM DGP provides a superior algorithmic framework given equivalent mapping and genotype structures for the individuals, a new adaptive redundant mapping is incorporated into PAM DGP for further performance enhancement and closer adherence to developmental modeling of the biological code. PAM DGP with the two mapping types is then compared to the competing Adaptive Mapping algorithm and Traditional GP on three regression benchmarks. PAM DGP using redundant mappings is then applied to two medical classification domains, where it is found to provide better classifier performance than the alternative algorithms. PAM DGP with redundant mappings is also given the task of learning three sequences of increasing recursion order, given a function set consisting of general (not implicitly recursive) machine-language instructions, where it is found to learn second- and third-order recursive Fibonacci functions more efficiently than the related developmental systems and Traditional GP. PAM DGP using redundant encoding is also demonstrated to produce the semantically highest-quality solutions for all three recursive functions considered (Factorial, second- and third-order Fibonacci). Across the regression, medical classification, and recursive problems, PAM DGP is shown to have produced its solutions by evolving redundant mappings that emphasize appropriate members within relevant subsets of the problem's original function set.
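The redundant genotype-phenotype mapping can be sketched as a table from genotype symbols to function-set members that is itself adapted over time. The frequency-based update below is a simple stand-in for illustration, not the probabilistic mechanism used in PAM DGP; all identifiers are assumptions.

```python
# Sketch of a redundant genotype-phenotype mapping that is adapted over time.
import random

FUNCTION_SET = ['add', 'sub', 'mul', 'div']

# Redundant mapping: eight genotype symbols decode onto four functions.
mapping = {g: random.choice(FUNCTION_SET) for g in range(8)}

def decode(genotype):
    """Translate a genotype (a list of symbols) into a phenotype program."""
    return [mapping[g] for g in genotype]

def adapt_mapping(best_genotype):
    # Bias the table toward functions used by the best individual, so the
    # mapping comes to emphasize the relevant subset of the function set.
    used = [mapping[g] for g in best_genotype]
    for g in mapping:
        if random.random() < 0.1:
            mapping[g] = random.choice(used)

genotype = [random.randrange(8) for _ in range(6)]
print(decode(genotype))
adapt_mapping(genotype)
print(decode(genotype))
```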

Samuel Howse

NUMMSQUARED 2006a0 Explained, Including a new Well-Founded Functional Foundation for Logic, Mathematics and Computer Science: October 2006
Abstract: Set theory is the standard foundation for mathematics, but often does not include rules of reduction for function calls; therefore, for computer science, the untyped lambda calculus or type theory is usually preferred. The untyped lambda calculus (and several improvements on it) makes functions fundamental, but suffers from non-terminating reductions and has a partially non-classical logic. Type theory is a good foundation for logic, mathematics and computer science, except that, by making both types and functions fundamental, it is more complex than either set theory or the untyped lambda calculus. This document proposes a new foundational formal language called NummSquared that makes only functions fundamental, while simultaneously ensuring that reduction terminates, having a classical logic, and attempting to follow set theory as much as possible. NummSquared builds on earlier work by John von Neumann in 1925 and Roger Bishop Jones in 1998 that has perhaps not received sufficient attention in computer science. A soundness theorem for NummSquared is proved. Usual set theory, the work of Jones, and NummSquared are all well-founded. NummSquared improves upon the works of von Neumann and Jones by having reduction and proof, by supporting computation and reflection, and by having an interpreter called NsGo (a work in progress) so that the language can be practically used. NummSquared is variable-free. For enhanced reliability, NsGo is an F#/C# .NET assembly that is mostly automatically extracted from a program of the Coq proof assistant. As a possible step toward making formal methods appealing to a wider audience, NummSquared minimizes constraints on the logician, mathematician or programmer: because of coercion, there are no types, and functions are defined and called without proof, yet reduction terminates. NummSquared supports proofs where desired, but does not require them.
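The coercion idea admits a toy analogue: every function totalises its argument by coercing it into the intended domain before use, so every call reduces to a value without a side proof. The Python sketch below is only an informal illustration of that general idea and does not reflect NummSquared's actual syntax or semantics.

```python
# Toy analogue of coercion-based totality; not NummSquared itself.
def coerce_to_nat(x):
    # Anything that is not a non-negative integer is coerced to 0.
    return x if isinstance(x, int) and x >= 0 else 0

def pred(x):
    # Defined for all inputs without a side proof: coercion handles bad calls.
    n = coerce_to_nat(x)
    return n - 1 if n > 0 else 0

print(pred(5), pred(-3), pred("hello"))   # -> 4 0 0
```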