NIMS Lab Evolutionary Computation Datasets
The following datasets are made available by graduates of the NIMS
Lab for research purposes only. They were developed in a *research* context and are provided AS IS!
Streaming data
- Individual Household Electric Power Consumption -- derived from the 9-attribute, 1-minute
measurement dataset available from the UCI repository.
- Pre-processing constructs 30-minute and 15-minute summaries expressing the
amount of movement in the 'a3' attribute in particular. The goal of the streaming algorithm is
to 'predict' the movement at the next step in terms of a binary (up, down) or ternary
(up, equal, down) label.
- Dataset 1 -- Average of each of the 9 original attributes as estimated over
non-overlapping windows of 30 minutes (base case).
- Dataset 2 -- open-high-low-close format characterizing movement over consecutive
non-overlapping windows of 30 minutes (up / down label).
- Dataset 3 -- as per Dataset 2, but with an up / down / equal label.
- Dataset 4 -- as per Dataset 2, but for a 15-minute window.
- Dataset 5 -- as per Dataset 3, but for a 15-minute window.
- Non-stationary streaming classification task -- a set of artificial datasets constructed to
explicitly embody drift and shift properties.
- Recent publications using this dataset include:
- Khanchi, Heywood, Zincir-Heywood (2016) On the Impact of Class Imbalance in GP Streaming
Classification with Label Budgets. EuroGP. LNCS 9594.
- Vahdat, Morgan, McIntyre, Heywood, Zincir-Heywood (2015) Evolving GP classifiers for
streaming data tasks with concept change and label budgets: A benchmarking study.
Chapter 18. Handbook of Genetic Programming Applications. Springer.
- Vahdat, Morgan, McIntyre, Heywood, Zincir-Heywood (2015) Tapped delay lines for GP
streaming data classification with label budgets. EuroGP. LNCS 9025.
- Vahdat, Atwater, McIntyre, Heywood (2014) On the application of GP to streaming data
classification tasks with label budgets. ACM GECCO (Workshop).
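
For the household power consumption datasets described earlier, the windowed summaries can be illustrated with a minimal Python sketch. It is not the pre-processing code used to build the released files. The function names are hypothetical, and several details are assumptions rather than facts from this page: the input is a plain list of 1-minute readings for one attribute (e.g. 'a3'), a trailing partial window is dropped, the label compares the close of the next window with the close of the current one, and in the binary case a tie is mapped to "down".

```python
from statistics import mean


def window_chunks(values, size):
    """Split values into consecutive non-overlapping windows of `size` samples.

    A trailing partial window is dropped (assumption)."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def window_average(values, size):
    """Average per window, in the spirit of Dataset 1 (one attribute shown)."""
    return [mean(w) for w in window_chunks(values, size)]


def ohlc(values, size):
    """(open, high, low, close) per window, in the spirit of Datasets 2 to 5."""
    return [(w[0], max(w), min(w), w[-1]) for w in window_chunks(values, size)]


def movement_labels(bars, ternary=False):
    """Label each window with the movement observed at the next window.

    Assumption: movement is the change in close between consecutive windows.
    Assumption: with binary labels, an unchanged close counts as "down"."""
    labels = []
    for current, following in zip(bars, bars[1:]):
        if following[3] > current[3]:
            labels.append("up")
        elif following[3] < current[3]:
            labels.append("down")
        else:
            labels.append("equal" if ternary else "down")
    return labels


# With 1-minute readings, size=30 gives 30-minute windows and size=15 gives
# 15-minute windows. `readings` stands in for one attribute column.
readings = [float(i % 7) for i in range(120)]
bars_30 = ohlc(readings, 30)
labels_binary = movement_labels(bars_30)
labels_ternary = movement_labels(bars_30, ternary=True)
```

The last window has no successor, so `movement_labels` returns one label fewer than the number of windows.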