NIMS Lab Evolutionary Computation Datasets

The following datasets are made available by graduates of the NIMS Lab for research purposes only. They were also most definitely developed in a *research* context and are provided AS IS!

Streaming data

Individual Household Electric Power Consumption -- derived from the 9-attribute, 1-minute measurement dataset available from the UCI repository.
- Pre-processing is performed to construct 30-minute and 15-minute summaries expressing the amount of movement in the 'a3' attribute in particular. The goal of the streaming algorithm is to 'predict' the movement at the next step as either a binary label (up, down) or a ternary label (up, equal, down).
- Dataset 1 -- Average of each of the 9 original attributes as estimated over non-overlapping windows of 30 minutes (base case).
- Dataset 2 -- open-high-low-close format characterizing movement over consecutive non-overlapping windows of 30 minutes (up / down label).
- Dataset 3 -- as per Dataset 2, but with an up / down / equal label.
- Dataset 4 -- as per Dataset 2, but for a 15 minute window.
- Dataset 5 -- as per Dataset 3, but for a 15 minute window.
  - Details of the datasets can be found in the accompanying IEEE WCCI 2016 paper and a zip archive.
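
The windowing described above can be illustrated with a minimal Python sketch. It is not the pre-processing code used to build the datasets; the function names (`window_means`, `window_ohlc`, `movement_labels`) are hypothetical, and two details are assumptions made only for illustration: the label for a window compares the close of the *next* window with the close of the current one, and in the binary case a tie is counted as "down". The accompanying paper should be consulted for the actual definitions.

```python
from typing import List, Sequence, Tuple

# (open, high, low, close) summary of one window
OHLC = Tuple[float, float, float, float]


def window_means(values: Sequence[float], window: int) -> List[float]:
    """Mean over consecutive non-overlapping windows (cf. Dataset 1).

    A trailing partial window is dropped (an assumption of this sketch).
    """
    n_full = len(values) // window
    return [
        sum(values[i * window:(i + 1) * window]) / window
        for i in range(n_full)
    ]


def window_ohlc(values: Sequence[float], window: int) -> List[OHLC]:
    """Open-high-low-close summary over non-overlapping windows (cf. Datasets 2-5)."""
    n_full = len(values) // window
    bars = []
    for i in range(n_full):
        chunk = values[i * window:(i + 1) * window]
        bars.append((chunk[0], max(chunk), min(chunk), chunk[-1]))
    return bars


def movement_labels(bars: Sequence[OHLC], ternary: bool = False) -> List[str]:
    """Label each window with the movement observed at the next window.

    Assumption: movement is the change in close between consecutive windows.
    Assumption: in the binary case, no change is labelled "down".
    The last window has no successor and therefore receives no label.
    """
    labels = []
    for current, following in zip(bars, bars[1:]):
        delta = following[3] - current[3]
        if delta > 0:
            labels.append("up")
        elif delta < 0 or not ternary:
            labels.append("down")
        else:
            labels.append("equal")
    return labels


# Toy 1-minute readings for a single attribute; real windows would be 30 or 15 samples.
readings = [1.0, 2.0, 3.0, 2.0, 2.0, 4.0, 5.0, 4.0, 4.0, 1.0]
bars = window_ohlc(readings, window=3)
print(window_means(readings, window=3))
print(bars)
print(movement_labels(bars))                # binary labels
print(movement_labels(bars, ternary=True))  # ternary labels
```

With 1-minute measurements, `window=30` and `window=15` would correspond to the 30-minute and 15-minute summaries respectively.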
The non-stationary streaming classification task comprises a set of artificial datasets constructed to explicitly embody drift and shift properties.
- Recent publications using this dataset include:
  - Khanchi, Heywood, Zincir-Heywood (2016) On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets. EuroGP. LNCS 9594.
  - Vahdat, Morgan, McIntyre, Heywood, Zincir-Heywood (2015) Evolving GP classifiers for streaming data tasks with concept change and label budgets: A benchmarking study. Chapter 18, Handbook of Genetic Programming Applications. Springer.
  - Vahdat, Morgan, McIntyre, Heywood, Zincir-Heywood (2015) Tapped delay lines for GP streaming data classification with label budgets. EuroGP. LNCS 9025.
  - Vahdat, Atwater, McIntyre, Heywood (2014) On the application of GP to streaming data classification tasks with label budgets. ACM GECCO (Workshop).