Predictive Analytics for Spatio-temporal Data

Abstract

Increasingly widespread sensor networks collect sequences of numerical observations at fixed locations over time, generating vast amounts of geo-referenced time series. Decision-makers need to make well-informed, data-driven choices, but the volume of data makes it difficult for human experts to glean actionable information. By forecasting the future behaviour of spatio-temporal data, Predictive Analytics (PA) can guide their course of action in a wide range of domains, from transportation to environmental monitoring.

Standard PA faces issues when applied to spatio-temporal data. Most methods assume observations to be independent and identically distributed (i.i.d.), but the autocorrelation in spatio-temporal data violates this assumption. The implicit spatial and temporal dependencies between observations also cause problems for the standard performance estimation methods used to evaluate these solutions. Our empirical study of over 15 performance estimation methods showed that standard cross-validation (CV) led to over-optimistic estimates in spatio-temporal settings. We recommend that practitioners evaluate their models using out-of-sample (OOS) methods that respect the temporal order of the data, or CV variants that block observations along time.

Predictive methods explicitly designed for spatio-temporal data can leverage data dependencies to their advantage and improve predictions. We proposed a pre-processing strategy, based on a previous proposal, that extracts features incorporating information from past and neighbouring observations. We tested our proposed method on 17 real-world variables and found that it improved prediction for about half of the tested combinations of data set and learning model.

In some applications, the target variable follows an imbalanced distribution where the extreme and rare values represent cases of heightened importance (e.g., a spike in air pollution). These values can be particularly difficult to predict, as most standard PA methods optimize for the average case. Once again, the observations' spatio-temporal context can work to our advantage. We proposed new strategies that improved extreme value prediction by introducing a bias into typically random resampling approaches.
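The evaluation recommendation above can be illustrated with a minimal sketch. It is not the thesis's own procedure: the function names `temporal_holdout` and `temporal_block_folds`, the toy data, and the parameter defaults are all hypothetical, chosen only to show what "respecting temporal order" and "blocking along time" could look like for observations indexed by a time stamp.

```python
import numpy as np

def temporal_holdout(timestamps, test_fraction=0.2):
    """Out-of-sample split: train on the earliest time steps, test on the latest.

    Hypothetical helper; all observations sharing a time stamp
    (e.g. from different locations) end up on the same side of the split.
    """
    timestamps = np.asarray(timestamps)
    unique_times = np.unique(timestamps)  # np.unique returns sorted values
    n_test = max(1, int(round(len(unique_times) * test_fraction)))
    cutoff = unique_times[-n_test]
    train_idx = np.flatnonzero(timestamps < cutoff)
    test_idx = np.flatnonzero(timestamps >= cutoff)
    return train_idx, test_idx

def temporal_block_folds(timestamps, n_folds=5):
    """CV variant that blocks along time: each test fold is a contiguous
    block of time steps instead of a random sample of observations."""
    timestamps = np.asarray(timestamps)
    unique_times = np.unique(timestamps)
    for block in np.array_split(unique_times, n_folds):
        test_mask = np.isin(timestamps, block)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy data (assumed): 3 locations observed at 10 time steps each.
times = np.repeat(np.arange(10), 3)

train_idx, test_idx = temporal_holdout(times, test_fraction=0.2)
folds = list(temporal_block_folds(times, n_folds=5))
```

In the holdout split every training observation precedes every test observation, so the error estimate reflects forecasting the future from the past. The blocked folds keep temporally adjacent observations together, which limits how much near-duplicate information leaks between training and test sets compared with randomly shuffled CV.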

Publication
PhD Thesis. Faculty of Sciences, University of Porto, Porto, Portugal

Supervisors

Mariana Oliveira
Post-doctoral Fellow

Mariana Oliveira is a post-doctoral fellow at Dalhousie University, Faculty of Computer Science. Her research focuses on Machine Learning and Data Mining.