Deepan Shankar was an MSc student at the Faculty of Computer Science at Dalhousie University. His research focused on anticipating anomalies in time series data.
Title: Forecasting algae blooms in aquaculture using mussels' openings data
Supervisor: Luis Torgo
MSc in Computer Science
Time series data consists of a sequence of measurements collected over a period of time. This type of data is relevant in many domains, including healthcare, manufacturing, finance, and the environment. In these domains it is frequently of key importance to be able to predict the future values of the series. Activity monitoring is a related task in which time series are used as input signals for events whose occurrence is assumed to depend on the values of these series. These events are typically of critical importance to the end users, and the goal is to anticipate them with sufficient lead time. Because the future is uncertain, forecasting and anticipating such events can help prevent or mitigate hazardous situations. In this thesis we address one such application: the anticipation of algae blooms in the aquaculture industry. More specifically, we propose a method for anticipating algae blooms based on measurements of mussels' valve openings, which domain experts believe can serve as bio-sentinels of the blooms. We use machine learning to obtain models that predict future algal bloom events from the micro-closures of the mussels. We focus on predicting the presence of the alga Alexandrium tamarense in the water. Because these events are rare, sampling procedures were used to balance the distribution of the target variable and thus facilitate the task of the learning algorithms. Overall, our experimental comparisons show that we obtained very good results, particularly in signalling a high percentage of the blooms, although with some false alarms. Our results also show the advantage of adding sampling procedures to overcome the imbalanced distribution of the target variable.
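The abstract does not say which sampling procedure was used. As an illustration only, the following Python sketch shows random oversampling, one common way to balance a rare target class before training. The function name random_oversample and the toy data are hypothetical and are not taken from the thesis.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of the smaller classes at random until every class
    has as many rows as the largest class. Illustrative only."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the existing ones.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: one feature per row, label 1 marks a rare event.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = [0, 0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
```

In this sketch the single rare-event row is repeated until both classes have five rows. Resampling of this kind is normally applied to the training data only, so that evaluation still reflects the original event frequency.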