Paula Branco

Assistant Professor, U. Ottawa

University of Ottawa

Paula Branco is currently an Assistant Professor at the University of Ottawa.

More information is available on Paula’s home page.

MSc Thesis

Title: Re-sampling Approaches for Regression Tasks under Imbalanced Domains

Supervisor: Luis Torgo; Co-supervisor: Rita P. Ribeiro, University of Porto, Portugal

MSc in Computer Science, FCUP-U.Porto

Finished: Sep/2014

Abstract

Many real-world domains, such as meteorology and finance, require predictive models that are particularly accurate in a specific sub-range of the domain of the target variable. Frequently, the values in this sub-range are poorly represented in the available data set. In this case, we face a challenge usually known as the problem of imbalanced domains.

The scarcity of examples that match the user-specific preferences creates important problems at different levels. One of these levels is related to the unsuitability of the existing performance assessment metrics. Another is the need for approaches that are able to force the algorithms to focus on these rare situations. Both aspects are studied in this thesis.

Considering adequate metrics for this problem type is essential. We start by reviewing the existing performance assessment metrics for imbalanced domains and propose a new formulation specifically for regression tasks, which we then use in the experimental evaluation of different methods for handling these problems.

We then address the problem of regression tasks under imbalanced data distributions using re-sampling methods. An extensive survey of the existing approaches in both classification and regression is presented. Among all the existing types of techniques, re-sampling methods are the most studied for classification tasks. These methods are extremely versatile: they simply manipulate the given training set to change the distribution of the examples, and therefore allow the use of any standard learning system. Still, no effort has been made in this field for regression tasks. In this thesis, we propose three new re-sampling methods to address the problem of imbalanced data distributions for regression tasks.
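To illustrate the general idea of re-sampling a training set for regression, here is a minimal sketch of random under-sampling. It is not one of the three methods proposed in the thesis, which the abstract does not describe. The function name `undersample_common`, the `is_rare` predicate, the `keep_fraction` parameter, and the toy data are all assumptions made for this example.

```python
import random


def undersample_common(examples, is_rare, keep_fraction, seed=0):
    """Keep every rare example and a random fraction of the common ones.

    examples: list of (features, target) pairs.
    is_rare: hypothetical user-supplied predicate on the target value,
             standing in for the user's notion of an important sub-range.
    keep_fraction: share of common examples to retain, in [0, 1].
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)
    rare = [ex for ex in examples if is_rare(ex[1])]
    common = [ex for ex in examples if not is_rare(ex[1])]
    # Sample without replacement from the well-represented examples.
    kept = rng.sample(common, round(keep_fraction * len(common)))
    resampled = rare + kept
    rng.shuffle(resampled)
    return resampled


# Toy data: 100 common examples with targets in 0..9,
# plus 5 rare examples with extreme targets (>= 50).
data = [((i,), float(i % 10)) for i in range(100)]
data += [((100 + i,), 50.0 + i) for i in range(5)]

resampled = undersample_common(
    data, is_rare=lambda y: y >= 50.0, keep_fraction=0.2
)
```

Because only the training set changes, the resampled list can be passed to any standard regression learner without modifying the learner itself.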

We have carried out an extensive experimental evaluation of the proposed methods on 18 data sets using a large set of learning systems. The results provide clear evidence of the advantages of using the proposed re-sampling approaches for this type of problem.

Interests

  • Artificial Intelligence
  • Machine Learning
  • Imbalanced Domains
  • Outlier Detection
  • Anomaly Detection
  • Cybersecurity

Education

  • PhD in Computer Science, 2018

    University of Porto