Colin Bellinger

Over the course of my career, I have had the opportunity to work on multiple interdisciplinary projects. This page describes the work and research carried out in each of them.

Current projects

Understanding Imbalanced Deep Learning

AI Algorithms for Computer Aided Design

Observation Cost Sensitive Reinforcement Learning

DoMiNo

Validation and Refinement of the Work Assessment Triage Tool For Selecting Rehabilitation Interventions for Injured Workers

Fault Prediction in Time and Load Varying Motors

Previous projects

Anemia Classification in Paediatric Populations

Gamma-ray Spectral Classification for Radiation Monitoring (GRSC)

Comprehensive Test-Ban-Treaty (CTBT)

Understanding Imbalanced Deep Learning

Deep Learning, Class Imbalance, Limited Data, Resampling and Data Augmentation, XAI

C. Bellinger, National Research Council of Canada
N. Japkowicz, American University
R. Corizzo, American University
P. Branco, University of Ottawa
B. Krawczyk, Virginia Commonwealth University
N. Chawla, University of Notre Dame
D. Dablain, University of Notre Dame
K. Ghosh, University of Alberta

Overview

This project aims to understand how deep learning models are impacted by imbalanced training data and to develop new techniques for imbalanced deep learning.

Objective

Develop algorithms to ameliorate the problem of training deep learning models from imbalanced data. This includes, but is not limited to, methods to improve model generalization, calibration and explainability.

Research Topics

Deep model generalization and calibration from imbalanced training data.
Representation learning from imbalanced training data.
XAI for imbalanced data.
Resampling and data augmentation for imbalanced deep learning.
Decoupled training for imbalanced deep learning.
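
As a point of reference for the resampling topic above, the simplest baseline is random oversampling: minority-class examples are duplicated until the classes are balanced. The sketch below is illustrative only and is not one of the methods developed in this project; the function name `random_oversample` and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement to fill the gap to the majority size.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (made up): 6 majority samples (label 0), 2 minority samples (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.0], [2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because the added samples are exact copies, this baseline adds no new information about the minority class, which is one motivation for studying synthetic oversampling and data augmentation instead.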

AI Algorithms for Computer Aided Design

Deep Learning, Reinforcement Learning, Scientific Design and Discovery

C. Bellinger, National Research Council of Canada
I. Tamblyn, University of Ottawa
C. Beeler, University of Ottawa
M. Crowley, University of Waterloo
N. Chatti, University of Waterloo
S.G. Subramanian, University of Waterloo

Overview

This project aims to explore where and how reinforcement learning can be used to aid in the discovery of new materials. To this end, the project includes the development of a simulated chemistry environment (the chemgym) that enables individual reinforcement learning agents, and collectives of agents, to learn to carry out chemistry experiments to achieve prespecified goals.

Objective

Research and develop reinforcement learning algorithms capable of efficiently learning to carry out chemistry experiments for automated materials design and discovery.

Research Topics

Development of the simulated chemistry laboratory.
Sample efficient and cost-sensitive reinforcement learning.
Hierarchical reinforcement learning for simulated chemistry.
Multi-agent reinforcement learning for simulated materials design.

Observation Cost Sensitive Reinforcement Learning

Deep Learning, Reinforcement Learning, Observation Costs

C. Bellinger, National Research Council of Canada
I. Tamblyn, University of Ottawa
A. Drozdyuk, Carleton University
R. Coles, University of Victoria
C. Beeler, University of Ottawa
M. Crowley, University of Waterloo

Overview

Humans efficiently exploit state observational information as needed during decision making. More importantly, we are generally able to select the information we need when we need it. This enables us to reduce information costs and avoid information overload. Standard reinforcement learning algorithms, on the other hand, require potentially expensive state observational information at each time step. Moreover, they have no autonomy over the information that they are provided. This project aims to explore how reinforcement learning agents can take control of information acquisition in addition to standard action selection in order to reduce state observation costs while learning and exploiting a policy.
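
One way to picture this setting is an environment in which every action is paired with a decision about whether to pay for an observation. The sketch below is a minimal illustration under stated assumptions, not the project's implementation: the toy `CorridorEnv`, the `ObservationCostWrapper` class, its `observe` flag and the cost value are all hypothetical.

```python
class CorridorEnv:
    """Toy 1-D corridor: start at cell 0, reach cell `length - 1`."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, move):
        # move is -1 (left) or +1 (right); the position is clipped to the corridor.
        self.pos = max(0, min(self.length - 1, self.pos + move))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


class ObservationCostWrapper:
    """Pairs each action with an `observe` flag (hypothetical interface).

    When observe is True the agent sees the true state and pays
    `obs_cost`; otherwise it receives None and pays nothing."""

    def __init__(self, env, obs_cost=0.1):
        self.env = env
        self.obs_cost = obs_cost

    def reset(self):
        return self.env.reset()

    def step(self, move, observe):
        state, reward, done = self.env.step(move)
        if observe:
            return state, reward - self.obs_cost, done
        return None, reward, done


env = ObservationCostWrapper(CorridorEnv(length=5), obs_cost=0.1)
env.reset()
total = 0.0
# Fixed policy: always move right, but only pay to observe on every other step.
for t in range(4):
    obs, reward, done = env.step(+1, observe=(t % 2 == 1))
    total += reward
print(round(total, 2), done)
```

In this toy corridor the dynamics are deterministic, so the agent loses nothing by skipping observations. The research question is how an agent can learn when observations are worth their cost in less predictable environments.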

Objective

Research and develop reinforcement learning algorithms capable of learning optimal policies whilst reducing state observation costs.

Research Topics

Reinforcement learning with active observation policies.
Reinforcement learning with explicit observation costs.
Mixed observable reinforcement learning.

Data MIning and Neonatal Outcomes (DoMiNo)

Cancer, Adverse Birth Outcomes, Air Pollution, Chemical Exposure, Data Mining, Association Rule Mining, Hypothesis Generation, Geo-Spatial Data

C. Bellinger, Computing Science, University of Alberta
O. Zaiane, Computing Science, University of Alberta
A.R. Osornio-Vargas, Department of Paediatrics, University of Alberta
S.M. Mohomed Jabbar, Computing Science, University of Alberta
C. Nielsen, Department of Paediatrics, University of Alberta
J.S. Lomelin, Department of Paediatrics, University of Alberta
O. Wine, Department of Paediatrics, University of Alberta

Overview

This project is designed to explore the potential relations between chemical/environmental influences and maternal/infant health: a major knowledge gap to date. Using a unique and novel method based on spatial data mining, we examine the relationships among co-location of sources of industrial toxic emissions, socio-economic index, and adverse birth outcomes (ABO) in Canada.

Objective

The objective of this research is to mine geo-spatial data in order to develop new hypotheses about the chemical and socio-economic factors related to adverse birth outcomes. The generated hypotheses are intended to help domain researchers identify novel and important research areas that motivate new studies and grant applications. The primary machine learning focus is to generate hypotheses about multiple chemical interactions, which is challenging for traditional methods in epidemiology. In addition, we are interested in discovering geo-spatial similarities and dissimilarities in terms of chemical combinations and outcomes. Such discoveries serve to motivate deeper domain research into why geo-spatially removed areas that are subject to similar air pollutants have different health outcomes.

Machine learning challenges

Development of association rule mining for geo-spatial data.

Algorithms sensitive to rare but impactful combinations.

Selection of metrics that the domain experts are comfortable with and confident in.
Evaluation of chemical combinations discovered by the algorithms that are not known in the current domain research.
Data cleaning and merger.
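
To make the metric-selection challenge concrete, the sketch below computes the three standard association rule measures (support, confidence and lift) for a single rule. It is an illustration only: the function name `rule_metrics`, the item names and the records are made up for the example and are not project data, and the project's own algorithms and metrics may differ.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_ac / n                               # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0          # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0    # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical records: each set lists what was observed at one location.
records = [
    {"chem_A", "chem_B", "ABO"},
    {"chem_A", "chem_B", "ABO"},
    {"chem_A"},
    {"chem_B"},
    {"chem_A", "chem_B"},
    {"ABO"},
    set(),
    set(),
]
support, confidence, lift = rule_metrics(records, {"chem_A", "chem_B"}, {"ABO"})
print(support, confidence, lift)
```

A rare combination can have low support and still show high lift, which is why support-based pruning alone can hide the rare but impactful combinations mentioned above.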

Research Topics

Develop a novel spatial data mining approach to analyze multiple variables that contribute to adverse birth outcomes.
Identify potential areas where there is non-trivial co-location/spatial association of adverse birth outcomes with multiple combinations of environmental and other known risk factors.
Identify potential patterns of interactions/correlations between chemical emissions, wind patterns, socio-economic and adverse birth outcomes, thereby providing insights for postulating new hypotheses in public health.

Validation and Refinement of the Work Assessment Triage Tool For Selecting Rehabilitation Interventions for Injured Workers

Medical Treatment, Physiotherapy, Diagnoses and Treatment, Decision Trees, Rule Learning, Treatment Recommender Systems

C. Bellinger, Computing Science, University of Alberta
O. Zaiane, Computing Science, University of Alberta
D. Gross, Department of Physical Therapy, University of Alberta

Overview

The Workers' Compensation Board of Alberta (WCB) aims to support workers by helping them return to health and work as quickly as medically possible. As part of this, the physical condition and the nature of the worker's injuries are first assessed by a physiotherapist. Based on this assessment, the appropriate treatment is determined and prescribed.

Objective

This is the second iteration of this project. In the first iteration, a machine learning tool was developed using historical data from the WCB, which included information about the injury, the prescribed treatment and the outcome. Given an assessment of a newly injured worker, the tool was designed to suggest courses of action to the physiotherapist. In the current iteration of this study, we aim to further improve the care and rehabilitation of injured workers by validating and refining a clinical decision support tool used to create personalized treatment decisions for injured workers.

Machine learning challenges

Assessing the representativeness of the training data.

Dealing with incomplete coverage of the categories of injuries and treatments (rare sub-concepts).

Evaluating the system.

Research Topics

Conduct an external validation of the existing algorithm on a separate dataset containing information on WCB claimants.
Refine and update the algorithm with new potential classifiers and outcome variables.

Fault Prediction in Time and Load Varying Motors

Motor Failure and Servicing, Sensor Data, Non-Static Data, Static and Sequential Anomaly Detection, Failure Prediction, Class Imbalance

C. Bellinger, Computing Science, University of Alberta
O. Zaiane, Computing Science, University of Alberta
M. Lipsett, Mechanical Engineering, University of Alberta
M. Riazi, Computing Science, University of Alberta
Domain partners

Overview

In addition to planned maintenance, motors may also require unplanned servicing resulting from faster than expected wear, or internal or external damage. Unplanned servicing can lead to added costs, lost income and customer dissatisfaction. Thus, it is important to quickly identify unexpected breakdowns in order to manage them in a timely fashion.

Objective

This project aims to use machine learning to predict pending failures in time and load varying motors. In particular, we aim to design an approach that can robustly identify failures sufficiently in advance to enable the interested parties to rectify them in time, so as to minimize costs and disruption.

Machine learning challenges

Modelling and predicting failures in time and load varying time series.

In machine learning, failure prediction has typically been studied in steady state systems.

Class imbalance and the challenge of acquiring a representative set of failures to train and test on.
Evaluating the system.

Research Topics

How to predict the future failure of systems based on current and/or past functionality recorded as time series sensor data.
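
As a simple baseline for sequential anomaly detection on sensor data, a reading can be compared against a rolling window of recent readings rather than a fixed threshold, so that the reference level follows slow changes in operating conditions. This is a sketch of a rolling z-score detector, not the approach developed in this project; the function name, window size, threshold and readings are assumptions made for the example.

```python
import statistics
from collections import deque


def rolling_zscore_flags(readings, window=5, threshold=3.0):
    """Flag a reading when it lies more than `threshold` standard
    deviations from the mean of the previous `window` readings."""
    history = deque(maxlen=window)
    flags = []
    for value in readings:
        if len(history) == window:
            mean = statistics.mean(history)
            sd = statistics.stdev(history)
            # A zero spread gives no scale to judge against, so do not flag.
            flags.append(sd > 0 and abs(value - mean) / sd > threshold)
        else:
            # Not enough history yet to estimate a baseline.
            flags.append(False)
        history.append(value)
    return flags


# Made-up sensor trace: a slow upward drift with a single spike.
trace = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 4.0, 1.9]
flags = rolling_zscore_flags(trace)
print(flags)
```

A detector like this reacts to a fault only once it shows up in the signal. Predicting a failure sufficiently in advance, under varying time and load, is the harder problem this project targets.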

Previous projects

Anemia Classification in Paediatric Populations

Medical Treatment, Paediatrics, Anemia, Blood Data, Classification, Multi-Label Classification, Imbalanced Classification

C. Bellinger, Computing Science, University of Alberta
N. Japkowicz, School of Electrical Engineering and Computer Science, University of Ottawa (now American University)
A. Amid, Hospital for Sick Children
H. Viktor, School of Electrical Engineering and Computer Science, University of Ottawa

Overview

Beta-thalassemia minor (β-thal) and iron deficiency anemia (IDA) are the most common causes of anemia with small red blood cells (microcytic anemias) in paediatric populations. They are global health challenges that pose a significant burden on the health care systems of countries where IDA and β-thal are common. Because IDA and β-thal require different treatments, the ability to accurately classify them with an inexpensive test is essential. The standard practice is to utilize a linear equation, such as the Mentzer Index, to distinguish between the categories. However, the classification problem is likely not linear, and the Mentzer Index is unable to classify the co-occurrence of both IDA and β-thal in a patient.
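
For reference, the Mentzer Index is the mean corpuscular volume (MCV) divided by the red blood cell (RBC) count, with a cut-off of 13 commonly used: values below it point towards β-thal and values above it towards IDA. The decision boundary MCV = 13 × RBC is a straight line, which is the sense in which the rule is linear. The sketch below is illustrative only; the function names and the blood counts are made up for the example.

```python
def mentzer_index(mcv_fl, rbc_millions_per_ul):
    """Mentzer Index: mean corpuscular volume (fL) divided by the
    red blood cell count (millions of cells per microlitre)."""
    return mcv_fl / rbc_millions_per_ul


def mentzer_label(mcv_fl, rbc_millions_per_ul, threshold=13.0):
    """Single-label rule: an index below the threshold points towards
    beta-thal, otherwise towards IDA. It can never return both."""
    index = mentzer_index(mcv_fl, rbc_millions_per_ul)
    return "beta-thal" if index < threshold else "IDA"


# Made-up blood counts (MCV, RBC) for illustration only.
patients = [(62.0, 5.8), (70.0, 4.2)]
labels = [mentzer_label(mcv, rbc) for mcv, rbc in patients]
print(labels)
```

Because the rule returns exactly one label per patient, it cannot represent a patient with both conditions, which motivates the multi-label formulation studied here.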

Objective

The objective of this project is to study the ability of non-linear machine learning algorithms to improve the predictive accuracy over the two classes of anemia.

Machine learning challenges

Small, low-dimensional dataset that limits the ability to apply complex learning strategies
Multi-label prediction problem.
Class imbalance.

Research Topics

Multi-label classification for anemia prediction in paediatric populations [Bellinger, Amid, Viktor, Japkowicz and Drummond, 2015]
Imbalance in multi-label classification
A comparison of linear and non-linear algorithms for the anemia dataset

Gamma-ray Spectral Classification for Radiation Monitoring (GRSC)

Safety and Security, Environment and Health, Class Imbalance, One-Class Classification, Sensor Data, Noise Data, Complex Data, Synthetic Oversampling

C. Bellinger, Computing Science, University of Alberta
S. Sharma, School of Electrical Engineering and Computer Science, University of Ottawa (now Fluent Solutions Inc.)
N. Japkowicz, School of Electrical Engineering and Computer Science, University of Ottawa (now American University)
R. Burg, Radiation Protection Bureau, Health Canada
K. Ungar, Radiation Protection Bureau, Health Canada

Overview

Gamma-ray spectra are sampled with high frequency at key locations around Canada by the Radiation Protection Bureau. The Bureau analyzes these to monitor for potential radioactive threats to health, security and the environment. Any alarming spectra are further analyzed and the appropriate action is taken.

Objective

The potential risk posed by some forms and levels of radiation necessitates the ability to quickly and accurately identify radioactive events at a very early stage. This requires a high sampling frequency as well as skilled analysts. In order to reduce the number of spectra that must be reviewed by analysts, this project studies the ability of machine learning algorithms to identify and flag spectra that are most likely to be associated with some form of threat.

Machine learning challenges

Extreme class imbalance.
Data complexity due to high-dimensionality, noise, multi-modality and variance.
Within class imbalance in a one-class classification context.
Evaluating the system while balancing cost and misclassifications.

Research Topics

One-class classifiers that are robust to complex training data:

Multi-class classifiers / two-tiered classifiers [Bellinger, Sharma, Japkowicz, Burg and Ungar, 2012], [Sharma, Bellinger, Japkowicz, 2012] (best paper)

[Sharma, Bellinger and Japkowicz, 2012]

De-noising in a domain appropriate manner [Barnabé-Lortie, Bellinger and Japkowicz, 2014]

Active one-class classification [Barnabé-Lortie, Bellinger and Japkowicz, 2015]
Synthetic oversampling of high-dimensional spectral data [Bellinger, Japkowicz and Drummond, 2015] (best paper)

Comprehensive Test-Ban-Treaty (CTBT)

Safety and Security, Atmospheric Transport Modelling, Data Generation, Classification, One-Class Classification

C. Bellinger, Computing Science, University of Alberta
N. Japkowicz, School of Electrical Engineering and Computer Science, University of Ottawa (now American University)
S. Sharma, School of Electrical Engineering and Computer Science, University of Ottawa (now Fluent Solutions Inc.)
J. Oommen, School of Computer Science, Carleton University
R. Burg, Radiation Protection Bureau, Health Canada
K. Ungar, Radiation Protection Bureau, Health Canada

Overview

The CTBT is a United Nations treaty that aims to prevent the spread of nuclear weapons by banning nuclear test explosions. Nations that are signatories to the treaty agree not to carry out nuclear test explosions, and to support verification of the treaty within their borders. One form of verification is achieved by monitoring radio-xenon isotopes in the atmosphere. These are chemically inert isotopes that may be detected after the testing of a nuclear weapon, even if the test was concealed underground.

Objective

The primary objective of this project is the development of machine learning algorithms that can classify the occurrence of a clandestine nuclear test. In particular, this requires the ability to distinguish between radio-xenon, and isotopes resulting from its decay, emitted from a nuclear test, and those that may be present in the normal background.

Machine learning challenges

The main challenge relates to the complete absence of training instances sampled from the class of nuclear tests. This impacts our choice of classification algorithms because we must train on a single class. Furthermore, this impacts our ability to test these algorithms because we have no true nuclear test instances. Thus, we must find alternative means to assess and build confidence in these algorithms.
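
To illustrate what training on a single class means in practice, the sketch below fits a simple model to background measurements only and flags any new instance that falls too far from them. It is a deliberately simple stand-in (per-feature z-scores), not one of the one-class models evaluated in this project; the class name `GaussianOneClass`, the threshold and the data are assumptions made for the example.

```python
import statistics


class GaussianOneClass:
    """Per-feature z-score model fitted on the background class only."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.means = []
        self.stdevs = []

    def fit(self, background):
        # background: list of feature vectors, all from the normal class.
        columns = list(zip(*background))
        self.means = [statistics.mean(c) for c in columns]
        self.stdevs = [statistics.stdev(c) for c in columns]
        return self

    def score(self, x):
        # Largest absolute z-score across features (assumes non-zero spread).
        return max(abs(v - m) / s
                   for v, m, s in zip(x, self.means, self.stdevs))

    def is_outlier(self, x):
        return self.score(x) > self.threshold


# Made-up background measurements with two features each.
background = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0], [1.1, 10.5], [0.9, 9.5]]
model = GaussianOneClass(threshold=3.0).fit(background)
print(model.is_outlier([1.05, 10.2]), model.is_outlier([3.0, 10.0]))
```

Note that the threshold has to be chosen without any positive examples, which mirrors the evaluation difficulty described above.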

Research Topics

Evaluate existing one-class classification models, and potential new ones, for their ability to accurately detect clandestine nuclear tests while maintaining a low false positive rate. [Bellinger and Oommen, 2011], [Bellinger and Oommen, 2012]
Research atmospheric transport models as a means of synthesizing clandestine nuclear tests in order to generate instances of varying complexity with which to train one-class classification models. [Bellinger and Oommen, 2010]
Perform feature engineering with a specific focus on the potential benefit of including meteorological variables into the feature space upon which the classifier is built. [Bellinger and Japkowicz, 2011]
