Nuno Moniz

Title: Prediction and Ranking of Highly Popular Web Content

Supervisor: Luis Torgo

PhD in Computer Science, Faculty of Sciences, University of Porto

Nuno’s thesis was awarded 2nd place in the Fraunhofer Portugal Challenge 2017.

Abstract

This thesis addresses prediction and ranking tasks using web content data. The main objective is to improve the ability to accurately predict and rank recent and highly popular content, thus enabling faster and more precise recommendation of such items. The main motivation is the profusion of online content and users’ increasing demand for fast and easy access to relevant content.

To fulfill these tasks, an extensive review of previous work is carried out to define the state of the art and to identify important research opportunities. As a result, three problems are identified and addressed in this thesis: (1) the lack of an interpretable and robust evaluation framework to correctly assess web content popularity prediction models with a focus on highly popular web content; (2) the limited ability of existing popularity prediction models to predict the rare cases of highly popular content; and (3) the need for recommendation frameworks for such items using multi-source data. For each of these problems, novel solutions are proposed and extensively evaluated in comparison to existing work.

The first problem (1) concerns the evaluation methods commonly used in web content popularity prediction tasks. According to previous work, the popularity of web content is best described by a heavy-tailed distribution. As such, at any given moment, most of the content under analysis has a low level of popularity, and only a small set of cases has high levels of popularity. Standard evaluation metrics focus on the average behaviour of the data, assuming that each case is equally relevant. Given the predictive focus on highly popular content, it is argued that such an assumption may lead to an over-estimation of the models’ predictive accuracy. Therefore, an evaluation framework is proposed that allows a robust interpretation of the prediction models’ ability to accurately forecast highly popular web content.
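The over-estimation argument can be illustrated with a minimal sketch. It assumes hypothetical Pareto-distributed popularity scores, a naive model that always predicts the median, and an arbitrary cut-off (the top 10% of true scores) for what counts as "highly popular". It is not the evaluation framework proposed in the thesis; it only shows why an average-based metric such as mean absolute error can hide poor accuracy on rare, highly popular items.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Standard mean absolute error: every case weighs the same."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mae_on_popular(y_true, y_pred, threshold):
    """MAE computed only over cases whose true popularity is >= threshold."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t >= threshold]
    return mean(abs(t - p) for t, p in pairs)


def demo(n=10_000, alpha=1.5, seed=0):
    rng = random.Random(seed)
    # Hypothetical heavy-tailed popularity scores (Pareto with shape alpha, minimum 1).
    y_true = [rng.paretovariate(alpha) for _ in range(n)]
    ranked = sorted(y_true)
    median = ranked[n // 2]
    # Naive "model": predict the median for every item, i.e. fit the majority only.
    y_pred = [median] * n
    # Hypothetical definition of "highly popular": the top 10% of true scores.
    threshold = ranked[int(0.9 * n)]
    return mae(y_true, y_pred), mae_on_popular(y_true, y_pred, threshold)


overall, popular = demo()
print(f"MAE on all items:     {overall:.2f}")
print(f"MAE on top-10% items: {popular:.2f}")
```

On such data the overall MAE stays small because most items sit close to the median, while the error restricted to the most popular items is considerably larger.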

The second problem (2) is related to the fact that existing web content popularity prediction models are based on standard learning approaches. These are commonly biased towards capturing the dynamics of the majority of cases. Given the skewness of web content popularity data, this may lead to poor accuracy on the under-represented cases of highly popular items. An evaluation of a diverse set of such proposals is carried out, confirming their difficulties in learning to predict such items. It is also confirmed that standard evaluation metrics often over-estimate the ability to accurately predict the most popular items. Novel approaches are proposed for the prediction of web content popularity, focusing on accuracy towards highly popular items.
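The bias towards the majority can be made concrete with the simplest possible learner: a single constant fitted by minimising absolute error, whose optimum is a (weighted) median. The sketch below uses hypothetical data and a hypothetical relevance weight of 10 for items with popularity of at least 100. Case weighting is a generic cost-sensitive technique used here purely for illustration; it is not claimed to be the approach proposed in the thesis.

```python
def weighted_median(values, weights):
    """Return a constant c minimising sum(w * |y - c|), i.e. a weighted median."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("weights must be non-negative with a positive sum")


# Hypothetical popularity scores: many low values, two rare highly popular items.
y = [1, 1, 2, 2, 3, 3, 4, 5, 150, 300]

# Standard fit: every case counts equally, so the optimum is the ordinary median.
uniform = weighted_median(y, [1] * len(y))

# Hypothetical relevance weights: items with popularity >= 100 count 10 times more.
relevance = [10 if v >= 100 else 1 for v in y]
weighted = weighted_median(y, relevance)

print(uniform, weighted)  # 3 150
```

With uniform weights the fitted constant misses the two popular items by 147 and 297; with the relevance weights it matches one of them exactly, at the cost of larger errors on the many low-popularity items.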

The third and final problem (3) concerns the task of ranking, and also of evaluating, web content by its predicted popularity. Although ranking may be trivial in most cases, it becomes considerably more difficult in scenarios with multiple sources of data. Even in single-source scenarios, however, ranking tasks and their evaluation are not exempt from issues concerning the ability to account for highly popular content. The ability to rank web content based on models’ predictions is discussed, supported by an extensive evaluation in both single-source and multi-source scenarios.
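A minimal sketch of how a single-source ranking built from predictions can be checked against the truly most popular items, using precision at k. Precision at k is one common ranking measure chosen here as an assumption; the abstract does not name the measures used in the thesis. The data and the helper names `top_k` and `precision_at_k` are hypothetical, and combining several sources is not covered.

```python
def top_k(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def precision_at_k(y_true, y_pred, k):
    """Share of the model's top-k items that are also in the true top-k."""
    return len(set(top_k(y_true, k)) & set(top_k(y_pred, k))) / k


# Hypothetical true popularity and model predictions for six news items.
y_true = [5, 300, 2, 150, 1, 3]
y_pred = [40, 60, 10, 5, 2, 55]

print(precision_at_k(y_true, y_pred, k=2))  # 0.5
```

Here the model recovers the single most popular item but ranks item 3, the second most popular, fifth out of six, so only half of the true top two appear in its top two.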

Each of these problems is evaluated using real-world data concerning online news feeds from both official and social media sources. Given their short lifespan, news items make the early and accurate prediction of highly popular content particularly difficult. Experimental evaluations show that the approaches proposed in this thesis for the prediction and ranking of highly popular content obtain encouraging results, demonstrating a significant advantage over state-of-the-art work.

Current Position:

Nuno is currently a Post-Doc at LIAAD/INESC TEC and an Invited Assistant Professor at the Department of Computer Science of the Faculty of Sciences of the University of Porto. More information is available on Nuno’s home page.
