Data Mining using R
by Luis Torgo
Location: New York University
Editions: 2014, 2015, 2016, 2017, 2018, 2019
Course Web Page: contact Luis Torgo
Course Description
The goal of this course is to provide hands-on experience with key data mining technologies using one particular tool: the R environment.
R is an established technology that has gained widespread acceptance in both academia and industry. Several surveys indicate that R is one of the tools of choice for the great majority of data scientists. Many factors contribute to this acceptance, including its price (free), its open-source nature (trustworthy software that can be easily inspected and checked for flaws), the breadth of available methods (exponential growth of the set of methods available for different application areas), and the support available from the community (an extremely large community of knowledgeable experts providing top-notch support for free).
In this course we will illustrate the use of R for several key data mining processes. This illustration will be driven by concrete case studies that we will “solve” using R. This course can be regarded as a hands-on complement to the Data Mining for Business Analytics course.
Learning Outcomes
- Understand your data. Exploratory analysis of data frequently provides key insights into data properties and problems that can have a big impact on subsequent mining steps, and may help in mapping business problems into data mining tasks. We will provide practical illustrations of methods for summarizing, visualizing and preparing your data for model construction.
- Master frequently used modeling techniques. Data can be modeled in many different ways. The outcomes of these models can provide useful information for decision makers. We will address several concrete modeling tasks with frequently used techniques. We will learn how to obtain and apply these models in R.
- Correctly assess the performance of models. Performance assessment is a key step in taking advantage of the results of data mining models. Being able to carry out this task reliably is essential to making sure that future deployment of data mining pays off.
- Easily report and deliver results of data mining. The outcome of data mining often needs to be reported, communicated and made available to different kinds of people (e.g. decision makers, or key personnel in other departments who lack knowledge of data mining). We will learn how specific tools available in R can boost our productivity in this type of task.
- Address several important data mining tasks. We will illustrate the use of R in several key application domains such as fraud detection, time series forecasting and text mining. Several specific techniques for addressing these tasks will be explained and used in solving concrete case studies.
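To give a flavor of the first three outcomes (understanding the data, obtaining a model, and assessing it on unseen cases), here is a minimal sketch in base R. It is an illustration only and is not taken from the course materials: the `mtcars` data set, the linear model, the 70/30 split and the mean absolute error metric are all assumptions chosen for brevity, not the case studies or techniques the course necessarily uses.

```r
# Illustrative only: mtcars ships with base R; it is not one of the course case studies.
data(mtcars)

# 1. Understand the data: structure of the table and a numeric summary of the target.
str(mtcars)
summary(mtcars$mpg)

# 2. Hold out part of the data so performance is estimated on unseen cases.
set.seed(42)                                   # make the random split reproducible
n         <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))  # assumed 70/30 split
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# 3. Obtain a model: linear regression of fuel consumption on weight and horsepower.
model <- lm(mpg ~ wt + hp, data = train)

# 4. Assess the model on the held-out cases using mean absolute error (MAE).
preds <- predict(model, newdata = test)
mae   <- mean(abs(test$mpg - preds))
cat("Hold-out MAE (mpg):", round(mae, 2), "\n")
```

A single hold-out split like this gives a fairly noisy estimate on a data set this small; repeating the split or using cross-validation would usually be a more reliable choice.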