Machine Learning in R

Machine learning is a method of data analysis that automates analytical model building. You will learn how to build machine learning models that learn from data, identify patterns and make predictions.

Introduction to classification models

We will introduce the basic concepts of a machine learning model by looking at a decision tree. Decision trees are a popular machine learning algorithm because they are powerful yet easy to understand. This module uses a decision tree to classify a binary variable, that is, one with exactly two possible values, such as yes/no.

One-hot encoding for categorical data

One-hot encoding is an important step when preparing categorical data for a classification model. In this section, you will learn what one-hot encoding is, when it is appropriate to use, and how to implement it in R.
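As a minimal sketch of the idea, assuming a small made-up data frame with one categorical column called colour, base R's model.matrix() can turn each category into its own 0/1 indicator column:

```r
# Toy data (made up for illustration): one categorical column stored as a factor
df <- data.frame(colour = factor(c("red", "green", "blue", "green")))

# model.matrix() builds indicator columns from the factor.
# The "- 1" removes the intercept, so every level gets its own 0/1 column.
one_hot <- model.matrix(~ colour - 1, data = df)

# Columns are colourblue, colourgreen, colourred (levels sort alphabetically);
# each row has a single 1 marking that row's category
print(one_hot)
```

If the "- 1" is left out, the model keeps an intercept and R drops the first level (blue here) as a reference category, which is the usual dummy coding for regression models.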

Setting the seed in R

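Setting the seed is an important part of running machine learning models in a way that makes them reproducible. Fixing the seed means that random processes, such as splitting data into train and test sets, produce exactly the same result every time the code is run.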
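A quick way to see this, as a sketch using an arbitrary seed value of 42: resetting the same seed before a random draw reproduces that draw exactly.

```r
# Draw three random numbers between 1 and 10 after fixing the seed
set.seed(42)
first_draw <- sample(1:10, 3)

# Reset the same seed and repeat the draw
set.seed(42)
second_draw <- sample(1:10, 3)

# The draws match because the random number generator restarted
# from the same state
print(identical(first_draw, second_draw))  # TRUE
```

The particular seed value does not matter; what matters is reusing the same value. The exact numbers drawn can still differ between R versions, because R 3.6.0 changed the default algorithm used by sample().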

Splitting data into train and test datasets

Splitting data into a training dataset and a test dataset means any model you build can be evaluated on data it was not trained on. This shows you how well your model is likely to perform on new, unseen data.
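One common way to do this in base R is to sample row numbers at random. The sketch below assumes the built-in iris dataset and an 80/20 split; both are illustrative choices rather than fixed rules.

```r
set.seed(123)  # make the random split reproducible

df <- iris     # built-in dataset with 150 rows, used only for illustration
n <- nrow(df)

# Randomly choose 80% of the row numbers for training
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

train <- df[train_idx, ]   # rows used to fit the model
test  <- df[-train_idx, ]  # remaining rows, held back for evaluation

print(c(train = nrow(train), test = nrow(test)))  # 120 and 30
```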

Training a decision tree in R

Once the data is prepared and split, we are ready to train a decision tree on it. This module will go through how to fit the model and how to interpret its output. Finally, we will use the decision tree to make predictions for the observations in the test dataset.
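A minimal sketch of these steps, assuming the rpart package (a recommended package bundled with standard R installations) and a binary target made up for illustration from the built-in iris data: whether each flower is of species virginica. The course itself may use a different package or dataset.

```r
library(rpart)  # recursive partitioning (decision) trees

set.seed(123)  # reproducible train/test split

# Hypothetical binary target: is the flower of species virginica?
df <- iris
df$is_virginica <- factor(ifelse(df$Species == "virginica", "yes", "no"))
df$Species <- NULL  # drop the original label so it cannot leak into the model

# 80/20 train/test split
train_idx <- sample(seq_len(nrow(df)), size = floor(0.8 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# method = "class" fits a classification tree using all four measurement columns
fit <- rpart(is_virginica ~ ., data = train, method = "class")
print(fit)  # each node line shows its split rule, size, misclassified count and predicted class

# type = "class" returns a predicted label for every test row
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and accuracy on data the tree has not seen
print(table(predicted = pred, actual = test$is_virginica))
accuracy <- mean(pred == test$is_virginica)
print(accuracy)
```

Passing type = "prob" to predict() instead returns the predicted probability of each class, which is useful when you want to choose your own decision threshold.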

Introduction to AutoML

Automated machine learning (AutoML) automates the process of building machine learning models. It covers the complete pipeline, from the raw dataset to a deployable machine learning model. This module will show you how to build automated machine learning models using the popular h2o package.

Coming soon

Sarah Blake

Instructor

Sarah is a data scientist with experience using R and R Shiny to build interactive dashboards in the public sector, providing evidence that informs policy decisions. She has led a project to deliver a dashboard that displays and analyses international trade data and various macroeconomic indicators drawn from a range of data sources.