Supervised Learning in Machine Learning

Supervised Learning in Machine Learning

Supervised Learning is the method of machine learning where a machine learning model learns a function from input-output pairs. These examples are labeled that means in a dataset each input is already tagged to some output. Our ML model will learn these examples in the training phase and predict the output if a new input is given.

Consider the below example dataset which says whether to decide to play/not given atmosphere properties. Each row is called an Example. This dataset is labeled and has input-output pairs where inputs are Outlook, Temperature, Humidity, and Wind speed, the output is Play/No Play.

Outlook Temperature Humidity Wind Speed Play/No Play
Sunny 85 75 80 0
Overcast 80 61 60 1
Sunny 90 65 70 0
Rainy 74 83 100 1
Rainy 72 58 110 0

Dataset showing example which is labeled

The output variable is Play/No Play:   0 – No Play, 1- Play

Supervised Learning process:

supervisedLearning-1

Prepare the data: The first step in the supervised learning process is to prepare the data for the model. It includes removing unwanted records or filling missing data. After preparing the data we will split our dataset into two parts namely Train dataset and Test dataset.

Select an Algorithm: We have many supervised learning algorithms today and we can select any one of the algorithms in this step. Some popular ones are given below:

  1. Logistic regression
  2. K-Nearest Neighbors (KNN)
  3. Naïve Bayes
  4. Support Vector Machines (SVM)
  5. Linear regression

Train a model: In this step, we will train our model using the selected algorithm with the train data.

Validation method: We will validate our train model in this step using cross-validation methods.

Predictions on test data: Now, our model has trained on train data. In this phase, we start predicting the output on test data.

Evaluate a model: Now, we have our model predicted output on test data and original output on test data. Evaluation of a model tells us how accurate our model is predicting the output. Methods like accuracy, confusion matrix, AUC-ROC curve, Root Mean Square Error (RMSE), etc. are used to evaluate our model based on the problem in hand.

In supervised learning, evaluation is performed by selecting different algorithms for the model, and whichever algorithm is giving a high performance on test data, we will use that algorithm for our final model.

Types of Supervised Learning: There are two types of Supervised Machine learning. They are:

supervivsedLearning-2

 

What is Classification?

In the Classification model, the output of the given dataset is discrete which means the output belongs to a set of numbers/classes. Our model will predict one of the classes for the given input.  If we see our example dataset above, the output is discrete and belongs to either 1 or 0 (can be represented as {1,0}). If we notice that the output is discrete then we apply Classification algorithms on top of it.

The type of classification where output belongs to either 1 or 0 is called Two-Class Classification or Binary Classification. If our output belongs to more than two discrete values/classes (either 0 or 1 or 2), then it is known as Multi-Class Classification.

Some of the classification algorithms are logistic regression, Naïve Bayes, KNN.

What is Regression?

In the Regression model, the output of the dataset will be continuous and belongs to any real number. Our model prediction will be a real number based on the input given. See the example of a dataset where the output is continuous values. The dataset is labeled and let us say we are predicting on output variable Pricing in $ where the inputs are Number of rooms and area of the house. In regression, the output can be any real value like values in the pricing column.

ID Number of rooms Area of house Pricing in $
1 4 420 700
2 2 250 486
3 5 687 950
4 3 560 540

Table showing continuous values in the output

Some of the regression algorithms are Linear regression, Support Vector machines, Lasso regression. Some algorithms can be applicable for both classification and regression like Support Vector Machine and Neural networks.

Finally, the choice of picking a column as the output variable is purely problem-specific. If the selected column for the output variable has discrete values, we will proceed with designing a classification supervised model and if not, we will use the regression algorithm for our ML model.