Supervised Learning in Machine Learning Updated 2020

Supervised Learning in Machine Learning

Supervised Learning is the method of machine learning where a machine learning model learns a function from input-output pairs. These examples are labeled that means in a dataset each input is already tagged to some output. Our ML model will learn these examples in the training phase and predict the output if a new input is given.

Consider the below example dataset which says whether to decide to play/not given atmosphere properties. Each row is called an Example. This dataset is labeled and has input-output pairs where inputs are Outlook, Temperature, Humidity, and Wind speed, the output is Play/No Play.

Outlook	Temperature	Humidity	Wind Speed	Play/No Play
Sunny	85	75	80	0
Overcast	80	61	60	1
Sunny	90	65	70	0
Rainy	74	83	100	1
Rainy	72	58	110	0

Dataset showing example which is labeled

The output variable is Play/No Play: 0 – No Play, 1- Play

Supervised Learning process:

Prepare the data: The first step in the supervised learning process is to prepare the data for the model. It includes removing unwanted records or filling missing data. After preparing the data we will split our dataset into two parts namely Train dataset and Test dataset.

Select an Algorithm: We have many supervised learning algorithms today and we can select any one of the algorithms in this step. Some popular ones are given below:

Logistic regression
K-Nearest Neighbors (KNN)
Naïve Bayes
Support Vector Machines (SVM)
Linear regression

Train a model: In this step, we will train our model using the selected algorithm with the train data.

Validation method: We will validate our train model in this step using cross-validation methods.

Predictions on test data: Now, our model has trained on train data. In this phase, we start predicting the output on test data.

Evaluate a model: Now, we have our model predicted output on test data and original output on test data. Evaluation of a model tells us how accurate our model is predicting the output. Methods like accuracy, confusion matrix, AUC-ROC curve, Root Mean Square Error (RMSE), etc. are used to evaluate our model based on the problem in hand.

In supervised learning, evaluation is performed by selecting different algorithms for the model, and whichever algorithm is giving a high performance on test data, we will use that algorithm for our final model.

Types of Supervised Learning: There are two types of Supervised Machine learning. They are:

What is Classification?

In the Classification model, the output of the given dataset is discrete which means the output belongs to a set of numbers/classes. Our model will predict one of the classes for the given input. If we see our example dataset above, the output is discrete and belongs to either 1 or 0 (can be represented as {1,0}). If we notice that the output is discrete then we apply Classification algorithms on top of it.

The type of classification where output belongs to either 1 or 0 is called Two-Class Classification or Binary Classification. If our output belongs to more than two discrete values/classes (either 0 or 1 or 2), then it is known as Multi-Class Classification.

Some of the classification algorithms are logistic regression, Naïve Bayes, KNN.

What is Regression?

In the Regression model, the output of the dataset will be continuous and belongs to any real number. Our model prediction will be a real number based on the input given. See the example of a dataset where the output is continuous values. The dataset is labeled and let us say we are predicting on output variable Pricing in $ where the inputs are Number of rooms and area of the house. In regression, the output can be any real value like values in the pricing column.

ID	Number of rooms	Area of house	Pricing in $
1	4	420	700
2	2	250	486
3	5	687	950
4	3	560	540

Table showing continuous values in the output

Some of the regression algorithms are Linear regression, Support Vector machines, Lasso regression. Some algorithms can be applicable for both classification and regression like Support Vector Machine and Neural networks.

Finally, the choice of picking a column as the output variable is purely problem-specific. If the selected column for the output variable has discrete values, we will proceed with designing a classification supervised model and if not, we will use the regression algorithm for our ML model.