Unsupervised learning is a branch of machine learning in which a model tries to discover hidden patterns in a dataset on its own. The dataset is not labeled, so we do not supervise the model; in supervised learning, by contrast, every example comes with a label. The model draws inferences from the data and groups similar data points together. Common families of unsupervised methods include clustering, anomaly detection, and certain neural networks such as autoencoders. Compared with supervised methods, these algorithms are often computationally expensive, can take longer to run, and tend to be less accurate because there are no labels to check the results against.
For example, suppose we have a baby at home and we show him pictures of some dogs and cats. After seeing these pictures, the baby starts to recognize which ones are similar and can tell dogs from cats by looking at features such as eyes, ears, legs, and tail. In this case, we have not taught the baby anything about dogs and cats; we simply showed him pictures. This is called unsupervised learning.
Consider an example dataset of account transactions for different customers of a bank. The dataset below has no label column, so an unsupervised learning algorithm has to find patterns in the data by itself. The features here are customer name, account number, mode of payment, account balance, and zip code.
Customer Name | Account number | Mode of payment | Balance in $ | Zip code |
Alex | 168389 | UPI | 8000 | 34567 |
Clay | 236858 | Net Banking | 6467 | 32456 |
Jessica | 127859 | UPI | 3654 | 34528 |
Baker | 469289 | ATM | 100 | 43458 |
Clay | 236858 | Net Banking | 1535 | 32456 |
Jessica | 127859 | ATM | 234 | 34528 |
Dataset showing transactions of customers in the bank
We could turn the above problem into a supervised one by manually adding one more column that marks each transaction as fraudulent or not.
Unsupervised Learning process:
- Input the data: Input the raw data in this step.
- Interpretation: Examine the data, keeping in mind that it is not labeled and the expected output is unknown.
- Algorithm: Any unsupervised learning algorithm is used.
- Processing: Input data is fed to the algorithm and it will process the data.
- Output: The output is in the form of groups/clusters, and the data points within each group/cluster have similar features.
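The steps above can be sketched with a small K-means implementation in plain Python, applied to the balance column of the example table. This is only an illustrative sketch: the function name `kmeans`, the choice of two clusters, and the use of the smallest and largest balance as starting centroids are assumptions made for the example, not part of any particular library.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Minimal K-means sketch: repeat assignment and update until labels settle."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Input: the unlabeled balances from the example table, as 1-D points.
balances = [(8000.0,), (6467.0,), (3654.0,), (100.0,), (1535.0,), (234.0,)]

# Assumption for this sketch: two clusters, seeded with the min and max balance.
labels, centroids = kmeans(balances, [min(balances), max(balances)])

print(labels)     # [1, 1, 0, 0, 0, 0]
print(centroids)  # [(1380.75,), (7233.5,)]
```

Here the output groups the two highest balances into one cluster and the remaining four into another. In practice the number of clusters and the starting centroids strongly influence the result, so they are usually chosen with more care than in this sketch.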
Different types of Unsupervised Learning:
Just as supervised learning is divided into classification and regression problems, unsupervised learning is commonly divided into clustering problems and association problems. Clustering is one of the most widely used techniques in unsupervised learning. There are different types of clustering, such as partitioning (exclusive), agglomerative (hierarchical), overlapping, and probabilistic clustering. Clustering algorithms are used to find patterns, especially when the data is uncategorized. Examples of clustering algorithms are K-means clustering and hierarchical clustering. Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are often listed alongside them, but they are dimensionality-reduction techniques rather than clustering algorithms. K-Nearest Neighbors (KNN) is normally a supervised method and should not be confused with K-means.
Association rule learning is a rule-based approach in unsupervised learning. For example, a person who purchases item A is likely to also purchase item B. The goal is to discover such relationships between the features/variables in a given database.
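To make the "item A, item B" idea concrete, the sketch below computes two standard association-rule measures, support and confidence, in plain Python. The baskets are hypothetical data invented for illustration, and the helper names `count`, `support`, and `confidence` are assumptions of this example rather than a library API.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    base = count(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / base

# Hypothetical shopping baskets (made up for this example).
baskets = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"C"},
    {"A", "B", "D"},
]

print(support(baskets, {"A", "B"}))       # 0.6  (3 of 5 baskets contain A and B)
print(confidence(baskets, {"A"}, {"B"}))  # 0.75 (3 of the 4 baskets with A also have B)
```

A rule such as "A implies B" is usually kept only when both its support and its confidence exceed thresholds chosen by the analyst.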
Advantages of Unsupervised learning compared with Supervised learning:
- Previously unknown patterns in the data can be found by using unsupervised learning.
- Collecting unlabeled data is easier, since supervised learning requires labeled data and labeling takes manual effort.
- Less preparation is needed because there is no labeling step, so the analysis can be run on data as it arrives, in real time.
Disadvantages of Unsupervised learning compared with Supervised learning:
- Computationally complex and takes more time to execute.
- Less accurate compared to Supervised learning.
- More time is required to analyze the data patterns, and the results cannot be verified against known labels.
- The patterns we get can be poor approximations compared with the results of supervised learning.
Real-world applications:
- Image segmentation
- Social Network analysis
- Genetics
- Automatic tagging of photos on Facebook
- Stock Market analysis
It is difficult to say which type of learning should be used when. Both supervised and unsupervised learning have their pros and cons. When it comes to real-world applications, we cannot simply treat these algorithms as interchangeable alternatives; the choice depends on the problem being solved. Both play a crucial role in data-driven decision making, which in turn leads to better business results.