Semi-supervised learning is a machine learning approach that uses a combination of labeled and unlabeled data to train models.
Apart from reinforcement learning, machine learning algorithms have traditionally fallen into two categories: supervised and unsupervised. Supervised learning requires a large amount of labeled data to train a model. Unsupervised learning, on the other hand, requires no labels at all – it learns patterns and trends directly from unlabeled data.
Both methods have drawbacks. Labeled data for supervised learning is hard to come by, so it usually has to be annotated by hand, which is a tedious and time-consuming job. Gathering data for unsupervised learning is easier, but the approach supports only a limited range of applications.
Semi-supervised learning was introduced to address these problems. It is the middle ground between supervised and unsupervised learning: the amount of labeled data is generally much smaller than the amount of unlabeled data, but even a small amount of labeled data can improve the model's performance considerably.
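As a concrete illustration, below is a minimal self-training sketch using scikit-learn's SelfTrainingClassifier, which retrains a base classifier on its own confident pseudo-labels. The synthetic dataset, the 5% labeled fraction, and the confidence threshold are illustrative choices, not part of the original text.

```python
# Minimal self-training sketch: most labels are hidden (marked -1, the
# convention used by sklearn.semi_supervised.SelfTrainingClassifier).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Keep labels for roughly 5% of the points; hide the rest by setting them to -1.
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) > 0.05] = -1

# The base classifier is iteratively retrained on its own confident pseudo-labels.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X, y_partial)
print("accuracy on all points:", accuracy_score(y, model.predict(X)))
```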
Semi-supervised learning is used when the labeled and unlabeled data are related to each other, i.e., they share underlying patterns and trends. To exploit this relationship, one or more of the following assumptions is typically made:
Continuity Assumption
Points that lie close to each other are assumed to be likely to share the same output. This assumption is also used in supervised learning, where it leads to simpler decision boundaries.
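One way to see this assumption in action is graph-based label propagation, where labels spread between nearby points. The sketch below is illustrative: the two-moons dataset, the RBF kernel width, and revealing only a single labeled point per class are assumptions, not part of the original text.

```python
# Sketch of the continuity assumption via graph-based label propagation:
# labels spread along a similarity graph, so nearby points end up agreeing.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# Reveal the label of a single point per class; mark everything else as -1.
y_partial = np.full_like(y, -1)
for cls in np.unique(y):
    y_partial[np.where(y == cls)[0][0]] = cls

prop = LabelPropagation(kernel="rbf", gamma=20)
prop.fit(X, y_partial)
print("fraction of points recovered correctly:", (prop.transduction_ == y).mean())
```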
Cluster Assumption
The data may be assumed to form distinct clusters, with points in the same cluster likely to share the same output. This mirrors unsupervised learning, where data is grouped by patterns such as distance to a cluster centroid.
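A rough sketch of how this assumption can be exploited: cluster all points, then let each cluster inherit the label of the few labeled points inside it. The blobs dataset, the use of k-means, and the majority-vote rule below are illustrative assumptions.

```python
# Sketch of the cluster assumption: cluster everything with k-means, then give
# every point in a cluster the most common label among its labeled members.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=600, centers=3, random_state=0)

# Pretend only the first three points of each class are labeled.
labeled_idx = np.concatenate([np.where(y == c)[0][:3] for c in range(3)])

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

y_pred = np.full_like(y, -1)
for c in range(3):
    members = labeled_idx[clusters[labeled_idx] == c]
    if len(members):  # skip a cluster that happens to contain no labeled point
        y_pred[clusters == c] = np.bincount(y[members]).argmax()

print("agreement with true labels:", (y_pred == y).mean())
```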
Manifold Assumption
Another assumption is that the data lies on a manifold of much lower dimension than the full input space. This helps avoid the problems that arise when dealing with high-dimensional data. In speech recognition, for example, the space of all possible sound waves is far larger than the space of waves that actually occur in speech.
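A hedged illustration of this idea: project high-dimensional data onto a low-dimensional embedding first, then propagate the few available labels in that smaller space. The digits dataset, the choice of Isomap as the manifold learner, and the component and neighbor counts below are illustrative, not prescribed by the text.

```python
# Sketch of the manifold assumption: embed 64-dimensional digit images into a
# 10-dimensional space with Isomap, then spread the few labels in that space.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.semi_supervised import LabelSpreading

X, y = load_digits(return_X_y=True)

# Work in a 10-dimensional embedding instead of the raw 64 pixel dimensions.
X_low = Isomap(n_neighbors=10, n_components=10).fit_transform(X)

# Keep 50 labels at random; hide the rest by marking them -1.
rng = np.random.RandomState(0)
y_partial = np.full_like(y, -1)
labeled_idx = rng.choice(len(y), size=50, replace=False)
y_partial[labeled_idx] = y[labeled_idx]

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X_low, y_partial)
print("transductive accuracy:", (model.transduction_ == y).mean())
```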