It is relatively easy for machine learning models to learn a linear classifier, but that is often not enough. Linear classifiers produce linear decision boundaries, which are sensitive to noise and fit non-linear datasets poorly.
Suppose we want to create a classifier to check whether a person is obese or not. If the two classes in the dataset are linearly separable, a straight line can divide them.
This works fine, but let's take another example: cancer detection. That dataset may not be linearly separable, as shown in the figure below:
A linear decision boundary cannot separate such data. This is where kernel functions play their role.
Kernel functions implicitly map a set of inputs into a higher-dimensional space where the data becomes linearly separable. Some kernels even correspond to an infinite-dimensional feature space.
For the problem of cancer detection, we can transform the data into three dimensions, where a plane can separate the two classes.
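As a sketch of this idea, the snippet below uses made-up 2D data (an inner cluster surrounded by a ring, which no straight line can separate) and adds a third coordinate equal to the squared distance from the origin. In 3D, a horizontal plane then separates the classes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2D data: an inner cluster (class 0) surrounded by a ring
# (class 1). No straight line in the plane separates these two classes.
inner = rng.normal(scale=0.5, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, size=50)
outer = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])
outer += rng.normal(scale=0.2, size=outer.shape)

def lift(points):
    """Map (x1, x2) -> (x1, x2, x1^2 + x2^2). The third axis is the squared
    distance from the origin, so the ring ends up above the inner cluster."""
    z = (points ** 2).sum(axis=1)
    return np.column_stack([points, z])

inner3d, outer3d = lift(inner), lift(outer)

# In 3D, a horizontal plane z = c with a suitable c separates the classes.
print(inner3d[:, 2].max() < outer3d[:, 2].min())
```

Here the lifting map is written out explicitly for illustration; the point of kernels, covered below, is that this mapping never has to be computed directly.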
There are several kernel functions available to use. Some of them are given in the table below:
| Kernel Name | Equation |
| --- | --- |
| Linear kernel | k(X, X') = X · X' |
| Radial Basis Function (RBF) kernel | k(X, X') = exp(−‖X − X'‖² / (2l²)) |
| Polynomial kernel | k(X, X') = (X · X' + 1)^d |
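Each of these kernels can be written in a few lines of NumPy. The length scale `l` of the RBF kernel and the degree `d` of the polynomial kernel are free parameters; the function names here are our own:

```python
import numpy as np

def linear_kernel(x, y):
    # k(X, X') = X . X'
    return np.dot(x, y)

def rbf_kernel(x, y, l=1.0):
    # k(X, X') = exp(-||X - X'||^2 / (2 l^2)), with length scale l
    return np.exp(-np.sum((x - y) ** 2) / (2 * l ** 2))

def polynomial_kernel(x, y, d=2):
    # k(X, X') = (X . X' + 1)^d, with degree d
    return (np.dot(x, y) + 1) ** d

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])
print(linear_kernel(x, y))        # 2.0
print(rbf_kernel(x, y))           # exp(-5/2), about 0.082
print(polynomial_kernel(x, y))    # (2 + 1)^2 = 9.0
```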
Kernels act like a feature map, mapping points from the input space, say X, into a feature space V through a map φ: X → V, such that k(x, x') = ⟨φ(x), φ(x')⟩.
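For the degree-2 polynomial kernel on 2D inputs, this feature map can be written out explicitly, and the kernel value computed in the input space matches the inner product computed in the feature space. A minimal check of that identity:

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel on 2D inputs,
    chosen so that (x . y + 1)^2 equals the inner product <phi(x), phi(y)>."""
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

kernel_value = (np.dot(x, y) + 1) ** 2   # computed in the 2D input space
feature_value = np.dot(phi(x), phi(y))   # computed in the 6D feature space
print(np.isclose(kernel_value, feature_value))  # True
```

The kernel evaluates the 6D inner product using only 2D arithmetic; this is the "kernel trick" that makes infinite-dimensional feature spaces tractable.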
A function must satisfy two conditions to be a valid kernel function:
Condition 1 - The function must be symmetric.

Condition 2 - The function must be positive semi-definite.

The first is a pretty simple condition. The symmetric condition is given as:

k(x, x') = k(x', x)

For the second condition: for a given unique set of points x_1, x_2, ..., x_n and any real numbers c_1, c_2, ..., c_n, the following must hold:

Σᵢ Σⱼ cᵢ cⱼ k(xᵢ, xⱼ) ≥ 0

Here, the xᵢ are data points and the cᵢ are arbitrary real coefficients.
These conditions can be summarized in an alternative way using the Gram matrix. The Gram matrix is defined as:

Kᵢⱼ = k(xᵢ, xⱼ)

If this matrix is positive semi-definite for every set of points, then the function k is a valid kernel.
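As a quick numerical check of this property, the sketch below builds the Gram matrix of the RBF kernel on a small set of random points and verifies that it is symmetric with no negative eigenvalues (up to floating-point roundoff):

```python
import numpy as np

def rbf(x, y, l=1.0):
    # RBF kernel with length scale l
    return np.exp(-np.sum((x - y) ** 2) / (2 * l ** 2))

rng = np.random.default_rng(1)
points = rng.normal(size=(10, 2))  # arbitrary sample points

# Gram matrix: K[i, j] = k(x_i, x_j)
K = np.array([[rbf(xi, xj) for xj in points] for xi in points])

# Symmetric, and positive semi-definite: all eigenvalues >= 0
# (a tiny negative tolerance absorbs floating-point roundoff).
eigenvalues = np.linalg.eigvalsh(K)
print(np.allclose(K, K.T))           # True
print(eigenvalues.min() >= -1e-10)   # True
```

This checks the condition only for one finite sample, not for every possible set of points, but a single Gram matrix with a negative eigenvalue would be enough to rule a candidate kernel out.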
Note: A positive semi-definite matrix is a symmetric matrix whose eigenvalues are all non-negative.
Though the word "kernel" is used in several different contexts in machine learning, choosing the right kernel is important, as that choice ultimately affects the performance of the model.