The `KNNImputer` class is part of the `sklearn.impute` module of scikit-learn, a widely used Python library for machine learning.
The `KNNImputer` fills in missing values in a dataset using the k-Nearest Neighbors (k-NN) method. The k-NN algorithm itself is commonly used for classification and regression problems.
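The `KNNImputer` estimates each missing entry from the rows that are most similar to it. For a sample with a missing feature, it finds the k closest samples (k is set by `n_neighbors`) that have a value for that feature, measuring distance only over the features both samples have observed. It then fills the gap with the average of those neighbors' values, optionally weighted by distance.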
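To make the procedure concrete, here is a minimal from-scratch sketch in NumPy. It assumes uniform weights and the `nan_euclidean` distance described in the scikit-learn documentation (squared differences summed over shared features, scaled by total features / shared features). The helpers `nan_euclidean` and `knn_impute` are hypothetical names, not part of scikit-learn, and this sketch does not reproduce how the real `KNNImputer` breaks ties or handles a column with no observed values.

```python
import numpy as np


def nan_euclidean(a, b):
    """Hypothetical helper: Euclidean distance over coordinates observed
    in both rows, scaled by (total coordinates / observed coordinates)."""
    present = ~np.isnan(a) & ~np.isnan(b)
    if not present.any():
        return np.inf  # no shared coordinates: treat the rows as infinitely far apart
    squared = np.sum((a[present] - b[present]) ** 2)
    return np.sqrt(squared * len(a) / present.sum())


def knn_impute(X, n_neighbors=5):
    """Hypothetical helper: replace each NaN with the mean of that column
    over the n_neighbors closest rows that have the column observed.
    Assumes every column has at least one observed value."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        # Candidate donors: rows whose value in column j is observed
        donors = [r for r in range(len(X)) if not np.isnan(X[r, j])]
        # Rank donors by their distance to row i (computed on the original data)
        donors.sort(key=lambda r: nan_euclidean(X[i], X[r]))
        nearest = donors[:n_neighbors]
        filled[i, j] = X[nearest, j].mean()  # uniform weights: plain average
    return filled


X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
result = knn_impute(X, n_neighbors=2)
print(result)  # the two missing entries become 4.0 and 5.5
```

On this small array, the first missing value is averaged from rows 1 and 2 (giving 4.0) and the second from rows 1 and 3 (giving 5.5).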
The illustration below shows how the `KNNImputer` works in scikit-learn:
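The `KNNImputer` class is defined as follows:

```
class sklearn.impute.KNNImputer(*, missing_values=nan, n_neighbors=5, weights='uniform', metric='nan_euclidean', copy=True, add_indicator=False)
```

Scikit-learn 1.2 and later also accept a `keep_empty_features` parameter (default `False`).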
The `KNNImputer` class takes the following parameters:

Parameter | Purpose |
---|---|
`missing_values` | Placeholder for missing values; all occurrences are imputed. Can be an `int`, `float`, `str`, `np.nan`, or `None`. Default: `np.nan` |
`n_neighbors` | Number of neighboring samples used to impute each missing value. Default: `5` |
`weights` | Weight function used for prediction: `'uniform'` (all neighbors weighted equally), `'distance'` (neighbors weighted by the inverse of their distance), or a callable. Default: `'uniform'` |
`metric` | Distance metric used to search for neighbors: `'nan_euclidean'` or a callable. Default: `'nan_euclidean'` |
`copy` | Takes a `bool`. If `True`, a copy of the data is created; if `False`, imputation is done in place whenever possible. Default: `True` |
`add_indicator` | Takes a `bool`. If `True`, a `MissingIndicator` transform is stacked onto the output of the imputer's transform, flagging which entries were missing. Default: `False` |
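The effect of `weights` and `add_indicator` can be seen by running the imputer on a small array. This is a sketch assuming a recent scikit-learn release; the array `X` is illustrative data, and the exact digits printed for the distance-weighted result may differ slightly due to floating-point rounding.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]

# weights="distance": each neighbor's value is weighted by 1 / distance,
# so the closer neighbor pulls the imputed value toward its own
distance_imputer = KNNImputer(n_neighbors=2, weights="distance")
weighted = distance_imputer.fit_transform(X)
print("Distance-weighted:\n", weighted)

# add_indicator=True: one extra 0/1 column is appended for each feature
# that contained missing values during fit (here, features 0 and 2)
indicator_imputer = KNNImputer(n_neighbors=2, add_indicator=True)
with_flags = indicator_imputer.fit_transform(X)
print("With missing-value indicators:\n", with_flags)
```

With `add_indicator=True` the output has five columns: the three imputed features followed by the two indicator columns.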
The `KNNImputer` class has several methods:

Method | Purpose |
---|---|
`fit(X)` | Fit the imputer on `X`. |
`fit_transform(X)` | Fit to the data, then transform it. |
`get_params()` | Get the parameters of this estimator. |
`set_params(**params)` | Set the parameters of this estimator. |
`transform(X)` | Impute all missing values in `X`. |
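For simple imputation of a single dataset, calling `fit_transform` alone is sufficient. Calling `fit` and `transform` separately is useful when the imputer should learn from one dataset (for example, training data) and then fill in another.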
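A minimal sketch of the separate `fit`/`transform` workflow, together with `get_params` and `set_params`, follows. The names `X_train` and `X_new` are illustrative; after `fit`, neighbors for rows in `X_new` are searched only among the rows of `X_train`.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Rows the imputer learns from (for example, a training split)
X_train = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
# A new row whose first feature is missing
X_new = [[np.nan, 5, 13]]

imputer = KNNImputer(n_neighbors=2)
imputer.fit(X_train)  # stores X_train as the pool of candidate neighbors
imputed = imputer.transform(X_new)  # mean of the 2 nearest training rows
print(imputed)

# Read and change hyperparameters, then refit before transforming again
print(imputer.get_params()["n_neighbors"])
imputer.set_params(n_neighbors=1)
imputer.fit(X_train)
imputed_1nn = imputer.transform(X_new)  # copies the single nearest row's value
print(imputed_1nn)
```

Here the two nearest training rows are `[3, 6, 12]` and `[2, 4, 16]`, so the missing value becomes 2.5; with a single neighbor it becomes 3.0.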
The following example shows how to use the `KNNImputer` in scikit-learn:
```python
import numpy as np  # Importing numpy to create an array
from sklearn.impute import KNNImputer

# Creating array with missing values
X = [[1, 2, np.nan], [3, 6, 12], [np.nan, 12, 24], [2, 4, 16]]
print("Original array: ", X)

imputer = KNNImputer(n_neighbors=2)  # Creating a KNNImputer
array = imputer.fit_transform(X)  # Imputing data
print("Updated array: ", array)
```
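Assuming scikit-learn's documented `nan_euclidean` distance, the two imputed entries in this example can be checked by hand. Rows are indexed from 0 ($r_0$ to $r_3$), $p = 3$ is the number of features, and $P$ is the set of features observed in both rows being compared:

$$
\begin{aligned}
d(x, y) &= \sqrt{\frac{p}{|P|} \sum_{i \in P} (x_i - y_i)^2} \\[4pt]
d(r_0, r_1) &= \sqrt{\tfrac{3}{2}\left[(1-3)^2 + (2-6)^2\right]} = \sqrt{30} \approx 5.48 \\
d(r_0, r_2) &= \sqrt{\tfrac{3}{1}(2-12)^2} = \sqrt{300} \approx 17.32 \\
d(r_0, r_3) &= \sqrt{\tfrac{3}{2}\left[(1-2)^2 + (2-4)^2\right]} = \sqrt{7.5} \approx 2.74 \\
\hat{x}_{0,2} &= \tfrac{1}{2}(16 + 12) = 14 \\[4pt]
d(r_2, r_0) &= \sqrt{\tfrac{3}{1}(12-2)^2} = \sqrt{300} \approx 17.32 \\
d(r_2, r_1) &= \sqrt{\tfrac{3}{2}\left[(12-6)^2 + (24-12)^2\right]} = \sqrt{270} \approx 16.43 \\
d(r_2, r_3) &= \sqrt{\tfrac{3}{2}\left[(12-4)^2 + (24-16)^2\right]} = \sqrt{192} \approx 13.86 \\
\hat{x}_{2,0} &= \tfrac{1}{2}(2 + 3) = 2.5
\end{aligned}
$$

With `n_neighbors=2`, each missing entry is the plain average over its two nearest donors ($r_3$ and $r_1$ in both cases), so the updated array should contain 14.0 and 2.5 in place of the missing values, with all observed values unchanged.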