Gradient descent is a popular optimization algorithm extensively applied in machine learning and numerical optimization. It works by calculating the gradient of the cost function with respect to the parameters and then updating the parameters by taking steps proportional to the negative gradient. This process continues until the algorithm converges to a minimum or a maximum number of iterations is reached.
In this Answer, we will explore the concept of gradient descent and walk through a step-by-step implementation in MATLAB.
Before implementing gradient descent, the first step is to define the cost function to be minimized. The choice of cost function depends on the particular problem being addressed. For instance, in a basic linear regression scenario, the cost function could be the mean squared error (MSE) between the predicted values and the true target values.
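For example, the MSE cost for linear regression could be written as the sketch below. The function name computeMSE is our own choice for illustration; X, y, and theta match the variables used in the full example later in this Answer.

function J = computeMSE(X, y, theta)
    % Mean squared error between the predictions X * theta and the targets y
    residuals = X * theta - y;  % per-example prediction errors
    J = mean(residuals .^ 2);   % average squared error
end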
Next, we need to initialize the parameters of our model with some starting values. For linear regression, there is one weight per input feature, including the bias term. We can initialize the weights randomly or with zeros.
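For instance, assuming X already includes a leading column of ones for the bias term, a zero initialization could look like this:

theta = zeros(size(X, 2), 1);         % one weight per column of X, as a column vector
% Alternatively, small random values:
% theta = 0.01 * randn(size(X, 2), 1);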
Hyperparameters are parameters that control the behavior of the optimization algorithm. In gradient descent, the key hyperparameter is the learning rate, which determines the step size taken in each iteration. Additionally, we specify the number of iterations to perform.
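As an illustration, the values used in the full example later in this Answer are set as follows; good values are problem-dependent and usually found by experimentation:

learning_rate = 0.01;   % step size: too large can diverge, too small converges slowly
num_iterations = 1000;  % upper bound on the number of update steps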
Now, we proceed to execute the gradient descent iterations. During each iteration, we calculate the gradient of the cost function with respect to the parameters and then update the parameter values by moving in the direction opposite to the gradient. We repeat this process until convergence or until the maximum number of iterations is reached.
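Concretely, a single update step for the MSE cost can be sketched as follows. Here m is the number of training examples; the full example below drops the 1/m factor, which only rescales the step and can be absorbed into the learning rate.

m = length(y);                              % number of training examples
gradient = (1 / m) * X' * (X * theta - y);  % gradient of the squared-error cost (constant factors fold into the learning rate)
theta = theta - learning_rate * gradient;   % move opposite to the gradient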
Let's look at the following MATLAB code that performs gradient descent for linear regression:
% Example data for linear regression
X = [1, 1; 1, 2; 1, 3; 1, 4]; % input features (including the bias term)
y = [2; 3; 4; 5]; % target values

% Hyperparameters
learning_rate = 0.01;
num_iterations = 1000;

% Initialize parameters
theta = zeros(size(X, 2), 1); % column vector of weights

% Gradient Descent
for iter = 1:num_iterations
    % Calculate predictions
    predictions = X * theta;

    % Calculate the error
    error = predictions - y;

    % Calculate the gradient
    gradient = X' * error;

    % Update parameters
    theta = theta - learning_rate * gradient;
end

% Display the learned parameters
disp('Learned parameters:');
disp(theta);
Lines 2–3: These lines define the input features X and the target values y for a linear regression problem. The input features X are a 4×2 matrix whose first column of ones corresponds to the bias term, and the target values y are a 4×1 column vector.
Lines 6–7: These lines specify the hyperparameters for the gradient descent algorithm. The learning_rate determines the step size taken in each iteration, and num_iterations determines the maximum number of iterations to perform.
Line 10: This line initializes the parameter vector theta of the linear regression model as a column vector of zeros. Its size is determined by the number of columns in X, which corresponds to the number of features (including the bias term).
Lines 13–25: This loop performs the gradient descent iterations. It runs num_iterations times. In each iteration:
It calculates the predictions by multiplying the input features X with the current parameter values theta.
It calculates the error by subtracting the target values y from the predictions.
It calculates the gradient by multiplying the transpose of X with the error.
It updates the parameters theta by subtracting the learning rate multiplied by the gradient.
Lines 28–29: These lines display the learned parameters theta after the gradient descent iterations finish.
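As a quick sanity check: the example data satisfies y = 1 + x exactly, so the least-squares optimum is theta = [1; 1], and the displayed parameters should be close to those values. After the script has run, the fit can be inspected as follows:

disp('Final predictions:');
disp(X * theta);   % should be close to the targets [2; 3; 4; 5]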