How to implement gradient descent in MATLAB

Gradient descent is a popular optimization algorithm that is widely used in machine learning and numerical optimization. It works by computing the gradient of the cost function with respect to the parameters and then updating the parameters by taking steps proportional to the negative gradient. This process continues until convergence (the point where the cost function reaches its minimum) or until a predefined number of iterations is reached.

[Figure: A depiction of the gradient descent of a cost function]

In this Answer, we will go over the concept of gradient descent and provide a step-by-step guide to implementing it in MATLAB.

1. Defining the cost function

Before implementing gradient descent, the first step is to define the cost function to be minimized. The choice of cost function depends on the problem being addressed. For instance, in a basic linear regression scenario, the cost function could be the mean squared error (MSE) between the predicted values and the true target values.
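For reference, the MSE can be computed in a few lines of MATLAB. This is a minimal sketch; it assumes that X (input matrix), y (target vector), and theta (weight vector) are already defined, as in the full example later in this Answer:

% Mean squared error between predictions and targets
m = length(y);                      % number of training examples
predictions = X * theta;            % model predictions
mse = sum((predictions - y).^2) / m % mean squared error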

2. Initializing parameters

Next, we need to initialize the parameters of our model with some starting values. For linear regression, there is one weight per input feature, plus a weight for the bias term (handled here as an extra column of ones in the input matrix). We can initialize the weights randomly or with zeros.
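For example, either of the following initializations could be used for the linear regression example below (a sketch; size(X, 2) returns the number of columns of the input matrix):

% Initialize all weights to zero
theta = zeros(size(X, 2), 1);
% Or initialize with small random values
% theta = 0.01 * randn(size(X, 2), 1);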

3. Setting hyperparameters

Hyperparameters are parameters that control the behavior of the optimization algorithm. In gradient descent, the key hyperparameter is the learning rate, which determines the step size taken in each iteration. We also need to specify the maximum number of iterations to perform.

4. Performing gradient descent iterations

Now, we proceed to execute the gradient descent iterations. In each iteration, we calculate the gradient of the cost function with respect to the parameters and then update the parameter values by moving in the direction opposite to the gradient. We repeat this process until convergence or until the maximum number of iterations is reached.

Let's look into the following MATLAB code that computes gradient descent for linear regression:

% Example data for linear regression
X = [1, 1; 1, 2; 1, 3; 1, 4]; % input features (including the bias term)
y = [2; 3; 4; 5]; % target values

% Hyperparameters
learning_rate = 0.01;
num_iterations = 1000;

% Initialize parameters
theta = zeros(size(X, 2), 1); % column vector of weights

% Gradient Descent
for iter = 1:num_iterations
    % Calculate predictions
    predictions = X * theta;

    % Calculate the error
    error = predictions - y;

    % Calculate the gradient
    gradient = X' * error;

    % Update parameters
    theta = theta - learning_rate * gradient;
end

% Display the learned parameters
disp('Learned parameters:');
disp(theta);

Code explanation

  • Lines 2–3: These lines define the input features X and target values y for a linear regression problem. The input features X form a 4×2 matrix, where each row represents a training example and the first column is a column of ones for the bias term. The target values y form a 4×1 column vector.

  • Lines 6–7: These lines specify the hyperparameters for the gradient descent algorithm. The learning_rate determines the step size taken in each iteration, and num_iterations determines the maximum number of iterations to perform.

  • Line 10: This line initializes the parameter vector theta of the linear regression model. Its size is determined by the number of columns in X, which corresponds to the number of features (including the bias term), and it is created as a column vector of zeros.

  • Lines 13–25: This loop performs the gradient descent iterations. It runs for num_iterations iterations. In each iteration:

    • It calculates the predictions by multiplying the input features X with the current parameter values theta.

    • It calculates the error by subtracting the target values y from the predictions.

    • It calculates the gradient by multiplying the transpose of X with the error vector. This is proportional to the gradient of the MSE cost; the constant scaling factor is absorbed into the learning rate.

    • It updates the parameters theta by subtracting the learning rate multiplied by the gradient.

  • Lines 28–29: These lines display the learned parameters theta after finishing the gradient descent iterations.
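As a quick check, we can use the learned parameters to predict the target for a new input. For the example data above, the exact solution is theta = [1; 1], so after 1,000 iterations the learned values should be close to that, and the prediction below should be close to 6. This is a sketch that assumes the code above has already been run:

% Predict the target for a new input with feature value 5 (prepended with 1 for the bias term)
x_new = [1, 5];
y_pred = x_new * theta % expected to be close to 6 for this example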
