What is reverse mode differentiation?

In deep learning, we use automatic differentiation to compute the derivatives of a function or a neural network. There are two types of automatic differentiation:

  • Forward mode differentiation

  • Reverse mode differentiation

In this Answer, we will discuss the latter.

Reverse Mode Differentiation

In reverse mode differentiation, we compute the derivatives over the computational graph by starting from the final output and applying the chain rule backward toward the input variables.
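In practice, deep learning frameworks perform this sweep for us. As a quick sketch (using PyTorch's standard autograd API, with arbitrarily chosen input values), differentiating the function f(x, y) = xy^2 + 3xy that we work through by hand below looks like this:

    import torch

    # f(x, y) = x*y^2 + 3*x*y, evaluated at x = 2, y = 2
    x = torch.tensor(2.0, requires_grad=True)
    y = torch.tensor(2.0, requires_grad=True)

    f = x * y**2 + 3 * x * y
    f.backward()  # reverse mode: sweep from the output f back to the inputs

    print(x.grad)  # tensor(10.) = y*(y + 3) at y = 2
    print(y.grad)  # tensor(14.) = 2*x*y + 3*x at x = 2, y = 2

A single call to backward() produces the gradients with respect to every input at once, which is why reverse mode suits neural networks with many parameters and a scalar loss.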

Let us understand the concept of reverse mode differentiation with the help of an example.

Example

Consider a function f(x, y) = xy^2 + 3xy, where x and y are independent variables.

The trace graph of this function is given below. It decomposes f into elementary operations: the input nodes X_1 = x and X_2 = y, the intermediate nodes A = X_1X_2, B = AX_2, and C = 3A, and the output node D = B + C = f.

Trace graph

In reverse mode differentiation, we find the derivative of each parent node with respect to its child nodes and combine them using the chain rule. The following points compute the derivative at every node.

  • We start from D and compute the derivative of the output with respect to itself:

    \frac{\partial f}{\partial D} = 1

  • Now, we find the derivative at C:

    \frac{\partial f}{\partial C} = \frac{\partial f}{\partial D} \cdot \frac{\partial D}{\partial C}

    \frac{\partial f}{\partial C} = 1 \cdot \frac{\partial (B + C)}{\partial C}

    \frac{\partial f}{\partial C} = (1)(1)

    \frac{\partial f}{\partial C} = 1

  • Similarly, we find the derivative at B:

    \frac{\partial f}{\partial B} = \frac{\partial f}{\partial D} \cdot \frac{\partial D}{\partial B}

    \frac{\partial f}{\partial B} = 1

  • The node A feeds into both B and C, so we sum the contributions of both paths using the chain rule:

    \frac{\partial f}{\partial A} = \frac{\partial f}{\partial C} \cdot \frac{\partial C}{\partial A} + \frac{\partial f}{\partial B} \cdot \frac{\partial B}{\partial A}

    \frac{\partial f}{\partial A} = \frac{\partial f}{\partial C} \cdot \frac{\partial (3A)}{\partial A} + \frac{\partial f}{\partial B} \cdot \frac{\partial (X_2A)}{\partial A}

    \frac{\partial f}{\partial A} = 1(3) + 1(X_2)

    \frac{\partial f}{\partial A} = X_2 + 3

  • Now, we find the derivatives at the input nodes. For X_1:

    \frac{\partial f}{\partial X_1} = \frac{\partial f}{\partial A} \cdot \frac{\partial A}{\partial X_1}

    \frac{\partial f}{\partial X_1} = (X_2 + 3) \cdot \frac{\partial (X_1X_2)}{\partial X_1}

    \frac{\partial f}{\partial X_1} = X_2(X_2 + 3)

    \frac{\partial f}{\partial X_1} = y(y + 3)

  • Similarly, for the node X_2, which feeds into both A and B:

    \frac{\partial f}{\partial X_2} = \frac{\partial f}{\partial A} \cdot \frac{\partial A}{\partial X_2} + \frac{\partial f}{\partial B} \cdot \frac{\partial B}{\partial X_2}

    \frac{\partial f}{\partial X_2} = x(y + 3) + 1 \cdot \frac{\partial (AX_2)}{\partial X_2}

    \frac{\partial f}{\partial X_2} = x(y + 3) + A

    \frac{\partial f}{\partial X_2} = x(y + 3) + xy

    \frac{\partial f}{\partial X_2} = 2xy + 3x

So, we have calculated the derivatives of all nodes using reverse mode differentiation.
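To make this backward sweep concrete, here is a minimal from-scratch sketch in Python. The Var class is a hypothetical helper written just for this Answer (not a real library API): each operation records its child nodes together with the local derivatives, and backward() pushes gradients from the output down to the inputs via the chain rule, exactly as in the steps above:

    class Var:
        """A toy reverse mode autodiff node (illustrative only)."""

        def __init__(self, value, children=()):
            self.value = value
            self.grad = 0.0
            # children: pairs of (child Var, local derivative d(self)/d(child))
            self.children = children

        def __mul__(self, other):
            return Var(self.value * other.value,
                       children=((self, other.value), (other, self.value)))

        def __add__(self, other):
            return Var(self.value + other.value,
                       children=((self, 1.0), (other, 1.0)))

        def backward(self, upstream=1.0):
            # Chain rule: df/d(child) += df/d(self) * d(self)/d(child).
            # (A production implementation would sweep the nodes in topological
            # order instead of recursing, to avoid revisiting shared subgraphs.)
            self.grad += upstream
            for child, local in self.children:
                child.backward(upstream * local)

    # Build the trace graph of f(x, y) = x*y^2 + 3*x*y at x = 2, y = 2
    x1, x2 = Var(2.0), Var(2.0)    # X1 = x, X2 = y
    a = x1 * x2                    # A = X1 * X2
    b = a * x2                     # B = A * X2
    c = Var(3.0) * a               # C = 3A
    d = b + c                      # D = B + C = f

    d.backward()                   # start at the output: df/dD = 1
    print(x1.grad)                 # 10.0 = y*(y + 3)
    print(x2.grad)                 # 14.0 = 2*x*y + 3*x

The printed values match the hand-derived gradients y(y + 3) and 2xy + 3x at x = y = 2.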

Conclusion

In this Answer, we learned about reverse mode differentiation with the help of an example and its trace graph. We calculated the derivatives from the final node back toward the input nodes. In essence, reverse mode differentiation drives the backpropagation process in machine learning and enables efficient computation of gradients.
