TensorFlow Gradient Tape

Anika Tabassum Era Feb 15, 2024 TensorFlow TensorFlow Gradient

TensorFlow has played a vital role in being a consistent framework and a compatible work zone for the dynamic machine learning and deep learning field.

One of the most hyped domains is the neural network, and its working principle can be demonstrated easily with the help of the gradient descent algorithm. And TensorFlow assures almost all the libraries required to enhance this experience.

According to the basic concept, the neural network is how the human brain cell operates, decides, recognizes, and functions. Our processing unit can only memorize the information fed at a time.

Rather it follows a learning rate and revises back to back based on dealing with situations. In this process, there is an important term called the cost function or loss function or partial derivative or the basic term gradient.

To visualize the concept, we have tried to demonstrate a common concept, the “Linear Regression” model. If you are not a newbie to machine learning, then we assume you are already familiar with it.

What this model does is extract a relationship between the independent and dependent variables with the help of a regression line.

While establishing the regression line, it is required to know the sum of square errors (SSE), which is the gradient for this particular case. We expect the SSE to be the minimum when we train our model.

The constant process only avails this of training. Here, we calculate the output from our specified equation and reinitialize the driver values with the gradient x learning rate.

The learning rate ensures stability so that our loss function doesn’t show abrupt consequences.

We will update the base values depending on the gradients after initiating a base value. This backward pass is necessary to check if the model was trained properly.

And the gradient tape is just that component that successively stores the cost function along with the updated inputs and outputs. So we get a complete picture of the whole procedure with the detail of the values.

Let’s visualize the method with an example.

Use of Gradient Tape in TensorFlow

In this section, we will not do a theoretical overview. Rather we will follow up on the practical codes and try to demonstrate how we rely on the gradient tape.

So, let’s jump on to the code bases. We are using Google Colab for our instance.

Initially, we will set the library imports and generate a random dataset for the drive. We will also plot a scatter plot to see how the variables behave.

# Necessary Imports
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import os

# Generating random data
x = np.random.uniform(0.0, 1.0, (200))
y = 0.3 + 0.5 * x + np.random.normal(0.0, 0.3, len(x))
# x
# y

plt.scatter(x, y)
plt.xlabel("Independent")
plt.ylabel("dependent")
plt.show()

Output:

use of gradient tape in tensorflow 1

The next code fences will include the regression model and the loss functions declaration. The important part to focus on is that the regression model outputs the predicted value for the dependent variable y, and the equation goes like this:

$$ \hat y = a + b * x $$

Here, x is the independent variable, aka the model’s input. Mainly, we will try to see if we have the minimum error between the given y and predicted y^.

The loss function includes the sum of square error (SSE):

$$ SSE = \sum(y_i-\hat y_i)^2 $$

# The following equation for the model is ---->  y" = a + b * x
# Linear Regression
class regression:
    def __init__(self):
        self.a = tf.Variable(initial_value=0, dtype=tf.float32)
        self.b = tf.Variable(initial_value=0, dtype=tf.float32)

    def __call__(self, x):
        x = tf.convert_to_tensor(x, dtype=tf.float32)
        y_est = tf.add(self.a, tf.multiply(self.b, x))
        return y_est

model = regression()

# The loss function/ Cost function/ Optimization function (sum of square error)
def loss_func(y_true, y_pred):
    # both y_true and y_pred are tensors
    sse = tf.reduce_sum(tf.square(tf.subtract(y_true, y_pred)))
    return sse

Next, we will define the training section, including the gradient tape. The equation to update the variables a and b are:

$$ a = a_i - \nabla(sse)|a *l_r $$

$$ b = b_i - \nabla(sse)|b *l_r $$

# Gradient Descent ----> a = ai - sse * lr (learning rate)
#                                      b = bi - sse * lr


def train(model, inputs, outputs, lr):
    # convert outputs to tensor
    y_true = tf.convert_to_tensor(outputs, dtype=tf.float32)

    # Gradient Tape
    with tf.GradientTape() as g:
        y_pred = model(inputs)
        current_loss = loss_func(y_true, y_pred)

    # slopes/ partal_differentiation/ gradients
    da, db = g.gradient(current_loss, [model.a, model.b])

    # update values

    model.a.assign_sub(da * lr)
    model.b.assign_sub(db * lr)

The next segment is to set the regression line. We will set some epochs for better learning.

We have examined the saturated point and the necessary epoch value for which the loss function is the minimum. Let’s plot the epoch and cost values to see how the output occurred from the whole training section.

# Fitting
def plotting(x, y):
    plt.scatter(x, y)  # scatter
    plt.plot(x, model(x), "r--")

model = regression()
a_val = []
b_val = []
cost_val = []

# epochs
epochs = 70

# learning rate
lr = 0.001

for e in range(epochs):
    a_val.append(model.a)
    b_val.append(model.b)

    # prediction values and errors
    y_pred = model(x)
    cost_value = loss_func(y, y_pred)
    cost_val.append(cost_value)

    # Train
    train(model, x, y, lr)
    plotting(x, y)
    plt.show()
    print("Epoch: %d, Loss: %0.2f" % (e, cost_value))

The output scatter-plot from the last epoch:

use of gradient tape in tensorflow 2

plt.plot(cost_val)
plt.xlabel("epochs")
plt.ylabel("cost values")

Output:

use of gradient tape in tensorflow 3

So, it can be said that the cost values reduced drastically in the epoch range of 0-10. And when you run all the epochs, you will notice with the values that the saturated point is near the 64-65 epoch.

Technically, the gradient tape helped collect all the corresponding values, and thus the visualization is more intuitive. You can also preview the whole code in this link.

Enjoying our tutorials? Subscribe to DelftStack on YouTube to support us in creating more high-quality video guides. Subscribe

Author: Anika Tabassum Era

Era is an observer who loves cracking the ambiguos barriers. An AI enthusiast to help others with the drive and develop a stronger community.

LinkedIn Facebook