Artificial Intelligence 🤖
Practical Aspects of Deep Learning

Gradient Checking

This is a method to check the accuracy of the gradients calculated during backpropagation in a neural network. It uses a numerical approximation to estimate the gradient of the cost function and compares it with the gradient computed by backpropagation. For example, take $f(a) = a^3$, whose derivative is $f'(a) = 3a^2$. We can use the following formula to approximate the derivative:

$$f'(a) \approx \frac{f(a + \epsilon) - f(a - \epsilon)}{2\epsilon}$$

This gives us an approximate derivative, which should be close to the backpropagation result if everything has been implemented correctly. It's helpful for finding errors in your backpropagation implementation. It's much slower than training with gradient descent, so it's only really used for debugging. The implementation is very simple:
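As a quick sanity check of the approximation itself, here is a minimal sketch in plain Python; the point a = 2.0 and eps = 1e-7 are arbitrary choices for illustration:

def f(a):
    return a ** 3           # the example function f(a) = a^3

def f_prime(a):
    return 3 * a ** 2       # its analytic derivative

eps = 1e-7
a = 2.0
approx = (f(a + eps) - f(a - eps)) / (2 * eps)
print(approx, f_prime(a))   # both are ~12.0, so the approximation is very close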

  1. First take all the parameters $\mathbf{w}^{[1]}, \mathbf{b}^{[1]}, \dots, \mathbf{w}^{[L]}, \mathbf{b}^{[L]}$ and reshape them into one vector $\theta$.
  2. The cost function will be $J(\theta)$.
  3. Then do the same for the derivatives $d\mathbf{w}^{[1]}, d\mathbf{b}^{[1]}, \dots, d\mathbf{w}^{[L]}, d\mathbf{b}^{[L]}$, reshaping them into one big vector $d\theta$.
  4. Now, for each element of $\theta$, tweak it a little by adding and subtracting a tiny value $\epsilon$, and calculate the cost function for each tweaked $\theta$.
  5. Use the tweaked cost functions to estimate the gradient like this:
eps = 1e-7  # a tiny number
for i in range(len(theta)):
    theta_plus, theta_minus = theta.copy(), theta.copy()
    theta_plus[i] += eps   # nudge only the i-th parameter up
    theta_minus[i] -= eps  # and down
    d_theta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
  6. Compare the approximated gradient vector $d\theta_{approx}$ with the gradient vector $d\theta$ computed by backpropagation using the formula below, where $\lVert x \rVert_2$ denotes the Euclidean norm of a vector $x$ (a complete sketch follows this list):
$$\frac{\lVert d\theta_{approx} - d\theta \rVert_2}{\lVert d\theta_{approx} \rVert_2 + \lVert d\theta \rVert_2}$$
  • $< 10^{-7}$: excellent, meaning your backpropagation is likely correct.
  • $\approx 10^{-5}$: possibly okay, but be cautious and inspect further.
  • $\geq 10^{-3}$: bad, indicates a likely error in backpropagation.
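Putting steps 1 through 6 together, here is a minimal NumPy sketch. The names cost (the cost $J$ as a function of the flattened parameter vector) and backprop_grad (the backpropagation gradient, also flattened) are assumed placeholders for your own implementation, not part of the recipe above:

import numpy as np

def gradient_check(cost, backprop_grad, theta, eps=1e-7):
    """Return the relative difference between numerical and backprop gradients."""
    d_theta = backprop_grad(theta)              # gradient vector from backpropagation
    d_theta_approx = np.zeros_like(theta)
    for i in range(len(theta)):
        theta_plus = theta.copy()
        theta_plus[i] += eps
        theta_minus = theta.copy()
        theta_minus[i] -= eps
        # two-sided difference approximation for the i-th component
        d_theta_approx[i] = (cost(theta_plus) - cost(theta_minus)) / (2 * eps)
    numerator = np.linalg.norm(d_theta_approx - d_theta)
    denominator = np.linalg.norm(d_theta_approx) + np.linalg.norm(d_theta)
    return numerator / denominator

# Quick self-test with a cost whose gradient is known exactly: J(theta) = sum(theta^2), dJ = 2*theta.
# print(gradient_check(lambda t: np.sum(t ** 2), lambda t: 2 * t, np.random.randn(5)))  # well below 1e-7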

Important Notes on Gradient Checking

  • Gradient checking is a slow process, so don't use it during training—only for debugging.
  • When you find a discrepancy, investigate the components of your gradient vectors to locate potential errors.
  • Remember to include the regularization term in your cost function when performing gradient checking (see the short sketch after these notes):
$$\frac{\lambda}{2m} \sum_{l=1}^{L} \lVert w^{[l]} \rVert_F^2$$
  • Gradient checking won't work with dropout since it introduces randomness into your cost function. You should turn off dropout (set keep_prob, i.e. $p$, to 1.0) to check gradients.
  • Try gradient checking after some training, not just at the start. Some bugs only surface when the parameters are far from their initial values.
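For the regularization and dropout points above, a short hedged sketch of what "include the regularization term" looks like in code; cross_entropy_cost, W_layers, lambd, and m are assumed names standing in for your own variables:

import numpy as np

def cost_with_regularization(cross_entropy_cost, W_layers, lambd, m):
    # L2 term: lambda / (2m) * sum over layers of the squared Frobenius norm of w[l];
    # the matching (lambda / m) * w[l] term must also appear in the dW gradients from backprop
    l2_term = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in W_layers)
    return cross_entropy_cost + l2_term

# While gradient checking, also run the forward pass with dropout disabled (keep_prob = 1.0).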

By using gradient checking, you verify that backpropagation is implemented correctly, which makes the training process of your neural network reliable and lets you trust that it's learning as intended.