This article is excerpted from the programming assignments of Andrew Ng's Deep Learning Specialization, with thanks to the instructor.

课程链接:https://www.deeplearning.ai/deep-learning-specialization/

Contents

1) How does gradient checking work?

2) 1-dimensional gradient checking

3) N-dimensional gradient checking (master this)


# Packages
import numpy as np
from testCases import *
from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector

1) How does gradient checking work?

Backpropagation computes the gradients $\frac{\partial J}{\partial \theta}$, where $\theta$ denotes the parameters of the model. $J$ is computed using forward propagation and your loss function.

Because forward propagation is relatively easy to implement, you're confident you got that right, and so you're almost 100% sure that you're computing the cost $J$ correctly. Thus, you can use your code for computing $J$ to verify the code for computing $\frac{\partial J}{\partial \theta}$.

Let's look back at the definition of a derivative (or gradient):

$$\frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon} \tag{1}$$

If you're not familiar with the "$\lim_{\varepsilon \to 0}$" notation, it's just a way of saying "when $\varepsilon$ is really really small."

We know the following:

  • $\frac{\partial J}{\partial \theta}$ is what you want to make sure you're computing correctly.
  • You can compute $J(\theta + \varepsilon)$ and $J(\theta - \varepsilon)$ (in the case that $\theta$ is a real number), since you're confident your implementation for $J$ is correct.
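As a quick illustration (not part of the original assignment), here is a minimal numerical sketch of formula (1) on a toy function $f(\theta) = \theta^2$, whose true derivative is $2\theta$:

def f(theta):
    return theta ** 2                      # toy cost; analytic derivative is 2 * theta

theta, epsilon = 3.0, 1e-7
gradapprox = (f(theta + epsilon) - f(theta - epsilon)) / (2 * epsilon)
print(gradapprox)                          # approximately 6.0, matching 2 * theta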

2) 1-dimensional gradient checking

Consider a 1D linear function $J(\theta) = \theta x$. The model contains only a single real-valued parameter $\theta$, and takes $x$ as input.

You will implement code to compute $J(.)$ and its derivative $\frac{\partial J}{\partial \theta}$. You will then use gradient checking to make sure your derivative computation for $J$ is correct.

The diagram (not reproduced in this excerpt) shows the key computation steps: first start with $x$, then evaluate the function $J(x)$ ("forward propagation"), then compute the derivative $\frac{\partial J}{\partial \theta}$ ("backward propagation").

Exercise: implement "forward propagation" and "backward propagation" for this simple function, i.e., compute both $J(.)$ ("forward propagation") and its derivative with respect to $\theta$ ("backward propagation"), in two separate functions.

def forward_propagation(x, theta):
    """
    Implement the linear forward propagation (compute J) presented in Figure 1 (J(theta) = theta * x)

    Arguments:
    x -- a real-valued input
    theta -- our parameter, a real number as well

    Returns:
    J -- the value of function J, computed using the formula J(theta) = theta * x
    """
    J = theta * x
    return J

x, theta = 2, 4
J = forward_propagation(x, theta)
print ("J = " + str(J))

Exercise: Now, implement the backward propagation step (derivative computation) of Figure 1. That is, compute the derivative of $J(\theta) = \theta x$ with respect to $\theta$. To save you from doing the calculus, you should get $dtheta = \frac{\partial J}{\partial \theta} = x$.

def backward_propagation(x, theta):
    """
    Computes the derivative of J with respect to theta (see Figure 1).

    Arguments:
    x -- a real-valued input
    theta -- our parameter, a real number as well

    Returns:
    dtheta -- the gradient of the cost with respect to theta
    """
    dtheta = x
    return dtheta

x, theta = 2, 4
dtheta = backward_propagation(x, theta)
print ("dtheta = " + str(dtheta))

Exercise: To show that the backward_propagation() function is correctly computing the gradient $\frac{\partial J}{\partial \theta}$, let's implement gradient checking.

Instructions:

  • First compute "gradapprox" using the formula above (1) and a small value of $\varepsilon$. Here are the steps to follow:
    1. $\theta^{+} = \theta + \varepsilon$
    2. $\theta^{-} = \theta - \varepsilon$
    3. $J^{+} = J(\theta^{+})$
    4. $J^{-} = J(\theta^{-})$
    5. $gradapprox = \frac{J^{+} - J^{-}}{2\varepsilon}$
  • Then compute the gradient using backward propagation, and store the result in a variable "grad".
  • Finally, compute the relative difference between "gradapprox" and "grad" using the following formula:

$$\text{difference} = \frac{\| \text{grad} - \text{gradapprox} \|_2}{\| \text{grad} \|_2 + \| \text{gradapprox} \|_2} \tag{2}$$

You will need 3 steps to compute this formula:
   - 1'. compute the numerator using np.linalg.norm(...)
   - 2'. compute the denominator. You will need to call np.linalg.norm(...) twice.
   - 3'. divide them.

If this difference is small (say less than $10^{-7}$), you can be quite confident that you have computed your gradient correctly. Otherwise, there may be a mistake in the gradient computation.

def gradient_check(x, theta, epsilon = 1e-7):
    """
    Implement the gradient checking presented in Figure 1.

    Arguments:
    x -- a real-valued input
    theta -- our parameter, a real number as well
    epsilon -- tiny shift to the input to compute the approximated gradient with formula (1)

    Returns:
    difference -- difference (2) between the approximated gradient and the backward propagation gradient
    """
    # Compute gradapprox using formula (1). epsilon is small enough; you don't need to worry about the limit.
    thetaplus = theta + epsilon
    thetaminus = theta - epsilon
    J_plus = forward_propagation(x, thetaplus)
    J_minus = forward_propagation(x, thetaminus)
    gradapprox = (J_plus - J_minus) / (2 * epsilon)

    # Check if gradapprox is close enough to the output of backward_propagation()
    grad = backward_propagation(x, theta)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    difference = numerator / denominator

    if difference < 1e-7:
        print("The gradient is correct!")
    else:
        print("The gradient is wrong!")

    return difference
x, theta = 2, 4
difference = gradient_check(x, theta)
print("difference = " + str(difference))

3) N-dimensional gradient checking (master this)

The following figure (not reproduced in this excerpt) describes the forward and backward propagation of your fraud detection model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.

def forward_propagation_n(X, Y, parameters):
    """
    Implements the forward propagation (and computes the cost) presented in Figure 3.

    Arguments:
    X -- training set for m examples
    Y -- labels for m examples
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
                    W1 -- weight matrix of shape (5, 4)
                    b1 -- bias vector of shape (5, 1)
                    W2 -- weight matrix of shape (3, 5)
                    b2 -- bias vector of shape (3, 1)
                    W3 -- weight matrix of shape (1, 3)
                    b3 -- bias vector of shape (1, 1)

    Returns:
    cost -- the logistic cost averaged over the m examples
    cache -- tuple of intermediate values, used by backward_propagation_n()
    """
    # retrieve parameters
    m = X.shape[1]
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)

    # Cost
    logprobs = np.multiply(-np.log(A3), Y) + np.multiply(-np.log(1 - A3), 1 - Y)
    cost = 1. / m * np.sum(logprobs)

    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)

    return cost, cache
def backward_propagation_n(X, Y, cache):
    """
    Implement the backward propagation presented in Figure 2.

    Arguments:
    X -- input datapoints, of shape (input size, number of examples)
    Y -- true "labels"
    cache -- cache output from forward_propagation_n()

    Returns:
    gradients -- A dictionary with the gradients of the cost with respect to each parameter, activation and pre-activation variables.
    """
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y
    dW3 = 1. / m * np.dot(dZ3, A2.T)
    db3 = 1. / m * np.sum(dZ3, axis=1, keepdims=True)

    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1. / m * np.dot(dZ2, A1.T)
    db2 = 1. / m * np.sum(dZ2, axis=1, keepdims=True)

    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1. / m * np.dot(dZ1, X.T)
    db1 = 1. / m * np.sum(dZ1, axis=1, keepdims=True)

    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
                 "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
                 "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}

    return gradients

How does gradient checking work?

As in 1) and 2), you want to compare "gradapprox" to the gradient computed by backpropagation. The formula is still:

$$\frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon} \tag{1}$$

However, $\theta$ is not a scalar anymore. It is a dictionary called "parameters". We implemented a function "dictionary_to_vector()" for you. It converts the "parameters" dictionary into a vector called "values", obtained by reshaping all parameters (W1, b1, W2, b2, W3, b3) into vectors and concatenating them.

The inverse function is "vector_to_dictionary" which outputs back the "parameters" dictionary.

We have also converted the "gradients" dictionary into a vector "grad" using gradients_to_vector(). You don't need to worry about that.
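gc_utils is not reproduced in this excerpt, so the following is only a rough, hypothetical sketch of what dictionary_to_vector and vector_to_dictionary might look like, assuming the fixed layer sizes (4, 5, 3, 1) used by forward_propagation_n above. The _sketch suffix marks them as illustrative stand-ins, not the real helpers:

def dictionary_to_vector_sketch(parameters):
    # flatten each parameter to a column and stack them in a fixed order
    keys = []
    theta = None
    for key in ["W1", "b1", "W2", "b2", "W3", "b3"]:
        new_vector = np.reshape(parameters[key], (-1, 1))
        keys = keys + [key] * new_vector.shape[0]
        theta = new_vector if theta is None else np.concatenate((theta, new_vector), axis=0)
    return theta, keys

def vector_to_dictionary_sketch(theta):
    # slice the long column vector back into the original parameter shapes
    parameters = {}
    parameters["W1"] = theta[0:20].reshape((5, 4))
    parameters["b1"] = theta[20:25].reshape((5, 1))
    parameters["W2"] = theta[25:40].reshape((3, 5))
    parameters["b2"] = theta[40:43].reshape((3, 1))
    parameters["W3"] = theta[43:46].reshape((1, 3))
    parameters["b3"] = theta[46:47].reshape((1, 1))
    return parameters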

Exercise: Implement gradient_check_n().

Instructions: Here is pseudo-code that will help you implement the gradient check.

For each i in num_parameters:

To compute J_plus[i]:

  1. Set $\theta^{+}$ to np.copy(parameters_values)
  2. Set $\theta^{+}_i$ to $\theta^{+}_i + \varepsilon$
  3. Calculate $J^{+}_i$ using `forward_propagation_n(x, y, vector_to_dictionary(theta_plus))`.

To compute J_minus[i]: do the same thing with $\theta^{-}$.

Compute gradapprox[i] $= \frac{J^{+}_i - J^{-}_i}{2\varepsilon}$.

Thus, you get a vector gradapprox, where gradapprox[i] is an approximation of the gradient with respect to `parameter_values[i]`. You can now compare this gradapprox vector to the gradients vector from backpropagation. Just like for the 1D case (Steps 1', 2', 3'), compute:

$$\text{difference} = \frac{\| \text{grad} - \text{gradapprox} \|_2}{\| \text{grad} \|_2 + \| \text{gradapprox} \|_2} \tag{2}$$

def gradient_check_n(parameters, gradients, X, Y, epsilon = 1e-7):
    """
    Checks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n

    Arguments:
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
    gradients -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters
    X -- input datapoints, of shape (input size, number of examples)
    Y -- true "labels"
    epsilon -- tiny shift to the input to compute the approximated gradient with formula (1)

    Returns:
    difference -- difference (2) between the approximated gradient and the backward propagation gradient
    """
    # Set-up variables
    parameters_values, _ = dictionary_to_vector(parameters)
    grad = gradients_to_vector(gradients)
    num_parameters = parameters_values.shape[0]
    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))

    # Compute gradapprox
    for i in range(num_parameters):
        # Compute J_plus[i]. Inputs: "parameters_values, epsilon". Output = "J_plus[i]".
        # "_" is used because forward_propagation_n returns two values but we only care about the first one.
        thetaplus = np.copy(parameters_values)                                           # Step 1
        thetaplus[i][0] += epsilon                                                       # Step 2
        J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus))      # Step 3

        # Compute J_minus[i]. Inputs: "parameters_values, epsilon". Output = "J_minus[i]".
        thetaminus = np.copy(parameters_values)                                          # Step 1
        thetaminus[i][0] -= epsilon                                                      # Step 2
        J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus))    # Step 3

        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)

    # Compare gradapprox to the backward propagation gradients using formula (2)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    difference = numerator / denominator

    if difference > 1e-7:
        print("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
    else:
        print("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")

    return difference
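This excerpt stops before the test cell. Assuming testCases.py provides a gradient_check_n_test_case() helper (it is imported above with from testCases import *), the N-dimensional check would be run roughly like this:

X, Y, parameters = gradient_check_n_test_case()    # assumed helper from testCases.py

cost, cache = forward_propagation_n(X, Y, parameters)
gradients = backward_propagation_n(X, Y, cache)
difference = gradient_check_n(parameters, gradients, X, Y)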

Note

  • Gradient Checking is slow! Approximating the gradient with $\frac{\partial J}{\partial \theta} \approx \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon}$ is computationally costly. For this reason, we don't run gradient checking at every iteration during training, just a few times to check if the gradient is correct (see the sketch after this list).
  • Gradient Checking, at least as we've presented it, doesn't work with dropout. You would usually run the gradient check algorithm without dropout to make sure your backprop is correct, then add dropout.
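For illustration only, here is a minimal sketch of that practice; num_iterations, learning_rate, and update_parameters() are hypothetical stand-ins that are not defined in this excerpt, and the check is run just once before training proceeds:

run_check = True
for i in range(num_iterations):                                    # num_iterations: hypothetical training length
    cost, cache = forward_propagation_n(X, Y, parameters)          # forward pass without dropout
    gradients = backward_propagation_n(X, Y, cache)
    if run_check:
        gradient_check_n(parameters, gradients, X, Y)              # expensive: 2 forward passes per parameter
        run_check = False                                          # do not run it again during training
    parameters = update_parameters(parameters, gradients, learning_rate)   # hypothetical update helper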

**What you should remember from this notebook**:

  • Gradient checking verifies closeness between the gradients from backpropagation and the numerical approximation of the gradient (computed using forward propagation).
  • Gradient checking is slow, so we don't run it in every iteration of training. You would usually run it only to make sure your code is correct, then turn it off and use backprop for the actual learning process.
