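Lately I have been wanting to create ops by hand. That seems to require understanding TensorFlow's rules for custom APIs/ops: designing the forward and backward passes, registering names, handling the ports (inputs and outputs), organizing the files, and possibly recompiling before the op can be used. I remember that the TensorFlow website (perhaps an older version) covered this. I didn't study it carefully at the time, and the explanation may not have been very clear either, so I plan to write a separate post on it later. This post focuses on getting the most freedom in computing and processing gradients without defining a custom op.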

1. `tf.gradients`, `tf.stop_gradient()` and higher-order derivatives

For this part, see:
https://blog.csdn.net/u012436149/article/details/53905797
https://blog.csdn.net/u012871493/article/details/71841709
https://blog.csdn.net/Invokar/article/details/86565232

gradient

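TensorFlow has a function for computing gradients, tf.gradients(ys, xs). Note that every x in xs should be connected to ys. For an x that ys does not depend on, tf.gradients returns None, and trying to fetch that None with sess.run raises an error.
The code below defines two variables, w1 and w2, but res depends only on w1.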

```python
# wrong
import tensorflow as tf

w1 = tf.Variable([[1, 2]])
w2 = tf.Variable([[3, 4]])
res = tf.matmul(w1, [[2], [1]])
grads = tf.gradients(res, [w1, w2])  # grads[1] is None: res does not depend on w2
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    re = sess.run(grads)  # fails here: sess.run cannot fetch None
    print(re)
```

Error message:
TypeError: Fetch argument None has invalid type

```python
# right
import tensorflow as tf

w1 = tf.Variable([[1, 2]])
w2 = tf.Variable([[3, 4]])
res = tf.matmul(w1, [[2], [1]])
grads = tf.gradients(res, [w1])  # only ask for variables that res depends on
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    re = sess.run(grads)
    print(re)
# [array([[2, 1]], dtype=int32)]
```

A test of grad_ys:

```python
import tensorflow as tf

w1 = tf.get_variable('w1', shape=[3])
w2 = tf.get_variable('w2', shape=[3])
w3 = tf.get_variable('w3', shape=[3])
w4 = tf.get_variable('w4', shape=[3])
z1 = w1 + w2 + w3
z2 = w3 + w4
# grad_ys: one head gradient per y, with the same shape as that y
grads = tf.gradients([z1, z2], [w1, w2, w3, w4],
                     grad_ys=[tf.convert_to_tensor([2., 2., 3.]),
                              tf.convert_to_tensor([3., 2., 4.])])
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(grads))
# Output:
# [array([ 2.,  2.,  3.], dtype=float32), array([ 2.,  2.,  3.], dtype=float32),
#  array([ 5.,  4.,  7.], dtype=float32), array([ 3.,  2.,  4.], dtype=float32)]
```

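So grad_ys supplies the head (initial) gradients of ys, i.e. the upstream gradient fed into each y before back-propagation starts. The gradient of each x is the sum over all ys of grad_y times ∂y/∂x. For example, w3 appears in both z1 and z2, so its gradient is [2, 2, 3] + [3, 2, 4] = [5, 4, 7].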
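To check the arithmetic behind that output, here is a pure-Python sketch (no TensorFlow needed). It assumes the summation rule just described; because z1 and z2 are plain element-wise sums, ∂z/∂w is 1 for every w inside z, so each gradient reduces to the sum of the head gradients of the ys that contain it. The helper `grads_for_sums` is a hypothetical name used only for illustration.

```python
# Pure-Python check of the grad_ys example: z1 = w1 + w2 + w3, z2 = w3 + w4.
# For element-wise sums dz/dw = 1, so the gradient of w is the sum of the
# head gradients (grad_ys) of every z that contains w.

def grads_for_sums(ys_terms, head_grads, xs):
    """ys_terms: one set of variable names per y (the terms summed into it).
    head_grads: one vector per y, the upstream gradient dL/dy."""
    n = len(head_grads[0])
    result = {}
    for x in xs:
        total = [0.0] * n
        for terms, g in zip(ys_terms, head_grads):
            if x in terms:
                total = [t + gi for t, gi in zip(total, g)]
        result[x] = total
    return result

grads = grads_for_sums(
    ys_terms=[{"w1", "w2", "w3"}, {"w3", "w4"}],
    head_grads=[[2.0, 2.0, 3.0], [3.0, 2.0, 4.0]],
    xs=["w1", "w2", "w3", "w4"],
)
for name in ["w1", "w2", "w3", "w4"]:
    print(name, grads[name])
# w1 [2.0, 2.0, 3.0]
# w2 [2.0, 2.0, 3.0]
# w3 [5.0, 4.0, 7.0]
# w4 [3.0, 2.0, 4.0]
```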

tf.stop_gradient()

It blocks the gradient from back-propagating (BP) through a node.

```python
import tensorflow as tf

w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)
# b = w1 * 3.0 * w2
b = tf.multiply(a_stoped, w2)
gradients = tf.gradients(b, xs=[w1, w2])
print(gradients)
# Output:
# [None, <tf.Tensor 'gradients/Mul_1_grad/Reshape_1:0' shape=() dtype=float32>]
```

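So once a node is stopped, the gradient at that node can no longer propagate back toward its inputs. Because w1's gradient can only come through node a, the gradient returned for w1 is None.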

```python
import tensorflow as tf

x = tf.constant([2.0, 2.1, 3.2, 4.1])
G = tf.get_default_graph()
with G.gradient_override_map({"Sign": "Identity"}):
    # E is a scaling factor; stop_gradient keeps it out of back-propagation
    E = tf.stop_gradient(tf.reduce_mean(tf.abs(x)))
    # Forward: sign(x / E). Backward: Sign uses the Identity gradient instead of 0
    y = tf.sign(x / E)
grad = tf.gradients(y, x)
sess = tf.Session()
print(sess.run(grad))  # each entry is 1/E rather than 0

# Scratch notes from the original draft (the op being probed was not recorded):
# values 2.1, 2.4, 2.5, 2.51, 0.5, -2.1, -2.4, -2.5, -2.51;
# related functions: tf.argmax(), tf.maximum(), np.max()
```
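A pure-Python sketch of what the Sign-to-Identity override computes, assuming the default head gradient of ones. The forward pass is sign(x / E) with E = mean(|x|). In the backward pass Sign is treated as Identity, so the only remaining factor comes from the division, and each element's gradient is 1/E; E itself contributes nothing because of stop_gradient. Without the override, Sign's gradient is zero everywhere. The function names are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)

def binarize_forward(x):
    E = sum(abs(v) for v in x) / len(x)  # scaling factor, treated as a constant
    return [float(sign(v / E)) for v in x], E

def binarize_backward(x, E, upstream, override_sign_with_identity=True):
    # d(x_i / E)/dx_i = 1/E; Sign's own derivative is 0 unless overridden by Identity
    sign_grad = 1.0 if override_sign_with_identity else 0.0
    return [u * sign_grad / E for u in upstream]

x = [2.0, 2.1, 3.2, 4.1]
y, E = binarize_forward(x)
g = binarize_backward(x, E, upstream=[1.0] * len(x))
print(y)  # [1.0, 1.0, 1.0, 1.0]
print(E)  # mean(|x|), about 2.85
print(g)  # every entry equals 1 / E
print(binarize_backward(x, E, [1.0] * len(x), override_sign_with_identity=False))  # all zeros
```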

```python
import tensorflow as tf

a = tf.Variable(1.0)
b = tf.Variable(1.0)
c = tf.add(a, b)
c_stoped = tf.stop_gradient(c)
d = tf.add(a, b)
e = tf.add(c_stoped, d)
gradients = tf.gradients(e, xs=[a, b])
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(gradients))  # both gradients are 1.0 (they arrive through d)
```

Although node c is stopped, a and b still receive gradients back through d, so gradient values are still returned.
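To make the semantics of the two examples concrete, here is a minimal scalar reverse-mode autodiff in pure Python. It is an illustrative toy, not how TensorFlow is implemented, and `Node`, `add`, `mul`, `stop_gradient` and `gradients` here are hypothetical stand-ins. `stop_gradient` returns a node with the same value but no parents, so nothing flows back through it; a variable that receives no gradient is reported as None, mirroring tf.gradients.

```python
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)  # list of (parent_node, local_derivative)

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def stop_gradient(a):
    return Node(a.value)  # same value, no parents: the gradient stops here

def gradients(y, xs):
    # Topological order (parents before children) via depth-first search
    order, seen = [], set()
    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for p, _ in n.parents:
            visit(p)
        order.append(n)
    visit(y)
    grads = {id(y): 1.0}  # head gradient of ones
    for n in reversed(order):  # children before parents
        g = grads.get(id(n))
        if g is None:
            continue
        for p, local in n.parents:
            grads[id(p)] = grads.get(id(p), 0.0) + g * local
    return [grads.get(id(x)) for x in xs]  # None if no gradient reached x

# Example 1: b = stop_gradient(w1 * 3) * w2
w1, w2 = Node(2.0), Node(2.0)
a = mul(w1, Node(3.0))
b = mul(stop_gradient(a), w2)
grads_1 = gradients(b, [w1, w2])
print(grads_1)  # [None, 6.0]

# Example 2: e = stop_gradient(a + b) + (a + b)
a, b = Node(1.0), Node(1.0)
e = add(stop_gradient(add(a, b)), add(a, b))
grads_2 = gradients(e, [a, b])
print(grads_2)  # [1.0, 1.0]
```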

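Stopping gradients also has special uses in areas such as domain adaptation, meta-learning and reinforcement learning, for example DQN here:
https://blog.csdn.net/u013745804/article/details/79589514
https://blog.csdn.net/zbrwhut/article/details/83341869
In these cases the network may be run in several stages with multiple sess.run calls, as in input-data preparation pipelines, GANs and reinforcement learning.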

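```python
import tensorflow as tf

w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)
# b = w1 * 3.0 * w2, but the path back to w1 is stopped
b = tf.multiply(a_stoped, w2)
opt = tf.train.GradientDescentOptimizer(0.1)
gradients = tf.gradients(b, xs=tf.trainable_variables())
# tf.summary.histogram(gradients[0].name, gradients[0])
# ^ This line raises an error because gradients[0] is None (None has no .name).
# Everything else runs normally, both the gradient computation and the variable
# update (apply_gradients skips pairs whose gradient is None). I find this design
# a bit awkward; it would be nicer if the gradient flowing back were simply 0.
# (Newer versions of tf.gradients accept
#  unconnected_gradients=tf.UnconnectedGradients.ZERO, which returns zeros instead.)
train_op = opt.apply_gradients(zip(gradients, tf.trainable_variables()))
print(gradients)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(train_op))
    print(sess.run([w1, w2]))
```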

Higher-order derivatives

In TensorFlow, higher-order derivatives can be computed by applying tf.gradients repeatedly.

```python
import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant(1.)
    b = tf.pow(a, 2)
    grad = tf.gradients(ys=b, xs=a)             # first derivative
    print(grad[0])
    grad_2 = tf.gradients(ys=grad[0], xs=a)     # second derivative
    grad_3 = tf.gradients(ys=grad_2[0], xs=a)   # third derivative
    print(grad_3)
with tf.Session() as sess:
    print(sess.run(grad_3))
```

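Note: for some ops, e.g. tf.add …, TF gives no usable higher-order derivative, and tf.gradients returns None instead of a value. In the case of tf.add, the first derivative is a constant that no longer depends on the input, so no gradient flows back to the input at the next order and the result is reported as None rather than 0.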
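For reference, these are the values the three tf.gradients calls in the example above should correspond to mathematically for b = a² at a = 1 (whether TensorFlow reports a vanishing higher-order term as 0 or as None depends on the graph, as the note explains):

$$
b = a^2,\qquad \frac{db}{da} = 2a = 2,\qquad \frac{d^2 b}{da^2} = 2,\qquad \frac{d^3 b}{da^3} = 0 \qquad \text{at } a = 1
$$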

This also connects to another post of mine: https://blog.csdn.net/edward_zcl/article/details/89338166

2. Gradient clipping: `apply_gradients` and `compute_gradients`

For this part, see:
https://blog.csdn.net/hekkoo/article/details/53896598?utm_source=blogxgwz1

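This part came about because I wanted to use a step function as my loss function, but using it directly means the gradient cannot be computed (a step function's derivative is zero almost everywhere and undefined at the jump). While reading the TensorFlow documentation, I had noticed that minimize can be seen as compute_gradients followed by apply_gradients. In other words, we can compute the gradients first, process them, and then call apply_gradients.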

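At first I planned to implement this myself, but being new to TensorFlow I kept hitting walls. Eventually, searching on Zhihu, I found an article on gradient accumulation and asynchronous updates in distributed TensorFlow, and its code showed me how to do it.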

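Definition:

```python
# Assumes `optimizer` (a tf.train.Optimizer) and `loss` are already defined.

# 1. Compute the gradients for all trainable variables
gradient_all = optimizer.compute_gradients(loss)

# 2. Keep only the variables that actually receive a gradient
grads_vars = [v for (g, v) in gradient_all if g is not None]

# 3. Compute the gradients we need, for those variables only
gradient = optimizer.compute_gradients(loss, grads_vars)

# 4. Create one placeholder per gradient (same shape), paired with its variable
grads_holder = [(tf.placeholder(tf.float32, shape=g.get_shape()), v) for (g, v) in gradient]

# 5. Build the update op that applies whatever is fed into the placeholders
#    (this continues the back-propagation step)
train_op = optimizer.apply_gradients(grads_holder)
```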

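Usage:

```python
# 1. Run the graph to get the gradient values
#    (add `loss` to the fetch list if you also want its value)
gradient_result = sess.run(gradient, feed_dict={x: x_i, y_: y_real})

# 2. Process each gradient freely and map it to its placeholder
grads_dict = {}
for i in range(len(gradient_result)):
    k = grads_holder[i][0]  # the placeholder, used as a key in feed_dict below
    # DealTheGradientFunction: your own gradient-processing function
    grads_dict[k] = DealTheGradientFunction(gradient_result[i][0])

# 3. Feed the processed gradients in to update the weights
_ = sess.run(train_op, feed_dict=grads_dict)
```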
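As a framework-free sketch of the same three-step pattern (compute gradients, process them, apply them), here is plain gradient descent on a toy quadratic loss in pure Python. The loss, the learning rate, and the element-wise clipping used as a stand-in for `DealTheGradientFunction` are all assumptions for illustration; the function names are hypothetical and are not the TensorFlow API.

```python
def compute_gradients(w, target):
    # toy loss = sum((w_i - t_i)^2)  ->  dloss/dw_i = 2 * (w_i - t_i)
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]

def deal_the_gradient(g, limit=1.0):
    # hypothetical processing step: clip each component to [-limit, limit]
    return [max(-limit, min(limit, gi)) for gi in g]

def apply_gradients(w, g, lr=0.1):
    return [wi - lr * gi for wi, gi in zip(w, g)]

w = [5.0, -3.0]
target = [0.0, 0.0]
for step in range(100):
    raw = compute_gradients(w, target)   # 1. compute
    processed = deal_the_gradient(raw)   # 2. process freely
    w = apply_gradients(w, processed)    # 3. apply
print(w)  # both entries end up very close to 0
```

Clipping caps the step size while the raw gradient is large, and once the gradient falls below the limit the updates become ordinary gradient descent.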

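As shown above, gradients can be computed before they are used, processed, and only then applied; combining these two steps gives custom gradient handling. It is still fairly limited, though: in practice it is mostly used to clip gradients so they don't explode (often mentioned together with vanishing gradients). See also:
https://blog.csdn.net/NockinOnHeavensDoor/article/details/80632677#%E6%A2%AF%E5%BA%A6%E4%BF%AE%E5%89%AA%E4%B8%BB%E8%A6%81%E9%81%BF%E5%85%8D%E8%AE%AD%E7%BB%83%E6%A2%AF%E5%BA%A6%E7%88%86%E7%82%B8%E5%92%8C%E6%B6%88%E5%A4%B1%E9%97%AE%E9%A2%98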
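Since clipping is the typical use, here is a pure-Python sketch of the global-norm rule documented for tf.clip_by_global_norm. Every gradient is scaled by clip_norm / max(global_norm, clip_norm), where global_norm is the L2 norm of all gradients taken together. The function name mirrors TensorFlow's for readability, but this is a reimplementation of the formula, not the TF API.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """grads: list of gradient vectors (lists of floats)."""
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = clip_norm / max(global_norm, clip_norm)  # 1.0 when already small enough
    return [[g * scale for g in vec] for vec in grads], global_norm

clipped, norm = clip_by_global_norm([[3.0, 0.0], [0.0, 4.0]], clip_norm=1.0)
print(norm)     # 5.0
print(clipped)  # approximately [[0.6, 0.0], [0.0, 0.8]]

small, small_norm = clip_by_global_norm([[0.3, 0.4]], clip_norm=1.0)
print(small)    # unchanged, because the global norm (0.5) is below clip_norm
```

Scaling all gradients by one shared factor keeps their relative directions intact, which is the usual argument for global-norm clipping over clipping each component separately.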

Here is another well-written and very thorough post for further study:
https://blog.csdn.net/lenbow/article/details/52218551

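So compute_gradients gives the same gradients as tf.gradients, just paired with their variables; to clip, you get the gradients, process them, and then apply them.
For example:
https://blog.csdn.net/u012436149/article/details/53006953
This also reminded me of another post of mine, and of a related one:
https://blog.csdn.net/edward_zcl/article/details/89418268
https://blog.csdn.net/diligent_321/article/details/53130913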

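This pattern is common in asynchronous communication and updates, multi-GPU parallel training and distributed training, which brings in the optimizer. For a basic treatment, see: https://www.cnblogs.com/marsggbo/p/10056057.html
https://blog.csdn.net/c2a2o2/article/details/65633147
For an advanced one: https://blog.csdn.net/u014665013/article/details/84404204
This author also summarizes the workflow in a very practical way:
https://blog.csdn.net/weixin_36474809/article/details/88031720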

I also vaguely remember seeing "gradient cancelling" somewhere, either in transfer learning or in a GitHub repository on (Keras-based) quantized networks, but I may be misremembering.

3. `tf.get_default_graph().gradient_override_map`

I've covered this several times already; see my post: https://blog.csdn.net/edward_zcl/article/details/89338166

Here's one more example.

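```python
import tensorflow as tf

sess = tf.Session()
a = tf.constant([-1.0, -0.5, 0.0, 0.5, 1.0])

# Example 1: back-propagate through Relu as if it were Identity
with tf.get_default_graph().gradient_override_map({"Relu": "Identity"}):
    result = tf.nn.relu(a)
grad = tf.gradients(result, a)
print(sess.run(grad))  # all ones: the Relu mask is ignored in the backward pass

# Example 2: use another registered gradient by name.
# "TL_Sign_QuantizeGrad" must already be registered with tf.RegisterGradient
# (for example by a library such as TensorLayer, or by your own code);
# otherwise tf.gradients raises a LookupError.
with tf.get_default_graph().gradient_override_map({"Relu": "TL_Sign_QuantizeGrad"}):
    result = tf.nn.relu(a)
grad = tf.gradients(result, a)
print(sess.run(grad))
```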

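In TensorFlow, every operator (op type) has a registered name, and gradient functions are registered in a lookup table keyed by those names. When needed, TensorFlow simply looks up the identifier; gradient_override_map tells it to look up a different name's gradient for the op types you list.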

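The examples in my post even show how to define your own function to back-propagate gradients, but this is limited: you can only do simple processing of the gradient, similar to the above, for example to stand in for a non-differentiable op or to customize the backward behavior.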
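To illustrate the name-lookup idea, here is a toy pure-Python gradient registry with an override map. It only imitates the concept (gradient functions looked up by op-type name, with optional remapping); it is not TensorFlow's implementation. All names are hypothetical, including "ClippedSTE", a straight-through rule chosen for illustration that passes the gradient only where |x| ≤ 1.

```python
# Toy gradient registry: gradient functions are looked up by op-type name,
# and an override map can redirect one name to another name's gradient.
GRAD_REGISTRY = {}

def register_gradient(name):
    def decorator(fn):
        GRAD_REGISTRY[name] = fn
        return fn
    return decorator

@register_gradient("Relu")
def _relu_grad(x, upstream):
    return [u if xi > 0 else 0.0 for xi, u in zip(x, upstream)]

@register_gradient("Identity")
def _identity_grad(x, upstream):
    return list(upstream)

@register_gradient("Sign")
def _sign_grad(x, upstream):
    return [0.0 for _ in upstream]  # sign is flat almost everywhere

@register_gradient("ClippedSTE")
def _clipped_ste_grad(x, upstream):
    # straight-through variant: pass the gradient where |x| <= 1, block it elsewhere
    return [u if abs(xi) <= 1.0 else 0.0 for xi, u in zip(x, upstream)]

def backward(op_type, x, upstream, override_map=None):
    name = (override_map or {}).get(op_type, op_type)  # the override lookup
    return GRAD_REGISTRY[name](x, upstream)

a = [-1.0, -0.5, 0.0, 0.5, 1.0]
ones = [1.0] * len(a)
print(backward("Relu", a, ones))                        # [0.0, 0.0, 0.0, 1.0, 1.0]
print(backward("Relu", a, ones, {"Relu": "Identity"}))  # [1.0, 1.0, 1.0, 1.0, 1.0]
print(backward("Sign", [-2.0, -0.5, 0.5, 2.0], [1.0] * 4,
               {"Sign": "ClippedSTE"}))                 # [0.0, 1.0, 1.0, 0.0]
```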

4. `tf.py_func` + `tf.RegisterGradient` + `tf.get_default_graph().gradient_override_map`

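This combination is really powerful. To use it you need to understand TensorFlow's name-mapping mechanism, decorators, lambda expressions, TF and NumPy data types, and member functions versus data members (and, going further, attributes and properties). With it, it seems you can implement custom operations, even large network layers, without compiling anything. That actually makes me wonder whether compilation is still needed at all, and what kinds of custom operations still require it. One known limitation: the Python body of a py_func is not serialized into the GraphDef and must run in the same process as the Python program that calls it, so a model that relies on it cannot simply be exported and served elsewhere. The official site used to describe (in an older version) how to implement custom ops by compiling from source; I'll discuss that later.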

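Here is some code a junior labmate once asked me about. After studying it for a while I roughly understood it, though I haven't gone through every detail; I now find tf.py_func really powerful.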

```python
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

# Define a custom "relu" as plain Python functions.
# An earlier variant, kept for reference:
# def my_relu_def(x, threshold=0.05):
#     if x < threshold:
#         return 0.0
#     else:
#         return x
#
# def my_relu_grad_def(x, threshold=0.05):
#     if x < threshold:
#         return 0.0
#     else:
#         return 1.2

# Active variant: clip x to [threshold2, threshold1]; gradient 1 inside, 0 outside.
def my_relu_def(x, threshold1=3, threshold2=-3):
    if x < threshold2:
        return -3.0
    elif x < threshold1:
        return x
    else:
        return 3.0

def my_relu_grad_def(x, threshold1=3, threshold2=-3):
    if x < threshold2:
        return 0.0
    elif x < threshold1:
        return 1.0
    else:
        return 0.0

# Turn the scalar Python functions into element-wise NumPy functions
my_relu_np = np.vectorize(my_relu_def)
my_relu_grad_np = np.vectorize(my_relu_grad_def)
# NumPy defaults to float64, but the TensorFlow ops here use float32
my_relu_np_32 = lambda x: my_relu_np(x).astype(np.float32)
my_relu_grad_np_32 = lambda x: my_relu_grad_np(x).astype(np.float32)

def my_relu_grad_tf(x, name=None):
    # Wrap the NumPy gradient function as a TF op
    with ops.name_scope(name, "my_relu_grad_tf", [x]) as name:
        y = tf.py_func(my_relu_grad_np_32, [x], [tf.float32], name=name, stateful=False)
        return y[0]

def my_py_func(func, inp, Tout, stateful=False, name=None, my_grad_func=None):
    # A unique name is needed to avoid registering the same gradient name twice
    random_name = 'PyFuncGrad' + str(np.random.randint(0, 10**8))
    tf.RegisterGradient(random_name)(my_grad_func)  # see _my_relu_grad for an example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": random_name, "PyFuncStateless": random_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

# The gradient function passed to my_py_func takes a special form:
# it receives (the op, the gradient flowing in from after the op)
# and returns (propagates) the gradient backward past the op.
def _my_relu_grad(op, pre_grad):
    x = op.inputs[0]
    cur_grad = my_relu_grad_tf(x)
    next_grad = pre_grad * cur_grad
    return next_grad

def my_relu_tf(x, name=None):
    with ops.name_scope(name, "my_relu_tf", [x]) as name:
        y = my_py_func(my_relu_np_32, [x], [tf.float32], stateful=False, name=name,
                       my_grad_func=_my_relu_grad)  # <-- here the custom gradient is attached
        return y[0]

with tf.Session() as sess:
    # float32 input: gradients are only meaningful for floating-point tensors
    x = tf.constant([-3., -4., 1., 40.])
    y = my_relu_tf(x)
    tf.global_variables_initializer().run()
    print(x.eval())
    print(y.eval())
    print(tf.gradients(y, [x])[0].eval())
```

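It seems to implement a custom threshold (clipping) function. I don't know where it originally came from, but I found a similar one: https://blog.csdn.net/mmc2015/article/details/71250090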
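For comparison, here is a pure-Python sketch of what the script above should compute for x = [-3, -4, 1, 40], assuming the default head gradient of ones. The forward pass clips to [-3, 3], and the backward pass multiplies the incoming gradient by 1 inside the range and 0 outside, just as `_my_relu_grad` does with `pre_grad * cur_grad`. Note the boundary: with the `<` comparisons, x = -3 falls into the middle branch, so its gradient is 1. The function `backward` is a hypothetical helper.

```python
def my_relu_def(x, threshold1=3, threshold2=-3):
    if x < threshold2:
        return -3.0
    elif x < threshold1:
        return float(x)
    else:
        return 3.0

def my_relu_grad_def(x, threshold1=3, threshold2=-3):
    if x < threshold2:
        return 0.0
    elif x < threshold1:
        return 1.0
    else:
        return 0.0

def backward(xs, pre_grad):
    # chain rule as in _my_relu_grad: next_grad = pre_grad * cur_grad
    return [p * my_relu_grad_def(x) for x, p in zip(xs, pre_grad)]

xs = [-3.0, -4.0, 1.0, 40.0]
print([my_relu_def(x) for x in xs])   # [-3.0, -3.0, 1.0, 3.0]
print(backward(xs, [1.0] * len(xs)))  # [1.0, 0.0, 1.0, 0.0]
```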

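For tf.py_func itself, start with these:
https://www.jianshu.com/p/bac384d34c47
https://blog.csdn.net/DaVinciL/article/details/80615526
Both explain it reasonably well, but they disagree on one point: whether tf.py_func is part of the computation graph. I think it depends on how you use it. It always creates an op in the graph, but by default no gradient flows through it (tf.gradients returns None for its inputs), so it behaves like a computation detached from training. Once you register a gradient for it, as in the code above, it becomes a differentiable part of the graph.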

In short, this function really does make TensorFlow, a static-graph framework, much more flexible. TensorFlow keeps evolving (Lite, Serving, eager execution, even dynamic-graph support, and many other new features), but here we still treat it as a static computation graph. See the following links for more:
https://blog.csdn.net/DaVinciL/article/details/80615526
https://blog.csdn.net/aaon22357/article/details/82996436
https://blog.csdn.net/weixin_41950276/article/details/83590058 (this author may have been using the wrong Python version)
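This one:
https://blog.csdn.net/tiankongtiankong01/article/details/80568311
goes into more detail and is more complete. It summarizes the basic usage of the function, how NumPy arrays and tensors interact and complement each other, and their differences (conditionals, knowing sizes in advance, etc.). What it lacks is how to embed tf.py_func in our computation graph and train the model end to end; for that, see the links above: https://blog.csdn.net/mmc2015/article/details/71250090 and https://blog.csdn.net/caorui_nk/article/details/82898200. With the explanation so far you should understand 99% of it; the one remaining point is this statement: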

```python
with g.gradient_override_map({"PyFunc": random_name, "PyFuncStateless": random_name}):
    return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
```

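Why exactly does it map two names here? If you're interested, also look into the other parameter, stateful, which is a bit puzzling too. The two are related: tf.py_func creates an op of type PyFunc when stateful=True and PyFuncStateless when stateful=False, so mapping both names makes the custom gradient apply whichever value is passed. stateful=False declares the function pure (same input, same output, no side effects), which lets TensorFlow apply optimizations such as common subexpression elimination to it.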
https://blog.csdn.net/u011436429/article/details/80420700
https://blog.csdn.net/xiaoYAN174/article/details/79090382
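From these two links you can see that, when no gradient is needed, it is used in the RPN of Faster R-CNN for complicated computations. That use involves no back-propagation; it only transforms values.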

That's all.
