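[Deep Learning, TF2] GradientTape

TensorFlow GradientTape

1. Background
2. Parameters of tf.GradientTape
- Example 1: persistent=False and watch_accessed_variables=True (the defaults)
- Example 2: persistent=True and watch_accessed_variables=True
- Example 3: persistent=True and watch_accessed_variables=True, comparing a tf.constant with a tf.Variable
- Example 4: taking the gradient of a gradient
- Example 5: computing gradients of several expressions recorded on the same tape
- Example 6: differentiating through TensorFlow functions such as reduce_sum and multiply
- Example 7: gradients of a function of two variables
3. apply_gradients(grads_and_vars, name=None)
- Example 8: a simple linear regression that combines an optimizer with GradientTape
- Example 9: recording control flow
4. References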

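1. Background

tf.GradientTape is one of the most commonly used features of TensorFlow 2.x, because almost any task that involves computing gradients relies on this API.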

2. Parameters of tf.GradientTape

persistent: Boolean controlling whether a persistent gradient tape is created. False by default, which means at most one call can be made to the gradient() method on this object.
watch_accessed_variables: Boolean controlling whether the tape will automatically watch any (trainable) variables accessed while the tape is active. Defaults to True meaning gradients can be requested from any result computed in the tape derived from reading a trainable Variable. If False users must explicitly watch any Variables they want to request gradients from.

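In short:

persistent: if False (the default), gradient() can be called at most once on the tape; if True, it can be called multiple times.
watch_accessed_variables: if True (the default), the tape automatically watches every trainable tf.Variable accessed while it is active, so gradients can be requested with respect to those variables. If False, you must explicitly call watch() on each variable you want a gradient for.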
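
The examples in this section all leave watch_accessed_variables at its default of True. As a complement, here is a minimal sketch of the False case; the variable names a and b and the expression a * a * b are made up for illustration. Only the variable passed to watch() receives a gradient.

```python
import tensorflow as tf

a = tf.Variable(2.0)
b = tf.Variable(3.0)

# Automatic watching is switched off, so only what is passed to watch() is tracked.
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(a)
    y = a * a * b  # y = a^2 * b

# Both gradients are requested in a single call because the tape is not persistent.
dy_da, dy_db = tape.gradient(y, [a, b])
print(dy_da)  # gradient with respect to the watched variable: 2 * a * b
print(dy_db)  # None, because b was never watched
```

Asking for both sources in one gradient() call avoids the need for persistent=True here.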

Example 1: persistent=False and watch_accessed_variables=True (the defaults)

import tensorflow as tf

x = tf.Variable(initial_value=3.0)
with tf.GradientTape() as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
dy_dx = g.gradient(y, x)  # Second call raises RuntimeError on a non-persistent tape
print(dy_dx)

Output: the first call to gradient() returns 6.0, while the second call raises an error because persistent defaults to False (GradientTape.gradient can only be called once on non-persistent tapes).

tf.Tensor(6.0, shape=(), dtype=float32)
Traceback (most recent call last):
  File "**/GradientTape_test.py", line 70, in <module>
    test1()
  File "**/test/GradientTape_test.py", line 11, in test1
    dy_dx = g.gradient(y, x)  # Will compute to 6.0
  File "**\lib\site-packages\tensorflow_core\python\eager\backprop.py", line 980, in gradient
    raise RuntimeError("GradientTape.gradient can only be called once on "
RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.

Example 2: persistent=True and watch_accessed_variables=True

import tensorflow as tf

x = tf.Variable(initial_value=3.0)
with tf.GradientTape(persistent=True) as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
dy_dx = g.gradient(y, x)  # Allowed a second time because the tape is persistent
print(dy_dx)

Output:

tf.Tensor(6.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)

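Example 3: persistent=True and watch_accessed_variables=True, comparing a tf.constant with a tf.Variable

import tensorflow as tf

# Case 1: a tf.Variable is watched automatically
x = tf.Variable(initial_value=3.0)
with tf.GradientTape(persistent=True) as g:
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0
print(dy_dx)

# Case 2: a tf.constant is not watched automatically
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g1:
    y = x * x
dy_dx = g1.gradient(y, x)  # None, because x is not being watched
print(dy_dx)

# Case 3: a tf.constant that is watched explicitly
with tf.GradientTape(persistent=True) as g1:
    g1.watch(x)
    y = x * x
dy_dx = g1.gradient(y, x)  # Will compute to 6.0
print(dy_dx)

Output: a tensor created with tf.constant is not watched automatically, so to take a gradient with respect to it you must call watch() explicitly.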

tf.Tensor(6.0, shape=(), dtype=float32)
None
tf.Tensor(6.0, shape=(), dtype=float32)

Example 4: taking the gradient of a gradient

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    with tf.GradientTape() as gg:
        gg.watch(x)
        y = x * x
    # Computed inside the outer tape so that g records this gradient computation
    dy_dx = gg.gradient(y, x)  # Will compute to 6.0
    print(dy_dx)
d2y_dx2 = g.gradient(dy_dx, x)  # Will compute to 2.0
print(d2y_dx2)

Example 5: computing gradients of several expressions recorded on the same tape

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
dz_dx = g.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
print(dz_dx)
dy_dx = g.gradient(y, x)  # 6.0
print(dy_dx)
del g  # Drop the reference to the tape

Example 6: differentiating through TensorFlow functions such as reduce_sum and multiply

import tensorflow as tf

x = tf.ones((2, 2))
print(x)
y = tf.reduce_sum(x)
print(y)
z = tf.multiply(y, y)
print(z)

# Operations that need gradients must be recorded on the tape
with tf.GradientTape() as t:
    t.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

# Gradient of z with respect to x
dz_dx = t.gradient(z, x)
print(dz_dx)
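
No output is listed for this example, but the gradient can be worked out by hand: z = (sum of x)^2, so the derivative with respect to each entry of x is 2 * sum(x), which is 8 for a 2x2 matrix of ones. The sketch below checks that value numerically with central differences in NumPy. It is an independent sanity check rather than a reproduction of what TensorFlow does internally, and the helper z_of is a hypothetical name.

```python
import numpy as np

def z_of(x):
    s = np.sum(x)  # plays the role of tf.reduce_sum(x)
    return s * s   # plays the role of tf.multiply(y, y)

x = np.ones((2, 2))
eps = 1e-4
numeric_grad = np.zeros_like(x)

# Perturb one entry at a time and measure how z changes.
for idx in np.ndindex(x.shape):
    step = np.zeros_like(x)
    step[idx] = eps
    numeric_grad[idx] = (z_of(x + step) - z_of(x - step)) / (2 * eps)

print(numeric_grad)  # every entry is approximately 8.0
```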

Example 7: gradients of a function of two variables

import tensorflow as tf

x = tf.constant(value=3.0)
y = tf.constant(value=2.0)
with tf.GradientTape(persistent=True, watch_accessed_variables=True) as tape:
    tape.watch([x, y])
    z1 = x * x * y + x * y

# First-order derivatives
dz1_dx = tape.gradient(target=z1, sources=x)
dz1_dy = tape.gradient(target=z1, sources=y)
dz1_d = tape.gradient(target=z1, sources=[x, y])
print("dz1_dx:", dz1_dx)
print("dz1_dy:", dz1_dy)
print("dz1_d:", dz1_d)
print("type of dz1_d:", type(dz1_d))

Output:

dz1_dx: tf.Tensor(14.0, shape=(), dtype=float32)
dz1_dy: tf.Tensor(12.0, shape=(), dtype=float32)
dz1_d: [<tf.Tensor: shape=(), dtype=float32, numpy=14.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>]
type of dz1_d: <class 'list'>
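
These values agree with the partial derivatives worked out by hand: for z1 = x*x*y + x*y, the derivative with respect to x is 2*x*y + y and the derivative with respect to y is x*x + x. A small plain-Python sketch of that check follows; the function names are hypothetical.

```python
def dz1_dx_by_hand(x, y):
    # Partial derivative of x*x*y + x*y with respect to x
    return 2 * x * y + y

def dz1_dy_by_hand(x, y):
    # Partial derivative of x*x*y + x*y with respect to y
    return x * x + x

print(dz1_dx_by_hand(3.0, 2.0))  # 14.0
print(dz1_dy_by_hand(3.0, 2.0))  # 12.0
```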

3. apply_gradients(grads_and_vars, name=None)

Purpose: apply the computed gradients to the corresponding variables. It is a method of the optimizer.
Parameters:

grads_and_vars: a list of (gradient, variable) pairs.
name: name of the operation.

The docstring reads:

This is the second part of minimize(). It returns an Operation that
applies gradients.

Args:
  grads_and_vars: List of (gradient, variable) pairs.
  name: Optional name for the returned operation. Default to the name
    passed to the Optimizer constructor.

Returns:
  An Operation that applies the specified gradients. The iterations
  will be automatically increased by 1.
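
Before the full regression example, a single update step may make the mechanics clearer. This is a minimal sketch assuming plain SGD with a learning rate of 0.1; the toy loss (w - 3)^2 is invented for illustration. The tape produces the gradient, and apply_gradients moves w by -learning_rate * gradient.

```python
import tensorflow as tf

w = tf.Variable(1.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = (w - 3.0) ** 2  # hypothetical toy loss, minimized at w = 3

grads = tape.gradient(loss, [w])             # d(loss)/dw = 2 * (w - 3) = -4.0
optimizer.apply_gradients(zip(grads, [w]))   # w <- w - 0.1 * (-4.0)
print(w.numpy())  # approximately 1.4
```

Note that the gradients and variables are paired with zip, in the same order as the sources passed to tape.gradient.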

Example 8: a simple linear regression that combines an optimizer with GradientTape

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

TRAIN_STEPS = 20

# Prepare train data
train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.33 + 10
print(train_X.shape)

w = tf.Variable(initial_value=1.0)
b = tf.Variable(initial_value=1.0)

optimizer = tf.keras.optimizers.SGD(0.1)
mse = tf.keras.losses.MeanSquaredError()

for i in range(TRAIN_STEPS):
    print("epoch:", i)
    print("w:", w.numpy())
    print("b:", b.numpy())
    # Compute the gradients and apply them
    with tf.GradientTape() as tape:
        logit = w * train_X + b
        loss = mse(train_Y, logit)
    gradients = tape.gradient(target=loss, sources=[w, b])  # compute the gradients
    # print("gradients:", gradients)
    # print("zip:\n", list(zip(gradients, [w, b])))
    optimizer.apply_gradients(zip(gradients, [w, b]))  # apply the gradients

# draw
plt.plot(train_X, train_Y, "+")
plt.plot(train_X, w * train_X + b)
plt.show()

Output: as the number of epochs increases, w and b gradually approach 2 and 10.

epoch: 0
w: 1.0
b: 1.0
epoch: 1
w: 1.0676092
b: 2.7953496
epoch: 2
w: 1.13062
b: 4.231629
epoch: 3
w: 1.1893452
b: 5.3806524
epoch: 4
w: 1.2440765
b: 6.2998714
epoch: 5
w: 1.2950852
b: 7.035247
epoch: 6
w: 1.3426247
b: 7.623547
epoch: 7
w: 1.3869308
b: 8.094187
epoch: 8
w: 1.4282235
b: 8.470699
epoch: 9
w: 1.4667077
b: 8.771909
epoch: 10
w: 1.5025746
b: 9.0128765
epoch: 11
w: 1.5360019
b: 9.20565
epoch: 12
w: 1.5671558
b: 9.35987
epoch: 13
w: 1.5961908
b: 9.483246
epoch: 14
w: 1.6232511
b: 9.581946
epoch: 15
w: 1.6484709
b: 9.660907
epoch: 16
w: 1.6719754
b: 9.724075
epoch: 17
w: 1.6938813
b: 9.77461
epoch: 18
w: 1.7142972
b: 9.815037
epoch: 19
w: 1.7333245
b: 9.847379
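
To see where the first update in the log comes from (w: 1.0676092, b: 2.7953496 at epoch 1), the same step can be computed by hand in NumPy. This is a sketch under one stated assumption: the random noise is left out so that the numbers are deterministic, which is why the result only approximately matches the logged values. The gradient formulas are those of the mean squared error mean((pred - y)^2).

```python
import numpy as np

# Assumption: noise-free targets, unlike the run logged above.
train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + 10

w, b = 1.0, 1.0
learning_rate = 0.1

pred = w * train_X + b
error = pred - train_Y

# Gradients of mean((pred - y)^2) with respect to w and b
grad_w = np.mean(2 * error * train_X)
grad_b = np.mean(2 * error)

# One plain SGD step: parameter <- parameter - learning_rate * gradient
w_new = w - learning_rate * grad_w
b_new = b - learning_rate * grad_b
print(w_new, b_new)  # approximately 1.068 and 2.8
```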

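Example 9: recording control flow

Because the tape records operations as they are executed, gradients are still computed correctly when the computation involves Python control flow such as if, for, and while.

import tensorflow as tf

def f(x, y):
    output = 1.0
    # Loop y times
    for i in range(y):
        # Multiply by x only when 1 < i < 5
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output

def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    # Return the gradient of the output with respect to x
    return t.gradient(out, x)

# x is a fixed value
x = tf.convert_to_tensor(2.0)

print(grad(x, 6))
print(grad(x, 5))
print(grad(x, 4))

Output: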

tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(12.0, shape=(), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)
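
The three results follow from counting how often the loop actually multiplies by x. If the condition holds n times, f reduces to x^n and its derivative is n * x^(n-1). The plain-Python sketch below reproduces the numbers without TensorFlow; both helper names are hypothetical.

```python
def count_multiplications(y):
    # Same condition as in f: the loop multiplies by x only when 1 < i < 5.
    return sum(1 for i in range(y) if 1 < i < 5)

def expected_grad(x, y):
    n = count_multiplications(y)
    # f(x, y) = x**n, so df/dx = n * x**(n - 1).
    # Mathematically, a constant output (n = 0) has derivative 0.
    return n * x ** (n - 1) if n > 0 else 0.0

for y in (6, 5, 4):
    print(y, count_multiplications(y), expected_grad(2.0, y))
# 6 3 12.0
# 5 3 12.0
# 4 2 4.0
```

For y = 6 and y = 5 the condition holds for i = 2, 3, 4, giving x^3 and a gradient of 3 * 2^2 = 12. For y = 4 it holds only for i = 2, 3, giving x^2 and a gradient of 2 * 2 = 4.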

4. References

[0] https://www.tensorflow.org/api_docs/python/tf/GradientTape
[1] https://blog.csdn.net/xierhacker/article/details/53174558
[2] https://blog.csdn.net/qq_36758914/article/details/104456736