python求雅可比矩阵_在Python中计算神经网络的雅可比矩阵

通常，神经网络是一个多变量，矢量值函数，如下所示：

函数f有一些参数θ(神经网络的权重),它将一个N维向量x(即猫图片的N像素)映射到一个m维矢量(例如，x属于M个不同类别中的每个类别的概率):

在训练过程中，通常会给输出附加一个标量损失值——分类的一个典型例子是预测类概率的交叉熵损失。当使用这样的标量损失时，M = 1，然后通过执行(随机)梯度下降来学习参数，在此期间重复计算相对于θ的损失函数的梯度。因此，在训练期间计算标量输出值相对于网络参数的梯度是很常见的，并且所有常见的机器学习库都可以这样做，通常使用自动微分非常有效。

然而,在推理时网络的输出通常是一个向量(例如,类概率)。在这种情况下，查看网络的雅可比矩阵可能会很有趣。在这篇文章中,我解释一下什么是雅可比矩阵,然后我探索和比较一些可能实现用Python来完成。

什么是雅可比矩阵，为什么我们会关心？

我们称y为f的输出向量。f的雅可比矩阵包含y的每个元素的偏导数，相对于输入x的每个元素：

该矩阵告诉我们神经网络输入的local perturbations将如何影响输出。在某些情况下，这些信息可能很有价值。例如，在用于创造性任务的机器学习(ML)系统中，让系统为用户提供一些交互式反馈可以很方便，告诉他们修改每个输入维度将如何影响每个输出类。

Tensorflow

让我们一起尝试使用Tensorflow。首先，我们需要一个示例网络来玩。在这里，我只对在测试时计算现有网络f的雅可比行列式感兴趣，所以我不专注于训练。假设我们有一个简单的网络[affine → ReLU → affine → softmax]。我们首先定义一些随机参数：

import numpy as np

N = 500 # Input size

H = 100 # Hidden layer size

M = 10 # Output size

w1 = np.random.randn(N, H) # first affine layer weights

b1 = np.random.randn(H) # first affine layer bias

w2 = np.random.randn(H, M) # second affine layer weights

b2 = np.random.randn(M) # second affine layer bias

使用Keras，我们按如下方式实施我们的神经网络：

import tensorflow as tf

from tensorflow.keras.layers import Dense

sess = tf.InteractiveSession()

sess.run(tf.initialize_all_variables())

model = tf.keras.Sequential()

model.add(Dense(H, activation='relu', use_bias=True, input_dim=N))

model.add(Dense(O, activation='softmax', use_bias=True, input_dim=O))

model.get_layer(index=0).set_weights([w1, b1])

model.get_layer(index=1).set_weights([w2, b2])

现在让我们尝试计算这个神经网络模型的雅可比行列式。不幸的是，Tensorflow目前没有提供一种开箱即用的雅可比矩阵计算方法。方法tf.gradients(y,xs)为xs的每一个x都返回sum(dy / dx)，在我们的例子中，是n维的矢量，不是我们想要的。然而，通过计算每个yi的梯度矢量，我们仍然可以计算出雅可比矩阵，并且将输出分组为矩阵:

def jacobian_tensorflow(x):

jacobian_matrix = []

for m in range(M):

# We iterate over the M elements of the output vector

grad_func = tf.gradients(model.output[:, m], model.input)

gradients = sess.run(grad_func, feed_dict={model.input: x.reshape((1, x.size))})

jacobian_matrix.append(gradients[0][0,:])

return np.array(jacobian_matrix)

我们用数值微分来检验雅可比矩阵的正确性。下面的函数is_jacobian_correct()采用一个函数来计算雅可比矩阵和前馈函数f:

def is_jacobian_correct(jacobian_fn, ffpass_fn):

""" Check of the Jacobian using numerical differentiation

"""

x = np.random.random((N,))

epsilon = 1e-5

""" Check a few columns at random

"""

for idx in np.random.choice(N, 5, replace=False):

x2 = x.copy()

x2[idx] += epsilon

num_jacobian = (ffpass_fn(x2) - ffpass_fn(x)) / epsilon

computed_jacobian = jacobian_fn(x)

if not all(abs(computed_jacobian[:, idx] - num_jacobian) < 1e-3):

return False

return True

def ffpass_tf(x):

""" The feedforward function of our neural net

"""

xr = x.reshape((1, x.size))

return model.predict(xr)[0]

is_jacobian_correct(jacobian_tensorflow, ffpass_tf)

>> True

非常好，这是正确的。让我们看看这个计算需要多长时间：

tic = time.time()

jacobian_tf = jacobian_tensorflow(x0, verbose=False)

tac = time.time()

print('It took %.3f s. to compute the Jacobian matrix' % (tac-tic))

>> It took 0.658 s. to compute the Jacobian matrix

用CPU需要大约650毫秒。650 ms对于这样的示例来说太慢了，特别是如果我们在测试时考虑到交互式使用，那么是否可以做得更好呢？

Autograd

Autograd是一个非常好的Python库。要使用它，我们首先必须使用Autograd的封装Numpy 指定我们的前馈函数f：

import autograd.numpy as anp

def ffpass_anp(x):

a1 = anp.dot(x, w1) + b1 # affine

a1 = anp.maximum(0, a1) # ReLU

a2 = anp.dot(a1, w2) + b2 # affine

exps = anp.exp(a2 - anp.max(a2)) # softmax

out = exps / exps.sum()

return out

首先，让我们通过比较之前的Tensorflow前馈函数ffpass_tf()来检查这个函数是否正确。

out_anp = ffpass_anp(x0)

out_keras = ffpass_tf(x0)

np.allclose(out_anp, out_keras, 1e-4)

>> True

好的,我们有相同的函数f。现在让我们计算雅可比矩阵。Autograd很简单：

from autograd import jacobian

def jacobian_autograd(x):

return jacobian(ffpass_anp)(x)

is_jacobian_correct(jacobian_autograd, ffpass_np)

>> True

看起来很正确。需要多长时间呢？

%timeit jacobian_autograd(x0)

>> 3.69 ms ± 135 µs

我们的Tensorflow实现大约需要650毫秒，Autograd需要3.7毫秒，在这种情况下我们的速度提高了约170倍。当然，使用Numpy指定一个模型并不是很方便，因为Tensorflow和Keras提供了许多有用的函数和开箱即用的训练设施......，但是现在我们越过这一步并使用Numpy编写我们的网络，我们是不是会让它更快呢？如果你看一下Autograd的jacobian()函数的实现，它仍然是在函数输出的维度上映射的。这是一个提示，我们可以通过直接依靠Numpy更好的矢量化来改善我们的结果。

NumPy

如果我们想要一个适当的Numpy实现，我们必须指定每个层的forward 和backward passes，以便自己实现backprop。下面的示例网络包含 - affine，ReLU和softmax。这里的层的实现是非常通用的。

基本上，backward pass现在传播包含每个网络输出的梯度的矩阵(或者，在一般情况下，张量)，我们使用Numpy高效矢量化操作：

def affine_forward(x, w, b):

"""

Forward pass of an affine layer

:param x: input of dimension (I, )

:param w: weights matrix of dimension (I, O)

:param b: biais vector of dimension (O, )

:return output of dimension (O, ), and cache needed for backprop

"""

out = np.dot(x, w) + b

cache = (x, w)

return out, cache

def affine_backward(dout, cache):

"""

Backward pass for an affine layer.

:param dout: Upstream Jacobian, of shape (M, O)

:param cache: Tuple of:

- x: Input data, of shape (I, )

- w: Weights, of shape (I, O)

:return the jacobian matrix containing derivatives of the M neural network outputs with respect to

this layer's inputs, evaluated at x, of shape (M, I)

"""

x, w = cache

dx = np.dot(dout, w.T)

return dx

def relu_forward(x):

""" Forward ReLU

"""

out = np.maximum(np.zeros(x.shape), x)

cache = x

return out, cache

def relu_backward(dout, cache):

"""

Backward pass of ReLU

:param dout: Upstream Jacobian

:param cache: the cached input for this layer

:return: the jacobian matrix containing derivatives of the M neural network outputs with respect to

this layer's inputs, evaluated at x.

"""

x = cache

dx = dout * np.where(x > 0, np.ones(x.shape), np.zeros(x.shape))

return dx

def softmax_forward(x):

""" Forward softmax

"""

exps = np.exp(x - np.max(x))

s = exps / exps.sum()

return s, s

def softmax_backward(dout, cache):

"""

Backward pass for softmax

:param dout: Upstream Jacobian

:param cache: contains the cache (in this case the output) for this layer

"""

s = cache

ds = np.diag(s) - np.outer(s, s.T)

dx = np.dot(dout, ds)

return dx

现在我们已经定义了层，让我们在forward 和backward passes中使用它们：

def forward_backward(x):

layer_to_cache = dict() # for each layer, we store the cache needed for backward pass

# Forward pass

a1, cache_a1 = affine_forward(x, w1, b1)

r1, cache_r1 = relu_forward(a1)

a2, cache_a2 = affine_forward(r1, w2, b2)

out, cache_out = softmax_forward(a2)

# backward pass

dout = np.diag(np.ones(out.size, )) # the derivatives of each output w.r.t. each output.

dout = softmax_backward(dout, cache_out)

dout = affine_backward(dout, cache_a2)

dout = relu_backward(dout, cache_r1)

dx = affine_backward(dout, cache_a1)

return out, dx

前馈输出是否正确？

out_fb = forward_backward(x0)[0]

out_tf = ffpass_tf(x0)

np.allclose(out_fb, out_tf, 1e-4)

>> True

我们的雅可比矩阵是否正确？

is_jacobian_correct(lambda x: forward_backward(x)[1], ffpass_tf)

>> True

需要多长时间呢？

%timeit forward_backward(x0)

>> 115 µs ± 2.38 µs

在Autograd需要3.7 ms的情况下，我们现在只需要115μs。

结论

我已经探索了几种可能的方法来计算雅可比矩阵，在CPU上使用Tensorflow，Autograd和Numpy。每种方法都有不同的优缺点。如果您已准备好指定层的forward 和backward passes，您可以直接使用Numpy获得更好性能。

我发现这很有用，因为网络需要在测试时有效运行是很常见的。在这些情况下，花一点时间来确保它们的实现是有效的是值得的。当需要以交互方式计算雅可比矩阵时，这更是如此。

python求雅可比矩阵_在Python中计算神经网络的雅可比矩阵相关推荐

用python求期望_用Python计算明日方舟2021龙门幸运墙期望
按照去年的惯例,方舟今年春节的时候也整了个红包盲盒. 比起去年简单粗暴的直接送,今年的盲盒实际上增加了两层隐性的保底机制:第一层是每天有两次机会而非一次,两次尝试取收益更高的结果:第二层是如果不幸成为 ...
python求移动平均_如何用NumPy计算移动平均值？
NumPy缺少特定于域的函数可能是由于核心团队的纪律性和对NumPy主指令的忠实性:提供了N维数组类型,以及创建和索引这些数组的函数.像许多基本目标一样,这个目标并不小,纽比做得很出色. 更大的Sci ...
python求完全平方数_【Python】【demo实验6】【练习实例】【完全平方数相关】
题目:一个整数,它加上100后是一个完全平方数,再加上168又是一个完全平方数,请问该数是多少? 程序分析:可填在百位.十位.个位的数字都是1.2.3.4.组成所有的排列后再去掉不满足条件的排列. ...
python求协方差矩阵_协方差矩阵python实现
当你有一个数据集,每一条数据都M种属性,然后你想知道M种属性对数据集的影响的时候.你需要用到协方差矩阵. 求协方差矩阵之前请一定要知道协方差矩阵是干嘛的,是表示属性之间关系的矩阵,协方差矩阵的规模只与 ...
java 求导函数_在MATLAB中计算数值导数的最佳方法是什么？
这些只是一些快速而肮脏的建议 . 希望有人会发现它们有用! 1. Do you have a symbolic function or a set of points? 如果您有符号功能,您可以分析计 ...
python 概率分布模型_使用python的概率模型进行公司估值
python 概率分布模型 Note from Towards Data Science's editors: While we allow independent authors to publis ...
python 时间序列预测_使用Python进行动手时间序列预测
python 时间序列预测 Time series analysis is the endeavor of extracting meaningful summary and statistical ...
python 多项式求系数_在Python中用于计算“多项式系数”的numpy / scipy函数
是否有任何 python函数(可能来自numpy或scipy)计算扩展中x ** r的系数(1 xx ** 2 x ** 3 - x **(k-1))** n ,其中k> = 1,n> = ...
python自定义函数求差_[VBA]发布一个计算桩号之差的Excel自定义函数（VBA）
这是一个可以计算桩号之差(也就是得到长度)的Excel(或WPS)扩展函数,可以减少工程师在统计工程量时的工作量. 该函数具有一定的通用性.可以在MS Office和金山WPS上使用. 文末会给出使用 ...

python求雅可比矩阵_在Python中计算神经网络的雅可比矩阵

python求雅可比矩阵_在Python中计算神经网络的雅可比矩阵相关推荐

最新文章

热门文章