
Neural networks get a bad reputation for being black boxes. And while it certainly takes creativity to understand their decision making, they are really not as opaque as people would have you believe.

In this tutorial, I’ll show you how to use backpropagation to change the input so that the network classifies it as whatever you would like.


Follow along using this colab.

(This work was co-written with Alfredo Canziani ahead of an upcoming video)

Humans as black boxes

Let’s consider the case of humans. If I show you the following input:

there’s a good chance you have no idea whether this is a 5 or a 6. In fact, I believe that I could even make a case for convincing you that this might also be an 8.


Now, if I asked you what you would have to change to make this look more like a 5, you might visually do something like this:


And if I wanted you to make this more into an 8, you might do something like this:


Now, the answer to this question is not easy to explain in a few if statements or by looking at a few coefficients (yes, I’m looking at you regression). Unfortunately, with certain types of inputs (images, sound, video, etc…) explainability certainly becomes much harder but not impossible.

Asking the neural network

How would a neural network answer the same questions I posed above? To find out, we can use gradient ascent on the input.

Here’s how the neural network thinks we would need to modify the input to make it more into a 5.


There are two interesting results from this. First, the black areas are where the network thinks we need to remove pixel density. Second, the yellow areas are where it thinks we need to add more pixel density.

We can take a step in that gradient direction by adding the gradients to the original image. We could of course repeat this procedure over and over again to eventually morph the input into the prediction we are hoping for.
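
The repeated step described above can be sketched in a few lines of PyTorch. This is only an illustration under stated assumptions, not the colab's code: the untrained single-layer `model`, the random input `x`, the `step_size` of 1.0 (i.e. adding the raw gradient) and the 20 iterations are all stand-ins chosen for the example.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a digit classifier (assumption, untrained).
model = torch.nn.Linear(28 * 28, 10)
model.eval()

# Hypothetical flattened 28x28 input; in practice this would be an MNIST digit.
x = torch.rand(1, 28 * 28)
target = 5        # the class we want the network to predict
step_size = 1.0   # assumed: add the raw gradient, as in the text

x_adv = x.clone().requires_grad_(True)
p_before = torch.softmax(model(x_adv), dim=1)[0, target].item()

for _ in range(20):
    prob = torch.softmax(model(x_adv), dim=1)[0, target]
    # gradient of p(target) with respect to the input pixels
    (grad,) = torch.autograd.grad(prob, x_adv)
    # gradient ascent: move the input in the direction that raises p(target)
    x_adv = (x_adv + step_size * grad).detach().requires_grad_(True)

p_after = torch.softmax(model(x_adv), dim=1)[0, target].item()
print(f"p(5) before: {p_before:.4f}, after: {p_after:.4f}")
```

Each pass nudges the pixels a little further towards whatever the network associates with a “5”, so the probability of that class should rise from one iteration to the next.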

You can see that the black patch at the bottom left of the image is very similar to what a human might think to do as well.


The human adds black in the left corner. The network suggests the same.

What about making the input look more like an 8? Here’s how the network thinks you would have to change the input.

The notable things, here again, are that there is a black mass at the bottom left and a bright mass around the middle. If we add this to the input we get the following result:

In this case, I’m not particularly convinced that we’ve turned this 5 into an 8. However, we’ve made it less of a 5, and the argument to convince you this is an 8 would certainly be easier to win using the image on the right instead of the image on the left.


Gradients are your guides

In regression analysis, we look at coefficients to tell us about what we’ve learned. In a random forest, we can look at decision nodes.

In neural networks, it comes down to how creative we are at using gradients. To classify this digit, we generated a distribution over possible predictions.

This is what we call the forward pass.


During the forward pass we calculate a probability distribution over outputs

In code it looks like this (follow along using this colab):
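
A minimal sketch of such a forward pass, assuming an untrained single-layer `model` and a random image `x` (both are stand-ins invented for this example, not the colab's code):

```python
import torch

torch.manual_seed(0)

# Hypothetical untrained classifier: one linear layer over flattened pixels.
model = torch.nn.Linear(28 * 28, 10)

# Hypothetical batch holding a single 1x28x28 grayscale image.
x = torch.rand(1, 1, 28, 28)

# forward pass: flatten the image, compute logits, normalize with softmax
logits = model(x.view(x.size(0), -1))
probs = torch.softmax(logits, dim=1)

print(probs)            # one probability per digit 0..9
print(probs.argmax(1))  # index of the most likely digit
```

The softmax turns the ten raw scores into a distribution that sums to one, which is the “distribution over possible predictions” mentioned above.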

Now imagine that we wanted to trick the network into predicting “5” for the input x. The way to do this is to give it an image (x), calculate the predictions for the image, and then maximize the probability of predicting the label “5”.

To do this, we calculate the gradient of the prediction p at the 6th position (index 5, i.e. label = 5) with respect to the input x, and then use gradient ascent to follow it.

To do this in code, we feed the input x to the neural network as a parameter and pick the 6th prediction: the labels are 0, 1, 2, 3, 4, 5, …, so the 6th entry (index 5) corresponds to the label “5”.

Visually this looks like:


Gradient of the prediction of a “5” with respect to the input.

And in code:
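
A minimal sketch of this step, assuming an untrained single-layer `model` and a random flattened image `x` (both are stand-ins for illustration). Wrapping the input in `nn.Parameter` is what lets autograd store a gradient for it.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(28 * 28, 10)  # hypothetical untrained classifier
x = torch.rand(1, 28 * 28)      # hypothetical flattened image

# wrap the input in a Parameter so autograd tracks gradients for it
x_param = nn.Parameter(x)

probs = torch.softmax(model(x_param), dim=1)
p = probs[0, 5]  # 6th entry, index 5: the prediction for label "5"

# backpropagate from p all the way down to the input
p.backward()

# d p / d x, reshaped back to image size
grad = x_param.grad.view(28, 28)
print(grad.shape)
```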


When we call .backward() the process that happens can be visualized by the previous animation.

Now that we calculated the gradients, we can visualize and plot them:
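
One way to plot the gradient is with matplotlib, sketched here under the same assumptions as before: an untrained single-layer `model` and a random input, both stand-ins. The output file name `gradient.png` is also just an example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical untrained classifier and random flattened input (assumptions).
model = nn.Linear(28 * 28, 10)
x_param = nn.Parameter(torch.rand(1, 28 * 28))

# gradient of the prediction for label "5" with respect to the input
torch.softmax(model(x_param), dim=1)[0, 5].backward()
grad = x_param.grad.view(28, 28)

# show the gradient as an image; the colorbar maps colors to gradient values
fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(grad.numpy())
fig.colorbar(im, ax=ax)
ax.set_title("d p(5) / dx")
fig.savefig("gradient.png")
```

With matplotlib's default colormap, the darkest pixels are the most negative gradient values and the yellow ones the most positive.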


The above gradient looks like random noise because the network has not yet been trained… However, once we do train the network, the gradients will be more informative:


Automating this via Callbacks

This is a hugely helpful tool for illuminating what happens inside your network as it trains. In this case, we want to automate the process so that it runs during training.

For this, we’ll use PyTorch Lightning to implement our neural network:

import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(loss)

        # enable the auto confused logit callback
        self.last_batch = batch
        self.last_logits = y_hat.detach()

        result.log('train_loss', loss, on_epoch=True)
        return result

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val_loss', loss)
        return result

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.005)

The complicated code to automatically plot what we described here can be abstracted out into a Callback in Lightning. A callback is a small program that is called at the parts of training you might care about.

In this case, whenever a training batch is processed, we want to generate these images if the network is confused about some of the inputs.


import torch
from pytorch_lightning import Callback
from torch import nn


class ConfusedLogitCallback(Callback):

    def __init__(
        self,
        top_k,
        projection_factor=3,
        min_logit_value=5.0,
        logging_batch_interval=20,
        max_logit_difference=0.1
    ):
        super().__init__()
        self.top_k = top_k
        self.projection_factor = projection_factor
        self.max_logit_difference = max_logit_difference
        self.logging_batch_interval = logging_batch_interval
        self.min_logit_value = min_logit_value

    def on_train_batch_end(self, trainer, pl_module, batch, batch_idx, dataloader_idx):
        # show images only every 20 batches
        if (trainer.batch_idx + 1) % self.logging_batch_interval != 0:
            return

        # pick the last batch and logits
        x, y = batch
        try:
            logits = pl_module.last_logits
        except AttributeError as e:
            m = """please track the last_logits in the training_step like so:
                def training_step(...):
                    self.last_logits = your_logits
            """
            raise AttributeError(m)

        # only check when it has opinions (ie: the logit > 5)
        if logits.max() > self.min_logit_value:
            # pick the top two confused probs
            (values, idxs) = torch.topk(logits, k=2, dim=1)

            # care about only the ones that are at most eps close to each other
            eps = self.max_logit_difference
            mask = (values[:, 0] - values[:, 1]).abs() < eps

            if mask.sum() > 0:
                # pull out the ones we care about
                confusing_x = x[mask, ...]
                confusing_y = y[mask]
                mask_idxs = idxs[mask]

                pl_module.eval()
                self._plot(confusing_x, confusing_y, trainer, pl_module, mask_idxs)
                pl_module.train()

    def _plot(self, confusing_x, confusing_y, trainer, model, mask_idxs):
        from matplotlib import pyplot as plt

        confusing_x = confusing_x[:self.top_k]
        confusing_y = confusing_y[:self.top_k]

        x_param_a = nn.Parameter(confusing_x)
        x_param_b = nn.Parameter(confusing_x)

        batch_size, c, w, h = confusing_x.size()
        for logit_i, x_param in enumerate((x_param_a, x_param_b)):
            x_param = x_param.to(model.device)
            logits = model(x_param.view(batch_size, -1))
            logits[:, mask_idxs[:, logit_i]].sum().backward()

        # reshape grads
        grad_a = x_param_a.grad.view(batch_size, w, h)
        grad_b = x_param_b.grad.view(batch_size, w, h)

        for img_i in range(len(confusing_x)):
            x = confusing_x[img_i].squeeze(0).cpu()
            y = confusing_y[img_i].cpu()
            ga = grad_a[img_i].cpu()
            gb = grad_b[img_i].cpu()

            mask_idx = mask_idxs[img_i].cpu()

            fig, axarr = plt.subplots(nrows=2, ncols=3, figsize=(15, 10))
            self.__draw_sample(fig, axarr, 0, 0, x, f'True: {y}')
            self.__draw_sample(fig, axarr, 0, 1, ga, f'd{mask_idx[0]}-logit/dx')
            self.__draw_sample(fig, axarr, 0, 2, gb, f'd{mask_idx[1]}-logit/dx')
            self.__draw_sample(fig, axarr, 1, 1, ga * 2 + x, f'd{mask_idx[0]}-logit/dx')
            self.__draw_sample(fig, axarr, 1, 2, gb * 2 + x, f'd{mask_idx[1]}-logit/dx')

            trainer.logger.experiment.add_figure('confusing_imgs', fig, global_step=trainer.global_step)

    @staticmethod
    def __draw_sample(fig, axarr, row_idx, col_idx, img, title):
        im = axarr[row_idx, col_idx].imshow(img)
        fig.colorbar(im, ax=axarr[row_idx, col_idx])
        axarr[row_idx, col_idx].set_title(title, fontsize=20)
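
The test for “confused” inputs at the heart of the callback can be tried in isolation. In this small sketch the `logits` values are made up for illustration; only the `torch.topk` call and the threshold comparison mirror the callback above.

```python
import torch

# Hypothetical logits for a batch of 3 inputs over 10 classes.
logits = torch.tensor([
    [0.1, 0.2, 0.0, 0.3, 0.1, 6.00, 5.95, 0.2, 0.1, 0.0],  # torn between 5 and 6
    [9.0, 0.1, 0.2, 0.0, 0.3, 0.1, 0.2, 0.1, 0.0, 0.2],    # confident it is a 0
    [0.0, 0.1, 0.2, 7.0, 0.3, 0.1, 0.2, 0.1, 6.0, 0.2],    # clearly prefers 3 over 8
])
max_logit_difference = 0.1

# two largest logits per row, together with their class indices
values, idxs = torch.topk(logits, k=2, dim=1)

# an input is "confused" when its top two logits are nearly tied
mask = (values[:, 0] - values[:, 1]).abs() < max_logit_difference

print(mask)        # which rows count as confused
print(idxs[mask])  # the two competing classes for those rows
```

Only the first row passes the test, since its top two logits differ by less than `max_logit_difference`; those are the inputs the callback goes on to plot.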

But… we’ve made it even easier with pytorch-lightning-bolts which you can simply install


pip install pytorch-lightning-bolts

and import the callback into your training code


from pl_bolts.callbacks.vision import ConfusedLogitCallback

trainer = Trainer(callbacks=[ConfusedLogitCallback(1)])

Putting it all together

Finally we can train our model and automatically generate images when logits are “confused”


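import os

from pytorch_lightning import Trainer
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST

# data
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

# model
model = LitClassifier()

# attach callback
trainer = Trainer(callbacks=[ConfusedLogitCallback(1)])

# train!
trainer.fit(model, DataLoader(train, batch_size=64), DataLoader(val, batch_size=64))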

and tensorboard will automatically generate images that look like this:


Summary

In summary: you learned how to look inside the black box using PyTorch, learned the intuition, wrote a callback in PyTorch Lightning, and automatically got your TensorBoard instance to plot questionable predictions.

Try it yourself with PyTorch Lightning and PyTorch Lightning Bolts.

(This article was written ahead of an upcoming video where Alfredo Canziani and I (William) show you how to code this from scratch.)


Translated from: https://towardsdatascience.com/peering-inside-the-blackbox-how-to-trick-a-neural-network-757c90a88a73




