
“DeepDream is an experiment that visualizes the patterns learned by a neural network. Similar to when a child watches clouds and tries to interpret random shapes, DeepDream over-interprets and enhances the patterns it sees in an image.

It does so by forwarding an image through the network, then calculating the gradient of the image with respect to the activations of a particular layer. The image is then modified to increase these activations, enhancing the patterns seen by the network, and resulting in a dream-like image. This process was dubbed “Inceptionism” (a reference to InceptionNet, and the movie Inception).”

https://www.tensorflow.org/tutorials/generative/deepdream


Let me break it down for you. Consider a Convolutional Neural Network.

Suppose we want to see what happens when we increase the activation of the highlighted neuron, h_{i,j}, and we want to reflect that change back onto the input image.

In other words, we optimize the image so that neuron h_{i,j} fires more.

We can pose this optimization problem as:


[Figure: Image by Author]

That is, we need to maximize the squared norm (in simple words, the magnitude) of h_{i,j} by changing the image.
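The objective just described can be sketched as gradient ascent on the image. This is a minimal sketch under stated assumptions, not the DeepDream implementation: the single untrained convolution layer, the image size, the position (10, 10), and the step size are all hypothetical stand-ins for a pretrained network and a chosen neuron.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in network (assumption): one untrained conv layer instead of a pretrained CNN.
layer = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

# The image is the variable being optimized, so it must track gradients.
image = torch.rand(1, 3, 32, 32, requires_grad=True)

def objective(img):
    """Squared norm of the activation vector h_{i,j} at spatial position (10, 10)."""
    h = layer(img)[0, :, 10, 10]
    return (h ** 2).sum()

initial = objective(image).item()
for _ in range(20):
    loss = objective(image)
    (grad,) = torch.autograd.grad(loss, image)
    with torch.no_grad():
        # Gradient ascent: move the image in the direction that increases the activation.
        image += 0.5 * grad / (grad.norm() + 1e-8)
final = objective(image).item()

print(f"squared norm before: {initial:.4f}, after: {final:.4f}")
```

Only the pixels inside the receptive field of the chosen position receive a non-zero gradient, so only that patch of the image changes. A real DeepDream run would also keep pixel values in a valid range, which this sketch omits.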

Here is what happens when we do this.


[Figure: (Left) Image by Von.grzanka | (Right) Image by TensorFlow Tutorial]

The reason is that, when the CNN was trained, the neurons in the intermediate layers learned to detect certain patterns (here, dog faces). When we increased those activations, the input image started to contain more and more dog faces in order to maximize them.

The intuition behind implementing deep dream over text

As with deep dream on images, what happens to the text input if we take a hidden layer's activation and try to increase its norm? To answer this, a text-classification model was taken and a loss function was set up to increase the magnitude of the hidden layer's activation. We would expect to see the patterns (the representation) learned by this hidden layer.


Model

[Figure: Image by Author]

A model trained to classify IMDB reviews was used. The model achieved a validation accuracy of 80.10%.

The experiment was set up to capture the 512-dimensional representation produced by fully connected layer 2 (FC2, or fc2 for short).


The cost function used was the norm of fc2 output.
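To make the cost concrete, here is a hypothetical model with the same interface the dreaming code relies on (`forward` and `forward_from_embeddings`, each returning output, embeddings, and hidden activation). The architecture is an assumption for illustration only: mean-pooled word embeddings followed by two fully connected layers. Only the 100-dimensional embeddings and the 512-dimensional fc2 come from the article; the class name, vocabulary size, and fc1 are invented.

```python
import torch
import torch.nn as nn

class SentimentNet(nn.Module):
    """Hypothetical classifier: mean-pooled word embeddings -> fc1 -> fc2 -> output."""

    def __init__(self, vocab_size=1000, embed_dim=100, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc1 = nn.Linear(embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward_from_embeddings(self, sentence_embedding):
        # sentence_embedding: (batch, embed_dim), one vector per sentence.
        hidden = torch.relu(self.fc2(torch.relu(self.fc1(sentence_embedding))))
        logits = self.out(hidden)
        return logits, sentence_embedding, hidden

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) LongTensor of word indices.
        word_embeddings = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        logits, _, hidden = self.forward_from_embeddings(word_embeddings.mean(1))
        return logits, word_embeddings, hidden

model = SentimentNet()
token_ids = torch.randint(0, 1000, (1, 4))  # a made-up 4-word sentence
logits, word_embeddings, hidden = model(token_ids)

# The cost to maximize: the norm of the fc2 output.
cost = hidden.norm()
print(hidden.shape, cost.item())
```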


Note: Because a sequence of words is stored as a long (integer) tensor of word indices, it is not differentiable and cannot be optimized by back-propagation. Instead, the embedding representation of the sentence was optimized.
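The difference can be checked directly in PyTorch. In this small sketch the token indices and the embedding table size are made up: asking an integer tensor to track gradients raises an error, while a floating-point sentence embedding can be optimized.

```python
import torch
import torch.nn as nn

token_ids = torch.tensor([[4, 17, 9]])  # hypothetical word indices (a LongTensor)

# Integer tensors cannot require gradients, so the words themselves cannot be optimized.
try:
    token_ids.requires_grad_(True)
    ids_can_require_grad = True
except RuntimeError:
    ids_can_require_grad = False

# The floating-point embedding of the sentence can be optimized instead.
embedding = nn.Embedding(num_embeddings=50, embedding_dim=100)
sentence_embedding = embedding(token_ids).mean(1).detach().requires_grad_(True)
sentence_embedding.norm().backward()

print(ids_can_require_grad, sentence_embedding.grad.shape)
```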

Outline of Procedure

Step 1: Convert sentence to tensors.


Step 2: Get the sentence embeddings.


Step 3: Pass through the fc2 layer, and get the fc2 output.

Step 4: Optimize the sentence embeddings to increase fc2 layer output.


Step 5: Repeat step 2 to step 4 with the current sentence embeddings for a given number of iterations.


[Figure: Image by Author]
    import matplotlib.pyplot as plt
    import numpy as np
    import torch

    def dream(input, model, iterations, lr):
        """Update the sentence embedding to maximize the fc2 activation norm for n iterations."""
        # Initial forward pass to get the word embeddings of the input sentence.
        model.eval()
        out, orig_embeddings, hidden = model(input)
        model.train()
        losses = []
        embeddings_steps = []
        # Average over the sequence dimension; this sentence embedding is what gets optimized.
        embeddings = orig_embeddings.mean(1).detach().requires_grad_(True)
        embeddings_steps.append(embeddings.clone())
        for i in range(iterations):
            out, embeddings, hidden = model.forward_from_embeddings(embeddings)
            loss = hidden.norm()
            embeddings.retain_grad()
            loss.backward(retain_graph=True)
            # Scale the learning rate by the mean absolute gradient.
            avg_grad = np.abs(embeddings.grad.data.cpu().numpy()).mean()
            norm_lr = lr / avg_grad
            # Gradient ascent step on the embedding.
            embeddings.data += norm_lr * embeddings.grad.data
            model.zero_grad()
            embeddings.grad.data.zero_()
            losses.append(loss.item())
            embeddings_steps.append(embeddings.clone())
        plt.plot(losses)
        plt.title("activation's norm vs iteration")
        plt.ylabel('activation norm')
        plt.xlabel('iterations')
        # One row per step, starting with the original sentence embedding.
        embeddings_steps = torch.cat(embeddings_steps, dim=0).detach().cpu().numpy()
        return embeddings_steps
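One detail of the code above is the step size: the learning rate is divided by the mean absolute gradient. The sketch below isolates that rule on made-up gradients (the helper name `normalized_step` and the random inputs are hypothetical) to show that the average size of the update equals `lr` regardless of the gradient's scale.

```python
import numpy as np

def normalized_step(grad, lr):
    """Update direction scaled so that its mean absolute value equals lr."""
    avg_grad = np.abs(grad).mean()
    return (lr / avg_grad) * grad

rng = np.random.default_rng(0)
tiny_grad = rng.normal(scale=1e-4, size=100)  # made-up gradient with a very small scale
huge_grad = rng.normal(scale=1e2, size=100)   # made-up gradient with a very large scale

mean_steps = [np.abs(normalized_step(g, lr=0.01)).mean() for g in (tiny_grad, huge_grad)]
print(mean_steps)
```

This is one plausible reason the activation norm can grow at a steady pace: each iteration moves the embedding by a similar amount even when the raw gradient is very small or very large.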

Results

Experiment 1

Simple sentences were classified, and their corresponding sentence embeddings were saved.


For example, “I hate this.” and “I love this show.” were classified. These sentences are very simple and convey a negative and a positive emotion, respectively.

Dreaming (that is, optimization) over these embeddings was performed, and the activation norm was recorded at each iteration.

[Figure: Image by Author]

There are a couple of things that can be observed here.


  1. The norm of the hidden layer's activation increased almost linearly for both sentences.
  2. The activations of the two sentences are different, which suggests that the model can easily differentiate between them.

For the sentence “I hate this.”, the model correctly predicts negative.

Similar words test

First, we look at which words are most similar (by cosine similarity) to the sentence embedding before and after dreaming.
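A similarity lookup of this kind can be sketched with NumPy. The four-word vocabulary and its 2-dimensional vectors are invented toy data, and `most_similar` is a hypothetical helper, not the author's code; the article uses 100-dimensional word embeddings from the trained model.

```python
import numpy as np

def most_similar(query, vocab, vectors, k=3):
    """Return the k words whose vectors have the highest cosine similarity to query."""
    query = query / np.linalg.norm(query)
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit_vectors @ query  # cosine similarity of each word with the query
    top = np.argsort(-scores)[:k]
    return [(vocab[i], float(scores[i])) for i in top]

# Toy data (assumption): in practice these rows come from the model's embedding table.
vocab = ["bad", "worse", "great", "this"]
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2], [0.0, 1.0]])

sentence_embedding = np.array([1.0, 0.05])
result = most_similar(sentence_embedding, vocab, vectors, k=2)
print(result)
```

Running the same lookup on the embedding before and after dreaming gives the two word lists that the word clouds compare.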


[Figure: Word cloud based on similarity. Larger font means more similar. Image by Author]

Initially, the sentence embedding was more similar to neutral words like “this, it, even, same”, but as we increased the magnitude of the fc2 activations, it became similar to words like “bad, nothing, worse”, which convey a negative meaning. This makes sense, as the model predicted a negative sentence.

Visualizing embeddings over iterations

To visualize the embeddings over iterations, the t-SNE algorithm was used to reduce the embedding dimension from 100 to 2. The embeddings were plotted on a 2D map, with red dots for negative words (like bad, worse, mean, mistake) and green dots for positive words (like great, celebrated, wonderful).

Grey dots are the intermediate locations of the sentence embedding, and the black dot is its final location.
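A projection like this could be produced with scikit-learn's `TSNE`. The sketch below uses random vectors as stand-ins for the word embeddings and the dreaming steps, and the perplexity and seed are arbitrary choices. Because t-SNE has no separate transform step, the words and the sentence-embedding steps are stacked and projected together.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Random stand-ins (assumption) for 100-dimensional embeddings.
negative_words = rng.normal(loc=-1.0, size=(20, 100))
positive_words = rng.normal(loc=1.0, size=(20, 100))
# Five fake dreaming steps drifting from the origin towards the negative cluster.
steps = np.linspace(0.0, -1.0, 5)[:, None] * np.ones((1, 100))

# Stack everything and project jointly from 100 to 2 dimensions.
points = np.vstack([negative_words, positive_words, steps])
coords = TSNE(n_components=2, perplexity=10, init="pca", random_state=0).fit_transform(points)

negative_xy = coords[:20]   # would be plotted as red dots
positive_xy = coords[20:40] # would be plotted as green dots
steps_xy = coords[40:]      # grey dots, with the last row as the black dot
print(coords.shape)
```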

[Figure: 100-dimensional embeddings in 2D space. Image by Author]

The graph shows that the embedding moved away from the positive words and towards the negative words, which agrees with the model's prediction. Moreover, the final sentence embedding is now more similar to the red dots (negative words) than to the green dots (positive words).

For the sentence “I love this show.”, the model correctly predicts positive.

Similar words test

[Figure: Word cloud based on similarity. Larger font means more similar. Image by Author]

Initially, the sentence embedding was more similar to neutral words like “this, it, even, same”, but as we increased the magnitude of the fc2 activations, it became similar to positive words like “great, unique”. This makes sense, as the model predicted a positive sentence.

Visualizing embeddings over iterations

[Figure: 100-dimensional embeddings in 2D space. Image by Author]

The key observation here is that the sentence embedding initially sat in between the positive and negative words, but as dreaming progressed it was pushed away from the negative words.

Conclusion

After dreaming, the sentence embeddings become similar to words that agree with the model's prediction. The words similar to the initial embeddings were more or less the same for the two sentences, even though the sentences convey very different meanings. The final sentence embeddings, however, showed some interesting patterns:

  1. The negative prediction was pushed near words like mistake, dirty, bad.
  2. The positive prediction was pushed near words like unique, great, celebrated.


We visualize the embeddings of the sentences. Observe how the sentence embedding changes over iterations (labeled step_*).

Observe how the sentence embedding starts at step_1 and moves to step_21. It starts in between the positive and negative words and, as the algorithm dreams, moves towards the positive words.

You can try a few more things with the embeddings hosted on the TensorFlow Projector here.


Try these things

  1. Observe embeddings in 3D.
  2. Find words similar to step_1.
  3. Find words similar to step_21.


Experiment 2

We will now use difficult sentences: sentences that convey one emotion in the first half but change the sentiment in the second half.

Sentences like


  • The show was very long and boring but the direction was amazing.
  • I hated the show because of nudity but the acting was classy.

Even for a human, it is difficult to judge what kind of emotion these sentences convey.


Again, we optimize the sentence embeddings to maximize the activations in the fc2 layer.


[Figure: Norm of fc2 layer activations as a function of iterations. Image by Author]

Unlike in the first case, the activations of the two sentences do not diverge much: they are more or less similar, which suggests that the model has little power to discriminate between these sentences.

Let's look at the words near these sentences before and after dreaming.

For the sentence “The show was very long and boring but the direction was really amazing.”, the model predicted positive.

Similar words test

We find the words most similar to the initial and to the final sentence embedding.


[Figure: Image by Author]

Hmm, even though the sentence was classified as positive, the words similar to the final sentence embedding don't reflect any positive sentiment.


For the sentence “I hated the show because of nudity but the acting was really classy.”, the model predicted negative.

Similar words test


[Figure: Image by Author]

The sentence was classified as negative, and the embeddings after dreaming reflect negative sentiment.


Conclusion

Because the model had no clear understanding of these sentences, their sentence embeddings after dreaming are almost the same (compare the similar words after dreaming). This is because the model does not have a rich representation of these sentences in its hidden layers.

We started by looking at how deep dream works on images, then proposed how to implement deep dream over text. Finally, we showed how to interpret the results. This method can be used to understand what kind of hidden representation a language model has learned.

Experiments like these help us understand these black boxes better.


You can try a demo in this notebook.

All the related code is available at my GitHub repo.

Translated from: https://towardsdatascience.com/dreaming-over-text-f6745c829cee




