
by Emil Wallner


How you can train an AI to convert your design mockups into HTML and CSS

Within three years, deep learning will change front-end development. It will increase prototyping speed and lower the barrier for building software.


The field took off last year when Tony Beltramelli introduced the pix2code paper and Airbnb launched sketch2code.


Currently, the largest barrier to automating front-end development is computing power. However, we can use current deep learning algorithms, along with synthesized training data, to start exploring artificial front-end automation right now.


In this post, we’ll teach a neural network how to code a basic HTML and CSS website based on a picture of a design mockup. Here’s a quick overview of the process:


1) Give a design image to the trained neural network

2) The neural network converts the image into HTML markup

3) Rendered output

We’ll build the neural network in three iterations.


First, we’ll make a bare minimum version to get a hang of the moving parts. The second version, HTML, will focus on automating all the steps and explaining the neural network layers. In the final version, Bootstrap, we’ll create a model that can generalize and explore the LSTM layer.


All the code is prepared on GitHub and FloydHub in Jupyter notebooks. All the FloydHub notebooks are inside the floydhub directory and the local equivalents are under local.


The models are based on Beltramelli‘s pix2code paper and Jason Brownlee’s image caption tutorials. The code is written in Python and Keras, a framework on top of TensorFlow.


If you’re new to deep learning, I’d recommend getting a feel for Python, backpropagation, and convolutional neural networks. My three earlier posts on FloydHub’s blog will get you started:


  • My First Weekend Of Deep Learning


  • Coding The History Of Deep Learning


  • Colorizing B&W Photos with Neural Networks


Core Logic

Let’s recap our goal. We want to build a neural network that will generate HTML/CSS markup that corresponds to a screenshot.


When you train the neural network, you give it several screenshots with matching HTML.


It learns by predicting all the matching HTML markup tags one by one. When it predicts the next markup tag, it receives the screenshot as well as all the correct markup tags until that point.


Here is a simple training data example in a Google Sheet.


Creating a model that predicts word by word is the most common approach today. There are other approaches, but that’s the method we’ll use throughout this tutorial.


Notice that for each prediction it gets the same screenshot. So if it has to predict 20 words, it will get the same design mockup twenty times. For now, don’t worry about how the neural network works. Focus on grasping the input and output of the neural network.


Let’s focus on the previous markup. Say we train the network to predict the sentence “I can code.” When it receives “I,” then it predicts “can.” Next time it will receive “I can” and predict “code.” It receives all the previous words and only has to predict the next word.


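To make the pairing concrete, here’s a minimal sketch in plain Python of how that sentence expands into training rows; the token list and variable names are just illustrative placeholders, not the real data pipeline.

# Hypothetical token list for one training example
tokens = ["start", "I", "can", "code", "end"]
screenshot = "screenshot.jpg"  # the same image is reused for every pair

training_pairs = []
for i in range(1, len(tokens)):
    previous_tokens = tokens[:i]   # everything the network has seen so far
    next_token = tokens[i]         # the single token it has to predict
    training_pairs.append((screenshot, previous_tokens, next_token))

for image, previous, target in training_pairs:
    print(image, previous, "->", target)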
The neural network creates features from the data. The network builds features to link the input data with the output data. It has to create representations to understand what is in each screenshot and the HTML syntax it has predicted so far. This builds the knowledge to predict the next tag.


When you want to use the trained model for real-world usage, it’s similar to when you train the model. The text is generated one by one with the same screenshot each time. Instead of feeding it with the correct HTML tags, it receives the markup it has generated so far. Then, it predicts the next markup tag. The prediction is initiated with a “start tag” and stops when it predicts an “end tag” or reaches a max limit. Here’s another example in a Google Sheet.


“Hello World” Version

Let’s build a “hello world” version. We’ll feed a neural network a screenshot with a website displaying “Hello World!” and teach it to generate the markup.


First, the neural network maps the design mockup into a list of pixel values, from 0 to 255, in three channels: red, blue, and green.


To represent the markup in a way that the neural network understands, I use one-hot encoding. Thus, the sentence “I can code” could be mapped like the example below.


In the above graphic, we include the start and end tag. These tags are cues for when the network starts its predictions and when to stop.


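As a small illustration, here is a hypothetical one-hot mapping in plain Python for the five tokens of that example, including the start and end tags; the real notebooks build the mapping with Keras utilities.

import numpy as np

# Hypothetical vocabulary for "I can code", including the start and end tags
vocabulary = ["start", "I", "can", "code", "end"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    # A vector of zeros with a single one at the word's index
    vector = np.zeros(len(vocabulary))
    vector[word_to_index[word]] = 1.
    return vector

print(one_hot("start"))  # [1. 0. 0. 0. 0.]
print(one_hot("code"))   # [0. 0. 0. 1. 0.]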
For the input data, we will use sentences, starting with the first word and then adding each word one by one. The output data is always one word.


Sentences follow the same logic as words. They also need the same input length. Instead of being capped by the vocabulary, they are bound by maximum sentence length. If it’s shorter than the maximum length, you fill it up with empty words, a word with just zeros.


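Here is a minimal sketch of that padding step with Keras’ pad_sequences; the toy token indices and the max length of three are assumptions for illustration.

from keras.preprocessing.sequence import pad_sequences

# Toy integer-encoded sentences (hypothetical token indices)
sequences = [[1], [1, 2], [1, 2, 3]]

# Fill shorter sentences up with zeros ("empty words") to a max length of three
padded = pad_sequences(sequences, maxlen=3)
print(padded)
# [[0 0 1]
#  [0 1 2]
#  [1 2 3]]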
As you see, words are printed from right to left. This forces each word to change position for each training round. This allows the model to learn the sequence instead of memorizing the position of each word.


In the below graphic there are four predictions. Each row is one prediction. To the left are the images represented in their three color channels (red, green, and blue) and the previous words. Outside of the brackets are the predictions one by one, ending with a red square to mark the end.


# Imports assumed for this snippet (not shown in the original notebook excerpt)
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing.image import img_to_array, load_img
from keras.layers import Input, Dense, RepeatVector, LSTM, concatenate
from keras.models import Model

# Length of longest sentence
max_caption_len = 3
# Size of vocabulary
vocab_size = 3

# Load one screenshot for each word and turn them into digits
images = []
for i in range(2):
    images.append(img_to_array(load_img('screenshot.jpg', target_size=(224, 224))))
images = np.array(images, dtype=float)
# Preprocess input for the VGG16 model
images = preprocess_input(images)

# Turn start tokens into one-hot encoding
html_input = np.array(
            [[[0., 0., 0.],  # start
              [0., 0., 0.],
              [1., 0., 0.]],
             [[0., 0., 0.],  # start <HTML>Hello World!</HTML>
              [1., 0., 0.],
              [0., 1., 0.]]])

# Turn next word into one-hot encoding
next_words = np.array(
            [[0., 1., 0.],  # <HTML>Hello World!</HTML>
             [0., 0., 1.]]) # end

# Load the VGG16 model trained on imagenet and output the classification feature
VGG = VGG16(weights='imagenet', include_top=True)
# Extract the features from the image
features = VGG.predict(images)

# Load the feature to the network, apply a dense layer, and repeat the vector
vgg_feature = Input(shape=(1000,))
vgg_feature_dense = Dense(5)(vgg_feature)
vgg_feature_repeat = RepeatVector(max_caption_len)(vgg_feature_dense)
# Extract information from the input sequence
language_input = Input(shape=(vocab_size, vocab_size))
language_model = LSTM(5, return_sequences=True)(language_input)

# Concatenate the information from the image and the input
decoder = concatenate([vgg_feature_repeat, language_model])
# Extract information from the concatenated output
decoder = LSTM(5, return_sequences=False)(decoder)
# Predict which word comes next
decoder_output = Dense(vocab_size, activation='softmax')(decoder)
# Compile and run the neural network
model = Model(inputs=[vgg_feature, language_input], outputs=decoder_output)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Train the neural network
model.fit([features, html_input], next_words, batch_size=2, shuffle=False, epochs=1000)

In the hello world version, we use three tokens: start, <HTML><center><H1>Hello World!</H1></center></HTML> and end. A token can be anything. It can be a character, word, or sentence. Character versions require a smaller vocabulary but constrain the neural network. Word level tokens tend to perform best.


Here we make the prediction:


# Create an empty sentence and insert the start token
sentence = np.zeros((1, 3, 3))  # [[0,0,0], [0,0,0], [0,0,0]]
start_token = [1., 0., 0.]      # start
sentence[0][2] = start_token    # place start in empty sentence

# Making the first prediction with the start token
second_word = model.predict([np.array([features[1]]), sentence])

# Put the second word in the sentence and make the final prediction
sentence[0][1] = start_token
sentence[0][2] = np.round(second_word)
third_word = model.predict([np.array([features[1]]), sentence])

# Place the start token and our two predictions in the sentence
sentence[0][0] = start_token
sentence[0][1] = np.round(second_word)
sentence[0][2] = np.round(third_word)

# Transform our one-hot predictions into the final tokens
vocabulary = ["start", "<HTML><center><H1>Hello World!</H1></center></HTML>", "end"]
for i in sentence[0]:
    print(vocabulary[np.argmax(i)], end=' ')

Output

  • 10 epochs: start start start


  • 100 epochs: start <HTML><center><H1>Hello World!</H1></center></HTML> <HTML><center><H1>Hello World!</H1></center></HTML>


  • 300 epochs: start <HTML><center><H1>Hello World!</H1></center></HTML> end


Mistakes I made:

  • Build the first working version before gathering the data. Early on in this project, I managed to get a copy of an old archive of the Geocities hosting website. It had 38 million websites. Blinded by the potential, I ignored the huge workload that would be required to reduce the 100K-sized vocabulary.


  • Dealing with a terabyte worth of data requires good hardware or a lot of patience. After my Mac ran into several problems, I ended up using a powerful remote server. Expect to rent a rig with 8 modern CPU cores and a 1 Gbps internet connection to have a decent workflow.


  • Nothing made sense until I understood the input and output data. The input, X, is one screenshot and the previous markup tags. The output, Y, is the next markup tag. When I got this, it became easier to understand everything between them. It also became easier to experiment with different architectures.


  • Be aware of the rabbit holes. Because this project intersects with a lot of fields in deep learning, I got stuck in plenty of rabbit holes along the way. I spent a week programming RNNs from scratch, got too fascinated by embedding vector spaces, and was seduced by exotic implementations.


  • Picture-to-code networks are image caption models in disguise. Even when I learned this, I still ignored many of the image caption papers, simply because they were less cool. Once I got some perspective, I accelerated my learning of the problem space.


Running the code on FloydHub

FloydHub is a training platform for deep learning. I came across them when I first started learning deep learning and I’ve used them since for training and managing my deep learning experiments. You can run your first model within 30 seconds by clicking this button:


It opens a Workspace on FloydHub where you will find the same environment and dataset used for the Bootstrap version. You can also find the trained models for testing.


Or you can do a manual installation by following these steps: 2-min installation or my 5-minute walkthrough.


Clone the repository

git clone https://github.com/emilwallner/Screenshot-to-code-in-Keras.git

Login and initiate FloydHub command-line-tool

cd Screenshot-to-code-in-Keras
floyd login
floyd init s2c

Run a Jupyter notebook on a FloydHub cloud GPU machine:

floyd run --gpu --env tensorflow-1.4 --data emilwallner/datasets/imagetocode/2:data --mode jupyter

All the notebooks are prepared inside the FloydHub directory. The local equivalents are under local. Once it’s running, you can find the first notebook here: floydhub/Helloworld/helloworld.ipynb.


If you want more detailed instructions and an explanation for the flags, check my earlier post.


HTML Version

In this version, we’ll automate many of the steps from the Hello World model. This section will focus on creating a scalable implementation and the moving pieces in the neural network.


This version will not be able to predict HTML from random websites, but it’s still a great setup to explore the dynamics of the problem.


Overview

If we expand the components of the previous graphic it looks like this.


There are two major sections. First, the encoder. This is where we create image features and previous markup features. Features are the building blocks that the network creates to connect the design mockups with the markup. At the end of the encoder, we glue the image features to each word in the previous markup.


The decoder then takes the combined design and markup feature and creates a next tag feature. This feature is run through a fully connected neural network to predict the next tag.


Design mockup features

Since we need to insert one screenshot for each word, this becomes a bottleneck when training the network (example). Instead of using the images, we extract the information we need to generate the markup.


The information is encoded into image features. This is done by using an already pre-trained convolutional neural network (CNN). The model is pre-trained on Imagenet.


We extract the features from the layer before the final classification.


We end up with 1536 eight by eight pixel images known as features. Although they are hard to understand for us, a neural network can extract the objects and position of the elements from these features.


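As a sketch of that extraction step (the HTML version below uses InceptionResNetV2, so I’ll assume it here; the screenshot path is a placeholder):

import numpy as np
from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input
from keras.preprocessing.image import img_to_array, load_img

# The pre-trained CNN without its final classification layer
extractor = InceptionResNetV2(weights='imagenet', include_top=False)

# A placeholder screenshot, resized to the network's expected input size
image = img_to_array(load_img('screenshot.jpg', target_size=(299, 299)))
image = preprocess_input(np.array([image], dtype=float))

features = extractor.predict(image)
print(features.shape)   # (1, 8, 8, 1536) -- 1536 eight-by-eight feature maps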
Markup features

In the hello world version, we used a one-hot encoding to represent the markup. In this version, we’ll use a word embedding for the input and keep the one-hot encoding for the output.


The way we structure each sentence stays the same, but how we map each token is changed. One-hot encoding treats each word as an isolated unit. Instead, we convert each word in the input data to lists of digits. These represent the relationship between the markup tags.


The dimension of this word embedding is eight but often varies between 50–500 depending on the size of the vocabulary.


The eight digits for each word are weights similar to a vanilla neural network. They are tuned to map how the words relate to each other (Mikolov et al., 2013).


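A minimal sketch of an eight-dimensional embedding in Keras; the vocabulary size and sentence length are placeholder values.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

vocab_size = 17    # placeholder vocabulary size
max_length = 48    # placeholder sentence length

model = Sequential()
# Each token index is mapped to a trainable vector of eight weights
model.add(Embedding(vocab_size, 8, input_length=max_length))

dummy_sentence = np.zeros((1, max_length), dtype='int32')   # one all-"empty" sentence
embedded = model.predict(dummy_sentence)
print(embedded.shape)                                       # (1, 48, 8)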
This is how we start developing markup features. Features are what the neural network develops to link the input data with the output data. For now, don’t worry about what they are, we’ll dig deeper into this in the next section.


The Encoder

We’ll take the word embeddings and run them through an LSTM and return a sequence of markup features. These are run through a Time distributed dense layer — think of it as a dense layer with multiple inputs and outputs.


In parallel, the image features are first flattened. Regardless of how the digits were structured, they are transformed into one large list of numbers. Then we apply a dense layer on this layer to form a high-level feature. These image features are then concatenated to the markup features.


This can be hard to wrap your mind around — so let’s break it down.


Markup features

Here we run the word embeddings through the LSTM layer. In this graphic, all the sentences are padded to reach the maximum size of three tokens.


To mix signals and find higher-level patterns, we apply a TimeDistributed dense layer to the markup features. TimeDistributed dense is the same as a dense layer, but with multiple inputs and outputs.


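A small sketch of that step, with placeholder sizes: the LSTM returns one feature per timestep, and the same Dense weights are then applied to each of them through TimeDistributed.

from keras.layers import Input, Embedding, LSTM, TimeDistributed, Dense
from keras.models import Model

max_caption_len = 3   # placeholder: three tokens per padded sentence
vocab_size = 17       # placeholder vocabulary size

language_input = Input(shape=(max_caption_len,))
x = Embedding(vocab_size, 8, input_length=max_caption_len)(language_input)
x = LSTM(64, return_sequences=True)(x)                  # one markup feature per timestep
x = TimeDistributed(Dense(128, activation='relu'))(x)   # same dense weights for each timestep

model = Model(inputs=language_input, outputs=x)
print(model.output_shape)   # (None, 3, 128)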
Image features

In parallel, we prepare the images. We take all the mini image features and transform them into one long list. The information is not changed, just reorganized.


Again, to mix signals and extract higher level notions, we apply a dense layer. Since we are only dealing with one input value, we can use a normal dense layer. To connect the image features to the markup features, we copy the image features.


In this case, we have three markup features. Thus, we end up with an equal amount of image features and markup features.


Concatenating the image and markup features

All the sentences are padded to create three markup features. Since we have prepared the image features, we can now add one image feature for each markup feature.


After sticking one image feature to each markup feature, we end up with three image-markup features. This is the input we feed into the decoder.


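Here’s a minimal sketch of that gluing step with RepeatVector and concatenate, using placeholder sizes and a stand-in input for the markup features.

from keras.layers import Input, Dense, RepeatVector, concatenate
from keras.models import Model

max_caption_len = 3   # placeholder: three markup features

image_feature = Input(shape=(1000,))                        # a flattened image feature
image_dense = Dense(128, activation='relu')(image_feature)  # high-level image feature
image_repeat = RepeatVector(max_caption_len)(image_dense)   # one copy per markup feature

markup_features = Input(shape=(max_caption_len, 128))       # stand-in for the encoder output
combined = concatenate([image_repeat, markup_features])     # three image-markup features

model = Model(inputs=[image_feature, markup_features], outputs=combined)
print(model.output_shape)   # (None, 3, 256)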
The Decoder

Here we use the combined image-markup features to predict the next tag.


In the below example, we use three image-markup feature pairs and output one next tag feature.


Note that the LSTM layer has return_sequences set to False. Instead of returning the length of the input sequence, it only predicts one feature. In our case, it’s a feature for the next tag. It contains the information for the final prediction.


The final prediction

The dense layer works like a traditional feedforward neural network. It connects the 512 digits in the next tag feature with the 4 final predictions. Say we have 4 words in our vocabulary: start, hello, world, and end.


The vocabulary prediction could be [0.1, 0.1, 0.1, 0.7]. The softmax activation in the dense layer distributes a probability from 0–1, with the sum of all predictions equal to 1. In this case, it predicts that the 4th word is the next tag. Then you translate the one-hot encoding [0, 0, 0, 1] into the mapped value, say “end”.


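A tiny sketch of that last step, using the hypothetical four-word vocabulary above:

import numpy as np

vocabulary = ["start", "hello", "world", "end"]
prediction = np.array([0.1, 0.1, 0.1, 0.7])   # softmax output, sums to 1

next_index = np.argmax(prediction)            # index of the highest probability
print(vocabulary[next_index])                 # end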
# Imports assumed for this snippet (not shown in the original notebook excerpt)
from os import listdir
import numpy as np
from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input
from keras.preprocessing.image import img_to_array, load_img
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.layers import Input, Flatten, Dense, RepeatVector, Embedding, LSTM, TimeDistributed, concatenate
from keras.models import Model

# Load the images and preprocess them for inception-resnet
images = []
all_filenames = listdir('images/')
all_filenames.sort()
for filename in all_filenames:
    images.append(img_to_array(load_img('images/'+filename, target_size=(299, 299))))
images = np.array(images, dtype=float)
images = preprocess_input(images)

# Run the images through inception-resnet and extract the features without the classification layer
IR2 = InceptionResNetV2(weights='imagenet', include_top=False)
features = IR2.predict(images)

# We will cap each input sequence to 100 tokens
max_caption_len = 100
# Initialize the function that will create our vocabulary
tokenizer = Tokenizer(filters='', split=" ", lower=False)

# Read a document and return a string
def load_doc(filename):
    file = open(filename, 'r')
    text = file.read()
    file.close()
    return text

# Load all the HTML files
X = []
all_filenames = listdir('html/')
all_filenames.sort()
for filename in all_filenames:
    X.append(load_doc('html/'+filename))

# Create the vocabulary from the html files
tokenizer.fit_on_texts(X)

# Add +1 to leave space for empty words
vocab_size = len(tokenizer.word_index) + 1
# Translate each word in text file to the matching vocabulary index
sequences = tokenizer.texts_to_sequences(X)
# The longest HTML file
max_length = max(len(s) for s in sequences)

# Initialize our final input to the model
X, y, image_data = list(), list(), list()
for img_no, seq in enumerate(sequences):
    for i in range(1, len(seq)):
        # Add the entire sequence to the input and only keep the next word for the output
        in_seq, out_seq = seq[:i], seq[i]
        # If the sentence is shorter than max_length, fill it up with empty words
        in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
        # Map the output to one-hot encoding
        out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
        # Add the image corresponding to the HTML file
        image_data.append(features[img_no])
        # Cut the input sentence to 100 tokens, and add it to the input data
        X.append(in_seq[-100:])
        y.append(out_seq)

X, y, image_data = np.array(X), np.array(y), np.array(image_data)

# Create the encoder
image_features = Input(shape=(8, 8, 1536,))
image_flat = Flatten()(image_features)
image_flat = Dense(128, activation='relu')(image_flat)
ir2_out = RepeatVector(max_caption_len)(image_flat)

language_input = Input(shape=(max_caption_len,))
language_model = Embedding(vocab_size, 200, input_length=max_caption_len)(language_input)
language_model = LSTM(256, return_sequences=True)(language_model)
language_model = LSTM(256, return_sequences=True)(language_model)
language_model = TimeDistributed(Dense(128, activation='relu'))(language_model)

# Create the decoder
decoder = concatenate([ir2_out, language_model])
decoder = LSTM(512, return_sequences=False)(decoder)
decoder_output = Dense(vocab_size, activation='softmax')(decoder)

# Compile the model
model = Model(inputs=[image_features, language_input], outputs=decoder_output)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Train the neural network
model.fit([image_data, X], y, batch_size=64, shuffle=False, epochs=2)

# Map an integer to a word
def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

# Generate a description for an image
def generate_desc(model, tokenizer, photo, max_length):
    # seed the generation process
    in_text = 'START'
    # iterate over the whole length of the sequence
    for i in range(900):
        # integer encode input sequence
        sequence = tokenizer.texts_to_sequences([in_text])[0][-100:]
        # pad input
        sequence = pad_sequences([sequence], maxlen=max_length)
        # predict next word
        yhat = model.predict([photo, sequence], verbose=0)
        # convert probability to integer
        yhat = np.argmax(yhat)
        # map integer to word
        word = word_for_id(yhat, tokenizer)
        # stop if we cannot map the word
        if word is None:
            break
        # append as input for generating the next word
        in_text += ' ' + word
        # Print the prediction
        print(' ' + word, end='')
        # stop if we predict the end of the sequence
        if word == 'END':
            break
    return

# Load an image, preprocess it for IR2, extract features and generate the HTML
test_image = img_to_array(load_img('images/87.jpg', target_size=(299, 299)))
test_image = np.array(test_image, dtype=float)
test_image = preprocess_input(test_image)
test_features = IR2.predict(np.array([test_image]))
generate_desc(model, tokenizer, np.array(test_features), 100)

Output

  • 250 epochs


  • 350 epochs


  • 450 epochs


  • 550 epochs


If you can’t see anything when you click these links, you can right click and click on “View Page Source.” Here is the original website for reference.


Mistakes I made:

  • LSTMs are a lot heavier for my cognition compared to CNNs. When I unrolled all the LSTMs, they became easier to understand. Fast.ai’s video on RNNs was super useful. Also, focus on the input and output features before you try understanding how they work.


  • Building a vocabulary from the ground up is a lot easier than narrowing down a huge vocabulary. This includes everything from fonts, div sizes, and hex colors to variable names and normal words.


  • Most of the libraries are created to parse text documents and not code. In documents, everything is separated by a space, but in code, you need custom parsing.


  • You can extract features with a model that’s trained on Imagenet. This might seem counterintuitive since Imagenet has few web images. However, the loss is 30% higher compared to a pix2code model, which is trained from scratch. It’d be interesting to use a pre-trained inception-resnet type of model based on web screenshots.


Bootstrap version

In our final version, we’ll use a dataset of generated bootstrap websites from the pix2code paper. By using Twitter’s bootstrap, we can combine HTML and CSS and decrease the size of the vocabulary.


We’ll enable it to generate the markup for a screenshot it has not seen before. We’ll also dig into how it builds knowledge about the screenshot and markup.


Instead of training it on the bootstrap markup, we’ll use 17 simplified tokens that we then translate into HTML and CSS. The dataset includes 1500 training screenshots and 250 validation images. For each screenshot there are on average 65 tokens, resulting in 96925 training examples.


By tweaking the model in the pix2code paper, the model can predict the web components with 97% accuracy (BLEU 4-ngram greedy search, more on this later).


An end-to-end approach

Extracting features from pre-trained models works well in image captioning models. But after a few experiments, I realized that pix2code’s end-to-end approach works better for this problem. The pre-trained models have not been trained on web data and are customized for classification.


In this model, we replace the pre-trained image features with a light convolutional neural network. Instead of using max-pooling to increase information density, we increase the strides. This maintains the position and the color of the front-end elements.


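A minimal sketch of the difference, with placeholder sizes: the first block downsamples with max-pooling, the second with a strided convolution as in the Bootstrap model below.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

# Downsampling with max-pooling
pooled = Sequential()
pooled.add(Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 3)))
pooled.add(MaxPooling2D((2, 2)))
print(pooled.output_shape)    # (None, 128, 128, 16)

# Downsampling with a strided convolution instead
strided = Sequential()
strided.add(Conv2D(16, (3, 3), activation='relu', padding='same', strides=2, input_shape=(256, 256, 3)))
print(strided.output_shape)   # (None, 128, 128, 16)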
There are two core models that enable this: convolutional neural networks (CNN) and recurrent neural networks (RNN). The most common recurrent neural network is long-short term memory (LSTM), so that’s what I’ll refer to.


There are plenty of great CNN tutorials, and I covered them in my previous article. Here, I’ll focus on the LSTMs.


Understanding timesteps in LSTMs

One of the harder things to grasp about LSTMs is timesteps. A vanilla neural network can be thought of as two timesteps. If you give it “Hello,” it predicts “World.” But it would struggle to predict more timesteps. In the below example, the input has four timesteps, one for each word.


LSTMs are made for input with timesteps. It’s a neural network customized for information in order. If you unroll our model it looks like this. For each downward step, you keep the same weights. You apply one set of weights to the previous output and another set to the new input.


The weighted input and output are concatenated and added together with an activation. This is the output for that timestep. Since we reuse the weights, they draw information from several inputs and build knowledge of the sequence.


Here is a simplified version of the process for each timestep in an LSTM.


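As a rough numpy sketch of one timestep (a plain recurrent step without the LSTM gates, with placeholder sizes): the new input and the previous output each get their own set of weights, the results are added, and an activation produces the output for that timestep.

import numpy as np

input_size, hidden_size = 3, 5                          # placeholder sizes
W_input = np.random.randn(hidden_size, input_size)      # weights for the new input
W_hidden = np.random.randn(hidden_size, hidden_size)    # weights for the previous output

def timestep(x_t, h_prev):
    # Weight both, add them together, and squash with an activation
    return np.tanh(np.dot(W_input, x_t) + np.dot(W_hidden, h_prev))

h = np.zeros(hidden_size)   # initial output
for x_t in [np.random.randn(input_size) for _ in range(4)]:   # four timesteps
    h = timestep(x_t, h)
    print(h)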
To get a feel for this logic, I’d recommend building an RNN from scratch with Andrew Trask’s brilliant tutorial.


Understanding the units in LSTM layers

The number of units in each LSTM layer determines its ability to memorize. This also corresponds to the size of each output feature. Again, a feature is a long list of numbers used to transfer information between layers.


Each unit in the LSTM layer learns to keep track of different aspects of the syntax. Below is a visualization of a unit that keeps track of the information in the row div. This is the simplified markup we are using to train the bootstrap model.


Each LSTM unit maintains a cell state. Think of the cell state as the memory. The weights and activations are used to modify the state in different ways. This enables the LSTM layers to fine-tune which information to keep and discard for each input.


In addition to passing through an output feature for each input, it also forwards the cell states, one value for each unit in the LSTM. To get a feel for how the components within the LSTM interact, I recommend Colah’s tutorial, Jayasiri’s Numpy implementation, and Karpathy’s lecture and write-up.


# Imports assumed for this snippet (not shown in the original notebook excerpt)
from os import listdir
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.models import Sequential, Model
from keras.layers import Input, Conv2D, Flatten, Dense, Dropout, RepeatVector, Embedding, LSTM, concatenate
from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint

dir_name = 'resources/eval_light/'

# Read a file and return a string
def load_doc(filename):
    file = open(filename, 'r')
    text = file.read()
    file.close()
    return text

def load_data(data_dir):
    text = []
    images = []
    # Load all the files and order them
    all_filenames = listdir(data_dir)
    all_filenames.sort()
    for filename in (all_filenames):
        if filename[-3:] == "npz":
            # Load the images already prepared in arrays
            image = np.load(data_dir+filename)
            images.append(image['features'])
        else:
            # Load the bootstrap tokens and wrap them in a start and end tag
            syntax = '<START> ' + load_doc(data_dir+filename) + ' <END>'
            # Separate all the words with a single space
            syntax = ' '.join(syntax.split())
            # Add a space after each comma
            syntax = syntax.replace(',', ' ,')
            text.append(syntax)
    images = np.array(images, dtype=float)
    return images, text

train_features, texts = load_data(dir_name)

# Initialize the function to create the vocabulary
tokenizer = Tokenizer(filters='', split=" ", lower=False)
# Create the vocabulary
tokenizer.fit_on_texts([load_doc('bootstrap.vocab')])

# Add one spot for the empty word in the vocabulary
vocab_size = len(tokenizer.word_index) + 1
# Map the input sentences into the vocabulary indexes
train_sequences = tokenizer.texts_to_sequences(texts)
# The longest set of bootstrap tokens
max_sequence = max(len(s) for s in train_sequences)
# Specify how many tokens to have in each input sentence
max_length = 48

def preprocess_data(sequences, features):
    X, y, image_data = list(), list(), list()
    for img_no, seq in enumerate(sequences):
        for i in range(1, len(seq)):
            # Add the sentence until the current count(i) and add the current count to the output
            in_seq, out_seq = seq[:i], seq[i]
            # Pad all the input token sentences to max_sequence
            in_seq = pad_sequences([in_seq], maxlen=max_sequence)[0]
            # Turn the output into one-hot encoding
            out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
            # Add the corresponding image to the bootstrap token file
            image_data.append(features[img_no])
            # Cap the input sentence to 48 tokens and add it
            X.append(in_seq[-48:])
            y.append(out_seq)
    return np.array(X), np.array(y), np.array(image_data)

X, y, image_data = preprocess_data(train_sequences, train_features)

# Create the encoder
image_model = Sequential()
image_model.add(Conv2D(16, (3, 3), padding='valid', activation='relu', input_shape=(256, 256, 3,)))
image_model.add(Conv2D(16, (3,3), activation='relu', padding='same', strides=2))
image_model.add(Conv2D(32, (3,3), activation='relu', padding='same'))
image_model.add(Conv2D(32, (3,3), activation='relu', padding='same', strides=2))
image_model.add(Conv2D(64, (3,3), activation='relu', padding='same'))
image_model.add(Conv2D(64, (3,3), activation='relu', padding='same', strides=2))
image_model.add(Conv2D(128, (3,3), activation='relu', padding='same'))

image_model.add(Flatten())
image_model.add(Dense(1024, activation='relu'))
image_model.add(Dropout(0.3))
image_model.add(Dense(1024, activation='relu'))
image_model.add(Dropout(0.3))

image_model.add(RepeatVector(max_length))

visual_input = Input(shape=(256, 256, 3,))
encoded_image = image_model(visual_input)

language_input = Input(shape=(max_length,))
language_model = Embedding(vocab_size, 50, input_length=max_length, mask_zero=True)(language_input)
language_model = LSTM(128, return_sequences=True)(language_model)
language_model = LSTM(128, return_sequences=True)(language_model)

# Create the decoder
decoder = concatenate([encoded_image, language_model])
decoder = LSTM(512, return_sequences=True)(decoder)
decoder = LSTM(512, return_sequences=False)(decoder)
decoder = Dense(vocab_size, activation='softmax')(decoder)

# Compile the model
model = Model(inputs=[visual_input, language_input], outputs=decoder)
optimizer = RMSprop(lr=0.0001, clipvalue=1.0)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

# Save the model for every 2nd epoch
filepath = "org-weights-epoch-{epoch:04d}--val_loss-{val_loss:.4f}--loss-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_weights_only=True, period=2)
callbacks_list = [checkpoint]

# Train the model
model.fit([image_data, X], y, batch_size=64, shuffle=False, validation_split=0.1, callbacks=callbacks_list, verbose=1, epochs=50)

Test accuracy

It’s tricky to find a fair way to measure the accuracy. Say you compare word by word. If your prediction is one word out of sync, you might have 0% accuracy. If you remove one word which syncs the prediction, you might end up with 99/100.


I used the BLEU score, best practice in machine translating and image captioning models. It breaks the sentence into four n-grams, from 1–4 word sequences. In the below prediction “cat” is supposed to be “code.”


To get the final score, you multiply each score with 25%, (4/5) * 0.25 + (2/4) * 0.25 + (1/3) * 0.25 + (0/2) * 0.25 = 0.2 + 0.125 + 0.083 + 0 = 0.408 . The sum is then multiplied with a sentence length penalty. Since the length is correct in our example, it becomes our final score.


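Here is that arithmetic written out in plain Python. Note this mirrors the simplified calculation above; standard BLEU combines the n-gram precisions with a geometric mean rather than a weighted sum.

# The four n-gram precisions from the example above
precisions = [4/5, 2/4, 1/3, 0/2]

score = sum(0.25 * p for p in precisions)
print(round(score, 3))   # 0.408

# The predicted sentence has the right length, so the length penalty is 1
# and 0.408 stays the final score.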
You could increase the number of n-grams to make it harder. A four n-gram model is the model that best corresponds to human translations. I’d recommend running a few examples with the below code and reading the wiki page.


# Imports assumed for this snippet (not shown in the original notebook excerpt)
import os
import numpy as np
from numpy import argmax
from tqdm import tqdm
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import model_from_json
from nltk.translate.bleu_score import corpus_bleu
# Note: Compiler is the DSL-to-HTML/CSS compiler bundled with the repository, not a library import

# Create a function to read a file and return its content
def load_doc(filename):
    file = open(filename, 'r')
    text = file.read()
    file.close()
    return text

def load_data(data_dir):
    text = []
    images = []
    files_in_folder = os.listdir(data_dir)
    files_in_folder.sort()
    for filename in tqdm(files_in_folder):
        # Add an image
        if filename[-3:] == "npz":
            image = np.load(data_dir+filename)
            images.append(image['features'])
        else:
            # Add text and wrap it in a start and end tag
            syntax = '<START> ' + load_doc(data_dir+filename) + ' <END>'
            # Separate each word with a space
            syntax = ' '.join(syntax.split())
            # Add a space between each comma
            syntax = syntax.replace(',', ' ,')
            text.append(syntax)
    images = np.array(images, dtype=float)
    return images, text

# Initialize the function to create the vocabulary
tokenizer = Tokenizer(filters='', split=" ", lower=False)
# Create the vocabulary in a specific order
tokenizer.fit_on_texts([load_doc('bootstrap.vocab')])

dir_name = '../../../../eval/'
train_features, texts = load_data(dir_name)

# Load model and weights
json_file = open('../../../../model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# Load weights into new model
loaded_model.load_weights("../../../../weights.hdf5")
print("Loaded model from disk")

# Map an integer to a word
def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

print(word_for_id(17, tokenizer))

# Generate a description for an image
def generate_desc(model, tokenizer, photo, max_length):
    photo = np.array([photo])
    # seed the generation process
    in_text = '<START> '
    # iterate over the whole length of the sequence
    print('\nPrediction---->\n\n<START> ', end='')
    for i in range(150):
        # integer encode input sequence
        sequence = tokenizer.texts_to_sequences([in_text])[0]
        # pad input
        sequence = pad_sequences([sequence], maxlen=max_length)
        # predict next word
        yhat = loaded_model.predict([photo, sequence], verbose=0)
        # convert probability to integer
        yhat = argmax(yhat)
        # map integer to word
        word = word_for_id(yhat, tokenizer)
        # stop if we cannot map the word
        if word is None:
            break
        # append as input for generating the next word
        in_text += word + ' '
        # stop if we predict the end of the sequence
        print(word + ' ', end='')
        if word == '<END>':
            break
    return in_text

max_length = 48

# Evaluate the skill of the model
def evaluate_model(model, descriptions, photos, tokenizer, max_length):
    actual, predicted = list(), list()
    # step over the whole set
    for i in range(len(texts)):
        yhat = generate_desc(model, tokenizer, photos[i], max_length)
        # store actual and predicted
        print('\n\nReal---->\n\n' + texts[i])
        actual.append([texts[i].split()])
        predicted.append(yhat.split())
    # calculate BLEU score
    bleu = corpus_bleu(actual, predicted)
    return bleu, actual, predicted

bleu, actual, predicted = evaluate_model(loaded_model, texts, train_features, tokenizer, max_length)

# Compile the tokens into HTML and css
dsl_path = "compiler/assets/web-dsl-mapping.json"
compiler = Compiler(dsl_path)
compiled_website = compiler.compile(predicted[0], 'index.html')

print(compiled_website)
print(bleu)

Output

Links to sample output


  • Generated website 1 — Original 1


  • Generated website 2 — Original 2


  • Generated website 3 — Original 3


  • Generated website 4 — Original 4


  • Generated website 5 — Original 5


Mistakes I made:

  • Understand the weakness of the models instead of testing random models. First, I applied random things such as batch normalization and bidirectional networks and tried implementing attention. After looking at the test data and seeing that it could not predict color and position with high accuracy, I realized there was a weakness in the CNN. This led me to replace max-pooling with increased strides. The validation loss went from 0.12 to 0.02 and the BLEU score increased from 85% to 97%.


  • Only use pre-trained models if they are relevant. Given the small dataset, I thought that a pre-trained image model would improve the performance. From my experiments, an end-to-end model is slower to train and requires more memory, but is 30% more accurate.


  • Plan for slight variance when you run your model on a remote server. On my Mac, it reads the files in alphabetical order. However, on the server, the files were read in random order. This created a mismatch between the screenshots and the code. It still converged, but the validation data was 50% worse than when I fixed it.


  • Make sure you understand library functions. Include space for the empty token in your vocabulary. When I didn’t add it, it did not include one of the tokens. I only noticed it after looking at the final output several times and noticing that it never predicted a “single” token. After a quick check, I realized it wasn’t even in the vocabulary. Also, use the same order in the vocabulary for training and testing.


  • Use lighter models when experimenting. Using GRUs instead of LSTMs reduced each epoch cycle by 30%, and did not have a large effect on the performance. A minimal GRU swap is sketched after this list.


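Here is a minimal sketch of that swap on a language model shaped like the one above; the sizes are placeholders, and GRU is a drop-in replacement for LSTM in Keras.

from keras.models import Sequential
from keras.layers import Embedding, GRU, Dense

vocab_size, max_length = 17, 48   # placeholder sizes

# Same structure as the LSTM-based language model, with GRUs as a lighter drop-in
model = Sequential()
model.add(Embedding(vocab_size, 50, input_length=max_length, mask_zero=True))
model.add(GRU(128, return_sequences=True))
model.add(GRU(128, return_sequences=False))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.summary()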
Next steps

Front-end development is an ideal space to apply deep learning. It’s easy to generate data, and the current deep learning algorithms can map most of the logic.


One of the most exciting areas is applying attention to LSTMs. This will not just improve the accuracy, but enable us to visualize where the CNN puts its focus as it generates the markup.


Attention is also key for communicating between markup, stylesheets, scripts and eventually the backend. Attention layers can keep track of variables, enabling the network to communicate between programming languages.


But in the near future, the biggest impact will come from building a scalable way to synthesize data. Then you can add fonts, colors, words, and animations step-by-step.


So far, most progress is happening in taking sketches and turning them into template apps. In less than two years, we’ll be able to draw an app on paper and have the corresponding front-end in less than a second. There are already two working prototypes built by Airbnb’s design team and Uizard.


Here are some experiments to get started.


Experiments

Getting started


  • Run all the models
  • Try different hyper parameters
  • Test a different CNN architecture
  • Add Bidirectional LSTM models
  • Implement the model with a different dataset. (You can easily mount this dataset in your FloydHub jobs with this flag --data emilwallner/datasets/100k-html:data)


Further experiments


  • Creating a solid random app/web generator with the corresponding syntax.
  • Data for a sketch to app model. Auto-convert the app/web screenshots into sketches and use a GAN to create variety.
  • Apply an attention layer to visualize the focus on the image for each prediction, similar to this model.


  • Create a framework for a modular approach. Say, having encoder models for fonts, one for color, another for layout and combine them with one decoder. A good start could be solid image features.
  • Feed the network simple HTML components and teach it to generate animations using CSS. It would be fascinating to have an attention approach and visualize the focus on both input sources.

Huge thanks to Tony Beltramelli and Jon Gold for their research and ideas, and for answering questions. Thanks to Jason Brownlee for his stellar Keras tutorials (I included a few snippets from his tutorial in the core Keras implementation), and Beltramelli for providing the data. Also thanks to Qingping Hou, Charlie Harrington, Sai Soundararaj, Jannes Klaas, Claudio Cabral, Alain Demenet and Dylan Djian for reading drafts of this.


About Emil Wallner

This is the fourth part of a multi-part blog series from Emil as he learns deep learning. Emil has spent a decade exploring human learning. He’s worked for Oxford’s business school, invested in education startups, and built an education technology business. Last year, he enrolled at Ecole 42 to apply his knowledge of human learning to machine learning.


If you build something or get stuck, ping me below or on twitter: emilwallner. I’d love to see what you are building.


This was first published as a community post on Floydhub’s blog.


Translated from: https://www.freecodecamp.org/news/how-you-can-train-an-ai-to-convert-your-design-mockups-into-html-and-css-cc7afd82fed4/
