How to colorize black & white photos with just 100 lines of neural network code

by Emil Wallner

Earlier this year, Amir Avni used neural networks to troll the subreddit /r/Colorization — a community where people colorize historical black and white images manually using Photoshop.

They were astonished with Amir’s deep learning bot. What could take up to a month of manual labour could now be done in just a few seconds.

I was fascinated by Amir’s neural network, so I reproduced it and documented the process. First off, let’s look at some of the results/failures from my experiments (scroll to the bottom for the final result).

Today, colorization is usually done by hand in Photoshop. To appreciate all the hard work behind this process, take a peek at this gorgeous colorization memory lane video:

In short, a picture can take up to one month to colorize. It requires extensive research. A face alone needs up to 20 layers of pink, green and blue shades to get it just right.

This article is for beginners. Yet, if you’re new to deep learning terminology, you can read my previous two posts here and here, and watch Andrej Karpathy’s lecture for more background.

I’ll show you how to build your own colorization neural net in three steps.

The first section breaks down the core logic. We’ll build a bare-bones 40-line neural network as an “alpha” colorization bot. There’s not a lot of magic in this code snippet. This will help us become familiar with the syntax.

The next step is to create a neural network that can generalize — our “beta” version. We’ll be able to color images the bot has not seen before.

For our “final” version, we’ll combine our neural network with a classifier. We’ll use an Inception Resnet V2 that has been trained on 1.2 million images. To make the coloring pop, we’ll train our neural network on portraits from Unsplash.

If you want to look ahead, here’s a Jupyter Notebook with the Alpha version of our bot. You can also check out the three versions on FloydHub and GitHub, along with code for all the experiments I ran on FloydHub’s cloud GPUs.

Core logic

In this section, I’ll outline how to render an image, the basics of digital colors, and the main logic for our neural network.

Black and white images can be represented in grids of pixels. Each pixel has a value that corresponds to its brightness. The values span from 0–255, from black to white.
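
As a minimal illustration (a sketch that assumes scikit-image and NumPy are installed, and a local woman.png like the one used later in this article), you can load a photo and inspect that grid directly:

import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray

# Load the photo and reduce it to a single brightness layer
gray = (rgb2gray(imread('woman.png')) * 255).astype(np.uint8)
print(gray.shape)              # one value per pixel, e.g. (400, 400)
print(gray.min(), gray.max())  # brightness values sit between 0 (black) and 255 (white)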

Color images consist of three layers: a red layer, a green layer, and a blue layer. This might be counter-intuitive to you. Imagine splitting a green leaf on a white background into the three channels. Intuitively, you might think that the plant is only present in the green layer.

But, as you see below, the leaf is present in all three channels. The layers not only determine color, but also brightness.

To achieve the color white, for example, you need an equal distribution of all colors. By adding an equal amount of red and blue, it makes the green brighter. Thus, a color image encodes the color and the contrast using three layers:

Just like black and white images, each layer in a color image has a value from 0–255. The value 0 means that it has no color in this layer. If the value is 0 for all color channels, then the image pixel is black.
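
A few hypothetical pixel values make this concrete (plain NumPy, values chosen only for illustration):

import numpy as np

black = np.array([0, 0, 0])              # no color in any channel -> black
white = np.array([255, 255, 255])        # equal, full values in all three channels -> white
green = np.array([0, 255, 0])            # only the green layer contributes
lighter_green = np.array([60, 255, 60])  # adding some red and blue makes the green brighter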

As you may know, a neural network creates a relationship between an input value and output value. To be more precise with our colorization task, the network needs to find the traits that link grayscale images with colored ones.

In sum, we are searching for the features that link a grid of grayscale values to the three color grids.

Alpha version

We’ll start by making a simple version of our neural network to color an image of a woman’s face. This way, you can get familiar with the core syntax of our model as we add features to it.

With just 40 lines of code, we can make the following transition. The middle picture is done with our neural network and the picture to the right is the original color photo. The network is trained and tested on the same image — we’ll get back to this during the beta-version.

Color space

First, we’ll use an algorithm to change the color channels, from RGB to Lab. L stands for lightness, and a and b for the color spectra green–red and blue–yellow.

As you can see below, a Lab encoded image has one layer for grayscale, and has packed three color layers into two. This means that we can use the original grayscale image in our final prediction. Also, we only have two channels to predict.
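
Here is a small sketch of that split (assuming scikit-image and the woman.png used in the alpha version below):

from skimage.io import imread
from skimage.color import rgb2lab

image = imread('woman.png')
lab = rgb2lab(1.0/255*image)  # the same conversion the alpha code below uses
L = lab[:, :, 0]              # lightness: the grayscale layer we already have
ab = lab[:, :, 1:]            # the two color layers the network has to predict
print(L.shape, ab.shape)      # e.g. (400, 400) and (400, 400, 2)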

Science fact — 94% of the cells in our eyes determine brightness. That leaves only 6% of our receptors to act as sensors for colors. As you can see in the above image, the grayscale image is a lot sharper than the color layers. This is another reason to keep the grayscale image in our final prediction.

From B&W to color

Our final prediction looks like this. We have a grayscale layer for input, and we want to predict two color layers, the ab in Lab. To create the final color image, we’ll include the L/grayscale image we used for the input. The result is a Lab image.

To turn one layer into two layers, we use convolutional filters. Think of them as the blue/red filters in 3D glasses. Each filter determines what we see in a picture. They can highlight or remove something to extract information out of the picture. The network can either create a new image from a filter or combine several filters into one image.

For a convolutional neural network, each filter is automatically adjusted to help with the intended outcome. We’ll start by stacking hundreds of filters and narrow them down into two layers, the a and b layers.

Before we get into detail into how it works, let’s run the code.

Run the code on FloydHub

Click the button below to open a Workspace on FloydHub, where you will find the same environment and dataset used for the Full version. You can also find the trained models for Serving.

You can also make a local FloydHub installation with their 2-min installation, watch my 5-min video tutorial, or check out my step-by-step guide. It’s the best (and easiest) way to train deep learning models on cloud GPUs.

Alpha version

Once FloydHub is installed, use the following commands:

git clone https://github.com/emilwallner/Coloring-greyscale-images-in-Keras

Open the folder and initiate FloydHub.

cd Coloring-greyscale-images-in-Keras/floydhub
floyd init colornet

The FloydHub web dashboard will open in your browser. You will be prompted to create a new FloydHub project called colornet. Once that's done, go back to your terminal and run the same init command.

floyd init colornet

Okay, let’s run our job:

floyd run --data emilwallner/datasets/colornet/2:data --mode jupyter --tensorboard

Some quick notes about our job:

  • We mounted a public dataset on FloydHub (which I’ve already uploaded) at the data directory with the line below:

--data emilwallner/datasets/colornet/2:data

You can explore and use this dataset (and many other public datasets) by viewing it on FloydHub

  • We enabled Tensorboard with --tensorboard

  • We ran the job in Jupyter Notebook mode with --mode jupyter

  • If you have GPU credit, you can also add the GPU flag --gpu to your command. This will make it approximately 50x faster

Go to the Jupyter notebook. Under the Jobs tab on the FloydHub website, click on the Jupyter Notebook link and navigate to this file:

floydhub/Alpha version/working_floyd_pink_light_full.ipynb

Open it and click Shift+Enter on all the cells.

Gradually increase the epoch value to get a feel for how the neural network learns.

model.fit(x=X, y=Y, batch_size=1, epochs=1)

Start with an epoch value of 1 and then increase it to 10, 100, 500, 1000 and 3000. The epoch value indicates how many times the neural network learns from the image. You will find the image img_result.png in the main folder once you’ve trained your neural network.

# Get images
image = img_to_array(load_img('woman.png'))
image = np.array(image, dtype=float)

# Import map images into the lab colorspace
X = rgb2lab(1.0/255*image)[:,:,0]
Y = rgb2lab(1.0/255*image)[:,:,1:]
Y = Y / 128
X = X.reshape(1, 400, 400, 1)
Y = Y.reshape(1, 400, 400, 2)

# Building the neural network
model = Sequential()
model.add(InputLayer(input_shape=(None, None, 1)))
model.add(Conv2D(8, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', strides=2))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))

# Finish model
model.compile(optimizer='rmsprop', loss='mse')

# Train the neural network
model.fit(x=X, y=Y, batch_size=1, epochs=3000)
print(model.evaluate(X, Y, batch_size=1))

# Output colorizations
output = model.predict(X)
output = output * 128
canvas = np.zeros((400, 400, 3))
canvas[:,:,0] = X[0][:,:,0]
canvas[:,:,1:] = output[0]
imsave("img_result.png", lab2rgb(canvas))
imsave("img_gray_scale.png", rgb2gray(lab2rgb(canvas)))

FloydHub command to run this network:

floyd run --data emilwallner/datasets/colornet/2:data --mode jupyter --tensorboard

Technical explanation

To recap, the input is a grid representing a black and white image. It outputs two grids with color values. Between the input and output values, we create filters to link them together. This is a convolutional neural network.

When we train the network, we use colored images. We convert RGB colors to the Lab color space. The black and white layer is our input and the two colored layers are the output.

On the left side, we have the B&W input, our filters, and the prediction from our neural network.

We map the predicted values and the real values within the same interval. This way, we can compare the values. The interval ranges from -1 to 1. To map the predicted values, we use a tanh activation function. For any value you give the tanh function, it will return a value between -1 and 1.

The true color values range between -128 and 128. This is the default interval in the Lab color space. By dividing them by 128, they too fall within the -1 to 1 interval. This “normalization” enables us to compare the error from our prediction.
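
A small numeric sketch of that normalization (the numbers are only illustrative):

import numpy as np

ab_true = np.array([-128., 0., 64., 128.])            # a/b values from the Lab image
ab_norm = ab_true / 128                               # -> [-1, 0, 0.5, 1], same range as tanh
prediction = np.tanh(np.array([-2.0, 0.0, 0.7, 3.0])) # tanh always lands between -1 and 1
error = np.mean((prediction - ab_norm) ** 2)          # the mean squared error the network minimizes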

After calculating the final error, the network updates the filters to reduce the total error. The network continues in this loop until the error is as low as possible.

Let’s clarify some syntax in the code snippet.

X = rgb2lab(1.0/255*image)[:,:,0]
Y = rgb2lab(1.0/255*image)[:,:,1:]

1.0/255 indicates that we are using a 24-bit RGB color space. It means that we are using numbers between 0–255 for each color channel. This results in 16.7 million color combinations.
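
The arithmetic behind that number:

values_per_channel = 2 ** 8    # 8 bits per channel -> 256 values (0-255)
print(values_per_channel ** 3) # 16777216 -> roughly 16.7 million color combinations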

Since humans can only perceive 2–10 million colors, it does not make much sense to use a larger color space.

Y = Y / 128

The Lab color space has a different range in comparison to RGB. The color spectrum ab in Lab ranges from -128 to 128. By dividing all values in the output layer by 128, we bound the range between -1 and 1.

We match it with our neural network, which also returns values between -1 and 1.

After converting the color space using the function rgb2lab() we select the grayscale layer with: [ : , : , 0]. This is our input for the neural network. [ : , : , 1: ] selects the two color layers, green–red and blue–yellow.

After training the neural network, we make a final prediction which we convert into a picture.

output = model.predict(X)
output = output * 128

Here, we use a grayscale image as input and run it through our trained neural network. We take all the output values between -1 and 1 and multiply it by 128. This gives us the correct color in the Lab color spectrum.

canvas = np.zeros((400, 400, 3))
canvas[:,:,0] = X[0][:,:,0]
canvas[:,:,1:] = output[0]

Lastly, we create a black RGB canvas by filling it with three layers of 0s. Then we copy the grayscale layer from our test image. Then we add our two color layers to the RGB canvas. This array of pixel values is then converted into a picture.

Takeaways from the Alpha version

  • Reading research papers is challenging. Once I summarized the core characteristics of each paper, it became easier to skim papers. It also allowed me to put the details into a context.

  • Starting simple is key. Most of the implementations I could find online were 2–10K lines long. That made it hard to get an overview of the core logic of the problem. Once I had a barebones version, it became easier to read both the code implementation, and also the research papers.

  • Explore public projects. To get a rough idea for what to code, I skimmed 50–100 projects on colorization on Github.

  • Things won’t always work as expected. In the beginning, it could only create red and yellow colors. At first, I had a ReLU activation function for the final activation. Since it only maps numbers to positive values, it could not create the negative values needed for the blue and green spectra. Adding a tanh activation function and mapping the Y values fixed this.

  • Understanding > Speed. Many of the implementations I saw were fast but hard to work with. I chose to optimize for innovation speed instead of code speed.

Beta version

To understand the weakness of the alpha version, try coloring an image it has not been trained on. If you try it, you’ll see that it makes a poor attempt. It’s because the network has memorized the information. It has not learned how to color an image it hasn’t seen before. But this is what we’ll do in the beta version. We’ll teach our network to generalize.

Below is the result of coloring the validation images with our beta version.

Instead of using Imagenet, I created a public dataset on FloydHub with higher quality images. The images are from Unsplash — creative commons pictures by professional photographers. It includes 9,500 training images and 500 validation images.

The feature extractor

Our neural network finds characteristics that link grayscale images with their colored versions.

Imagine you had to color black and white images — but with the restriction that you can only see nine pixels at a time. You could scan each image from the top left to the bottom right and try to predict which color each pixel should be.

For example, these nine pixels are the edge of the nostril from the woman just above. As you can imagine, it’d be next to impossible to make a good colorization, so you break it down into steps.

First, you look for simple patterns: a diagonal line, all black pixels, and so on. You look for the same exact pattern in each square and remove the pixels that don’t match. You generate 64 new images from your 64 mini filters.
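
As a hedged sketch of that idea in Keras (a toy layer, not the article’s model), a single convolutional layer with 64 3x3 filters turns one grayscale grid into 64 filtered versions of it:

import numpy as np
from keras.models import Sequential
from keras.layers import InputLayer, Conv2D

toy = Sequential()
toy.add(InputLayer(input_shape=(256, 256, 1)))
toy.add(Conv2D(64, (3, 3), padding='same'))        # 64 "mini filters", 3x3 pixels each
filtered = toy.predict(np.zeros((1, 256, 256, 1)))
print(filtered.shape)                              # (1, 256, 256, 64): 64 new filtered images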

If you scan the images again, you’d see the same small patterns you’ve already detected. To gain a higher level understanding of the image, you decrease the image size in half.

You still only have a 3x3 filter to scan each image. But by combining your new nine pixels with your lower level filters, you can detect more complex patterns. One pixel combination might form a half circle, a small dot, or a line. Again, you repeatedly extract the same pattern from the image. This time, you generate 128 new filtered images.

After a couple of steps the filtered images you produce might look something like these:

As mentioned, you start with low-level features, such as an edge. Layers closer to the output are combined into patterns. Then, they are combined into details, and eventually transformed into a face. This video tutorial provides a further explanation.

The process is similar to that of most neural networks that deal with vision. The type of network here is known as a convolutional neural network. In these networks, you combine several filtered images to understand the context in the image.

From feature extraction to color

The neural network operates in a trial and error manner. It first makes a random prediction for each pixel. Based on the error for each pixel, it works backward through the network to improve the feature extraction.
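
Here is a toy version of that loop, stripped down to one weight vector and one target value (pure NumPy, purely illustrative):

import numpy as np

weights = np.random.randn(2)                  # stand-in for the network's filters
x = np.array([0.5, -0.3])                     # one input "pixel"
target = 0.2                                  # its true (normalized) color value
for _ in range(100):
    prediction = np.tanh(weights @ x)         # forward pass
    error = prediction - target
    grad = (1 - prediction ** 2) * error * x  # work backward through the tanh
    weights -= 0.1 * grad                     # adjust the weights to reduce the error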

It starts adjusting for the situations that generate the largest errors. In this case, the adjustments are: whether to color or not, and how to locate different objects.

The network starts by coloring all the objects brown. It’s the color that is most similar to all other colors, thus producing the smallest error.

Because most of the training data is quite similar, the network struggles to differentiate between different objects. It will fail to generate more nuanced colors. That’s what we’ll explore in the full version.

Below is the code for the beta version, followed by a technical explanation of the code.

# Get images
X = []
for filename in os.listdir('../Train/'):
    X.append(img_to_array(load_img('../Train/'+filename)))
X = np.array(X, dtype=float)

# Set up training and test data
split = int(0.95*len(X))
Xtrain = X[:split]
Xtrain = 1.0/255*Xtrain

# Design the neural network
model = Sequential()
model.add(InputLayer(input_shape=(256, 256, 1)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same', strides=2))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
model.add(UpSampling2D((2, 2)))

# Finish model
model.compile(optimizer='rmsprop', loss='mse')

# Image transformer
datagen = ImageDataGenerator(
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=20,
        horizontal_flip=True)

# Generate training data
batch_size = 50
def image_a_b_gen(batch_size):
    for batch in datagen.flow(Xtrain, batch_size=batch_size):
        lab_batch = rgb2lab(batch)
        X_batch = lab_batch[:,:,:,0]
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield (X_batch.reshape(X_batch.shape+(1,)), Y_batch)

# Train model
TensorBoard(log_dir='/output')
model.fit_generator(image_a_b_gen(batch_size), steps_per_epoch=10000, epochs=1)

# Test images
Xtest = rgb2lab(1.0/255*X[split:])[:,:,:,0]
Xtest = Xtest.reshape(Xtest.shape+(1,))
Ytest = rgb2lab(1.0/255*X[split:])[:,:,:,1:]
Ytest = Ytest / 128
print(model.evaluate(Xtest, Ytest, batch_size=batch_size))

# Load black and white images
color_me = []
for filename in os.listdir('../Test/'):
    color_me.append(img_to_array(load_img('../Test/'+filename)))
color_me = np.array(color_me, dtype=float)
color_me = rgb2lab(1.0/255*color_me)[:,:,:,0]
color_me = color_me.reshape(color_me.shape+(1,))

# Test model
output = model.predict(color_me)
output = output * 128

# Output colorizations
for i in range(len(output)):
    cur = np.zeros((256, 256, 3))
    cur[:,:,0] = color_me[i][:,:,0]
    cur[:,:,1:] = output[i]
    imsave("result/img_"+str(i)+".png", lab2rgb(cur))

Here’s the FloydHub command to run the Beta neural network:

floyd run --data emilwallner/datasets/colornet/2:data --mode jupyter --tensorboard

Technical explanation

The main difference from other visual neural networks is the importance of pixel location. In coloring networks, the image size or ratio stays the same throughout the network. In other types of network, the image gets distorted the closer it gets to the final layer.

The max-pooling layers in classification networks increase the information density, but also distort the image. It only values the information, but not the layout of an image. In coloring networks we instead use a stride of 2, to decrease the width and height by half. This also increases information density but does not distort the image.
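
A quick shape check of the two downsampling choices (a Keras sketch, not taken from the repository):

from keras.models import Sequential
from keras.layers import InputLayer, Conv2D, MaxPooling2D

strided = Sequential([InputLayer(input_shape=(256, 256, 1)),
                      Conv2D(64, (3, 3), padding='same', strides=2)])
pooled = Sequential([InputLayer(input_shape=(256, 256, 1)),
                     Conv2D(64, (3, 3), padding='same'),
                     MaxPooling2D((2, 2))])
print(strided.output_shape)  # (None, 128, 128, 64): halved by the stride
print(pooled.output_shape)   # (None, 128, 128, 64): halved by max-pooling instead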

Two further differences are: upsampling layers and maintaining the image ratio. Classification networks only care about the final classification. Therefore, they keep decreasing the image size and quality as it moves through the network.

Coloring networks keep the image ratio constant. This is done by adding white padding, as in the visualization above. Otherwise, each convolutional layer cuts the images. It’s done with the padding='same' parameter.

To double the size of the image, the coloring network uses an upsampling layer.
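
In the same sketchy style, UpSampling2D simply repeats rows and columns to double the width and height:

from keras.models import Sequential
from keras.layers import InputLayer, UpSampling2D

up = Sequential([InputLayer(input_shape=(64, 64, 2)),
                 UpSampling2D((2, 2))])
print(up.output_shape)  # (None, 128, 128, 2): width and height doubled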

for filename in os.listdir('/Color_300/Train/'):
    X.append(img_to_array(load_img('/Color_300/Train/'+filename)))

This for-loop first counts all the file names in the directory. Then, it iterates through the image directory and converts the images into an array of pixels. Finally, it combines them into a giant vector.

datagen = ImageDataGenerator(
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=20,
        horizontal_flip=True)

With ImageDataGenerator, we adjust the settings for our image generator. This way, each image will never be the same, thus improving the learning rate. The shear_range tilts the image to the left or right, and the other settings are zoom, rotation and horizontal-flip.

batch_size = 50

def image_a_b_gen(batch_size):
    for batch in datagen.flow(Xtrain, batch_size=batch_size):
        lab_batch = rgb2lab(batch)
        X_batch = lab_batch[:,:,:,0]
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield (X_batch.reshape(X_batch.shape+(1,)), Y_batch)

We use the images from our folder, Xtrain, to generate images based on the settings above. Then, we extract the black and white layer for the X_batch and the two colors for the two color layers.

model.fit_generator(image_a_b_gen(batch_size), steps_per_epoch=1, epochs=1000)

The stronger the GPU you have, the more images you can fit into it. With this setup, you can use 50–100 images. steps_per_epoch is calculated by dividing the number of training images by your batch size.

For example: 100 images with a batch size of 50 gives 2 steps per epoch. The number of epochs determines how many times you want to train all images. 10K images with 21 epochs will take about 11 hours on a Tesla K80 GPU.
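
The same arithmetic in code (numbers taken from the example above):

num_train_images = 100
batch_size = 50
steps_per_epoch = num_train_images // batch_size  # 100 / 50 = 2 steps per epoch
epochs = 21                                       # how many passes over the full training set
total_batches = steps_per_epoch * epochs          # 42 batches for this toy case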

Takeaways

  • Run a lot of experiments in smaller batches before you make larger runs. Even after 20–30 experiments, I still found mistakes. Just because it’s running doesn’t mean it’s working. Bugs in a neural network are often more nuanced than traditional programming errors. One of the more bizarre ones was my Adam hiccup.

  • A more diverse dataset makes the pictures brownish. If you have very similar images, you can get a decent result without needing a more complex architecture. The trade-off is the network becomes worse at generalizing.

  • Shapes, shapes, and shapes. The size of each image has to be exact and remain proportional throughout the network. In the beginning, I used an image size of 300. Halving this three times gives sizes of 150, 75, and 37.5. The result is losing half a pixel! This led to many “hacks” until I realized it’s better to use a power of two: 2, 8, 16, 32, 64, 256 and so on.

  • Creating datasets: a) Disable the .DS_Store file, it drove me crazy. b) Be creative. I ended up with a Chrome console script and an extension to download the files. c) Make a copy of the raw files you scrape and structure your cleaning scripts.

Full version

Our final version of the colorization neural network has four components. We split the network we had before into an encoder and a decoder. Between them, we’ll use a fusion layer. If you are new to classification networks, I’d recommend having a look at this tutorial.

In parallel to the encoder, the input images also run through one of today’s most powerful classifiers — the Inception ResNet v2. This is a neural network trained on 1.2M images. We extract the classification layer and merge it with the output from the encoder.

Here is a more detailed visual from the original paper.

By transferring the learning from the classifier to the coloring network, the network can get a sense of what’s in the picture. This enables the network to match an object representation with a coloring scheme.

Here are some of the validation images, using only 20 images to train the network on.

Most of the images turned out poor. But I was able to find a few decent ones because of a large validation set (2,500 images). Training it on more images gave a more consistent result, but most of them turned out brownish. Here is a full list of the experiments I ran including the validation images.

Here are the most common architectures from previous research, with links:

  • Manually adding small dots of color in a picture to guide the neural network (link)

  • Find a matching image and transfer the coloring (learn more here and here)

  • Residual encoder and merging classification layers (link)

  • Merging hypercolumns from a classifying network (more detail here and here)

  • Merging the final classification between the encoder and decoder (details here and here)

  • Colorspaces: Lab, YUV, HSV, and LUV (more detail here and here)

  • Loss: Mean square error, classification, weighted classification (link)

I chose the ‘fusion layer’ architecture (the fifth one in the list above).

This was because it produces some of the best results. It is also easier to understand and reproduce in Keras. Although it’s not the strongest colorization network design, it is a good place to start. It’s a great architecture to understand the dynamics of the coloring problem.

I used the neural network design from this paper by Federico Baldassarre and collaborators. I proceeded with my own interpretation in Keras.

Note: in the below code I switch from Keras’ sequential model to their functional API. [Documentation]

# Get images
X = []
for filename in os.listdir('/data/images/Train/'):
    X.append(img_to_array(load_img('/data/images/Train/'+filename)))
X = np.array(X, dtype=float)
Xtrain = 1.0/255*X

# Load weights
inception = InceptionResNetV2(weights=None, include_top=True)
inception.load_weights('/data/inception_resnet_v2_weights_tf_dim_ordering_tf_kernels.h5')
inception.graph = tf.get_default_graph()

embed_input = Input(shape=(1000,))

# Encoder
encoder_input = Input(shape=(256, 256, 1,))
encoder_output = Conv2D(64, (3,3), activation='relu', padding='same', strides=2)(encoder_input)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)

# Fusion
fusion_output = RepeatVector(32 * 32)(embed_input)
fusion_output = Reshape(([32, 32, 1000]))(fusion_output)
fusion_output = concatenate([encoder_output, fusion_output], axis=3)
fusion_output = Conv2D(256, (1, 1), activation='relu', padding='same')(fusion_output)

# Decoder
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(fusion_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)

model = Model(inputs=[encoder_input, embed_input], outputs=decoder_output)

# Create embedding
def create_inception_embedding(grayscaled_rgb):
    grayscaled_rgb_resized = []
    for i in grayscaled_rgb:
        i = resize(i, (299, 299, 3), mode='constant')
        grayscaled_rgb_resized.append(i)
    grayscaled_rgb_resized = np.array(grayscaled_rgb_resized)
    grayscaled_rgb_resized = preprocess_input(grayscaled_rgb_resized)
    with inception.graph.as_default():
        embed = inception.predict(grayscaled_rgb_resized)
    return embed

# Image transformer
datagen = ImageDataGenerator(
        shear_range=0.4,
        zoom_range=0.4,
        rotation_range=40,
        horizontal_flip=True)

# Generate training data
batch_size = 20

def image_a_b_gen(batch_size):
    for batch in datagen.flow(Xtrain, batch_size=batch_size):
        grayscaled_rgb = gray2rgb(rgb2gray(batch))
        embed = create_inception_embedding(grayscaled_rgb)
        lab_batch = rgb2lab(batch)
        X_batch = lab_batch[:,:,:,0]
        X_batch = X_batch.reshape(X_batch.shape+(1,))
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield ([X_batch, create_inception_embedding(grayscaled_rgb)], Y_batch)

# Train model
tensorboard = TensorBoard(log_dir="/output")
model.compile(optimizer='adam', loss='mse')
model.fit_generator(image_a_b_gen(batch_size), callbacks=[tensorboard], epochs=1000, steps_per_epoch=20)

# Make a prediction on the unseen images
color_me = []
for filename in os.listdir('../Test/'):
    color_me.append(img_to_array(load_img('../Test/'+filename)))
color_me = np.array(color_me, dtype=float)
color_me = 1.0/255*color_me
color_me = gray2rgb(rgb2gray(color_me))
color_me_embed = create_inception_embedding(color_me)
color_me = rgb2lab(color_me)[:,:,:,0]
color_me = color_me.reshape(color_me.shape+(1,))

# Test model
output = model.predict([color_me, color_me_embed])
output = output * 128

# Output colorizations
for i in range(len(output)):
    cur = np.zeros((256, 256, 3))
    cur[:,:,0] = color_me[i][:,:,0]
    cur[:,:,1:] = output[i]
    imsave("result/img_"+str(i)+".png", lab2rgb(cur))

Here’s the FloydHub command to run the full neural network:

floyd run --data emilwallner/datasets/colornet/2:data --mode jupyter --tensorboard

Technical explanation

Keras’ functional API is ideal when we are concatenating or merging several models.

First, we download the Inception ResNet v2 neural network and load the weights. Since we will be using two models in parallel, we need to specify which model we are using. This is done in Tensorflow, the backend for Keras.

inception = InceptionResNetV2(weights=None, include_top=True)
inception.load_weights('/data/inception_resnet_v2_weights_tf_dim_ordering_tf_kernels.h5')
inception.graph = tf.get_default_graph()

To create our batch, we use the tweaked images. We convert them to black and white and run them through the Inception ResNet model.

grayscaled_rgb = gray2rgb(rgb2gray(batch))
embed = create_inception_embedding(grayscaled_rgb)

First, we have to resize the image to fit into the Inception model. Then we use the preprocessor to format the pixel and color values according to the model. In the final step, we run it through the Inception network and extract the final layer of the model.

def create_inception_embedding(grayscaled_rgb):
    grayscaled_rgb_resized = []
    for i in grayscaled_rgb:
        i = resize(i, (299, 299, 3), mode='constant')
        grayscaled_rgb_resized.append(i)
    grayscaled_rgb_resized = np.array(grayscaled_rgb_resized)
    grayscaled_rgb_resized = preprocess_input(grayscaled_rgb_resized)
    with inception.graph.as_default():
        embed = inception.predict(grayscaled_rgb_resized)
    return embed

Let’s go back to the generator. For each batch, we generate 20 images in the below format. It takes about an hour on a Tesla K80 GPU. It can do up to 50 images at a time with this model without having memory problems.

yield ([X_batch, create_inception_embedding(grayscaled_rgb)], Y_batch)

This matches with our colornet model format.

model = Model(inputs=[encoder_input, embed_input], outputs=decoder_output)

encoder_input is fed into our Encoder model, the output of the Encoder model is then fused with the embed_input in the fusion layer; the output of the fusion is then used as input in our Decoder model, which then returns the final output, decoder_output.

fusion_output = RepeatVector(32 * 32)(embed_input)
fusion_output = Reshape(([32, 32, 1000]))(fusion_output)
fusion_output = concatenate([fusion_output, encoder_output], axis=3)
fusion_output = Conv2D(256, (1, 1), activation='relu')(fusion_output)

In the fusion layer, we first multiply the 1000 category layer by 1024 (32 * 32). This way, we get 1024 rows with the final layer from the Inception model.

This is then reshaped from 2D to 3D, a 32 x 32 grid with the 1000 category pillars. These are then linked together with the output from the encoder model. We then apply a convolutional layer with 256 filters and a 1x1 kernel, the final output of the fusion layer.
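
A minimal sketch of those shapes with Keras’ functional API (standalone, with a placeholder input standing in for the real encoder output):

from keras.layers import Input, RepeatVector, Reshape, concatenate, Conv2D

embed_input = Input(shape=(1000,))             # Inception's 1000-class output
encoder_output = Input(shape=(32, 32, 256))    # placeholder for the encoder's feature grid

x = RepeatVector(32 * 32)(embed_input)         # (None, 1024, 1000)
x = Reshape((32, 32, 1000))(x)                 # (None, 32, 32, 1000): one copy per grid cell
x = concatenate([encoder_output, x], axis=3)   # (None, 32, 32, 1256): image features + classifier info
x = Conv2D(256, (1, 1), activation='relu')(x)  # (None, 32, 32, 256): the fusion layer's output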

Takeaways

  • The research terminology was daunting. I spent three days googling for ways to implement the “fusion model” in Keras. Because it sounded complex, I didn’t want to face the problem. Instead, I tricked myself into searching for short cuts.

  • I asked questions online. I didn’t have a single comment in the Keras slack channel and Stack Overflow deleted my questions. But, by publicly breaking down the problem to make it simple to answer, it forced me to isolate the error, taking me closer to a solution.

  • Email people. Although forums can be cold, people care if you connect with them directly. Discussing color spaces over Skype with a researcher is inspiring!

  • After delaying on the fusion problem, I decided to build all the components before I stitched them together. Here are a few experiments I used to break down the fusion layer.

  • Once I had something I thought would work, I was hesitant to run it. Although I knew the core logic was okay, I didn’t believe it would work. After a cup of lemon tea and a long walk — I ran it. It produced an error after the first line in my model. But after four days, several hundred bugs and several thousand Google searches, “Epoch 1/22” appeared under my model.

Next steps

Colorizing images is a deeply fascinating problem. It is as much a scientific problem as an artistic one. I wrote this article so you can get up to speed on colorization and continue where I left off. Here are some suggestions to get started:

  • Implement it with another pre-trained model
  • Try a different dataset
  • Increase the network’s accuracy by using more pictures
  • Build an amplifier within the RGB color space. Create a similar model to the coloring network, that takes a saturated colored image as input and the correct colored image as output.
  • Implement a weighted classification
  • Apply it to video. Don’t worry too much about the colorization, but make the switch between images consistent. You could also do something similar for larger images, by tiling smaller ones.

You can also easily colorize your own black and white images with my three versions of the colorization neural network using FloydHub.

  • For the alpha version, simply replace the woman.jpg file with your own file of the same name (image size 400x400 pixels).

  • For the beta and the full version, add your images to the Test folder before you run the FloydHub command. You can also upload them directly in the Notebook to the Test folder while the notebook is running. Note that these images need to be exactly 256x256 pixels. Also, you can upload all test images in color because they will automatically be converted into B&W.

If you build something or get stuck, ping me on twitter: emilwallner. I’d love to see what you are building.

Huge thanks to Federico Baldassarre, for answering my questions and their previous work on colorization. Also thanks to Muthu Chidambaram, who influenced the core implementation in Keras, and the Unsplash community for providing the pictures. Thanks also to Marine Haziza, Valdemaras Repsys, Qingping Hou, Charlie Harrington, Sai Soundararaj, Jannes Klaas, Claudio Cabral, Alain Demenet, and Ignacio Tonoli for reading drafts of this.

About Emil Wallner

This is the third part in a multi-part blog series from Emil as he learns deep learning. Emil has spent a decade exploring human learning. He’s worked for Oxford’s business school, invested in education startups, and built an education technology business. Last year, he enrolled at Ecole 42 to apply his knowledge of human learning to machine learning.

You can follow along with Emil on Twitter and Medium.

This was first published as a community post on Floydhub’s blog.

Source: https://www.freecodecamp.org/news/colorize-b-w-photos-with-a-100-line-neural-network-53d9b4449f8d/
