DEEP LEARNING BASICS

The aim of this article is to provide an intuitive understanding of the inner workings of the key layers in a convolutional neural network (CNN). The idea is to go beyond simply stating the facts and to explore how image manipulation actually works.

The Objective

Our aim is to design a deep learning framework capable of classifying cat and dog images like those shown below. Let us start by thinking about the challenges such an algorithm must overcome.

It should be able to detect cats and dogs of different colors, sizes, shapes, and breeds. It must be able to detect and classify animals even in pictures where the dog or the cat is not entirely visible. It must be sensitive to the presence of more than one dog in the image. Most importantly, the algorithm must be spatially invariant: it must be able to recognise dogs located anywhere in the image.

How a computer reads images.

Images are composed of pixels whose values range from 0 to 255 and depict brightness. 0 means black, 255 means white, and everything in between is a shade of grey. The more pixels an image has, the higher its resolution and the better the image quality.

A greyscale image is made of a single channel (i.e. a 2D array), whereas a color image in the RGB format is composed of three channels (red, green and blue) stacked on top of each other.
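As a rough illustration of the two layouts described above, the sketch below stores a tiny greyscale image as a 2D list and a color image as a grid of (R, G, B) triples. The pixel values and the helper name `shape` are made up for this example; real code would normally use an array library instead of nested lists.

```python
# A tiny 2x3 greyscale image: one channel, i.e. a 2D array of brightness values.
grey = [
    [0, 128, 255],
    [64, 192, 32],
]

# A colour image of the same size: three channels (red, green, blue).
# Here each pixel is stored as an (R, G, B) triple.
rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def shape(image):
    """Return (height, width, channels) for a nested-list image."""
    height, width = len(image), len(image[0])
    first_pixel = image[0][0]
    channels = len(first_pixel) if isinstance(first_pixel, tuple) else 1
    return height, width, channels


print(shape(grey))  # (2, 3, 1)
print(shape(rgb))   # (2, 3, 3)
```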

Limitations of a multi-layered perceptron.

The contents of each pixel are fed into the perceptron separately, with each neuron in the input layer processing one pixel value. For an image of dimensions 350*720, the number of parameters to be learned for the input layer alone is 350*720*3 (three channels per pixel)*2 (two parameters per neuron, a weight and a bias), or roughly 1.5 million. This simple count is a lower bound: in a fully connected layer every neuron carries one weight per input value, so the real number is far larger. The count grows further with every additional layer, making the MLP incredibly computationally intensive to train. This, however, is not the only challenge with MLPs.
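The arithmetic behind the 1.5 million figure can be checked in a few lines. The second calculation is an added assumption, not something from the article: it supposes a single fully connected hidden layer with a hypothetical 100 units, only to show how quickly the count grows once every unit is connected to every input value.

```python
height, width, channels = 350, 720, 3

# One input value per pixel per channel.
inputs = height * width * channels
print(inputs)  # 756000

# The article's count: one weight and one bias per input neuron.
params_article = inputs * 2
print(params_article)  # 1512000

# Assumption for illustration: a fully connected hidden layer with 100 units.
# Each unit has one weight per input value plus one bias.
hidden_units = 100
params_dense = inputs * hidden_units + hidden_units
print(params_dense)  # 75600100
```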

MLPs have no built-in mechanism for spatial invariance. If an MLP has been trained to detect dogs in the top right corner of the image, it will fail when dogs are located in other positions. This is a serious drawback, and in the subsequent sections we will discuss how to overcome it.

A convolutional neural network aims to ameliorate these drawbacks using built-in mechanisms for (1) extracting different high level features, (2) introducing spatial invariance and (3) improving the network's learning ability.

Image feature extraction.

Convolution (discrete convolution, to be specific) uses linear transformations to extract the key features of input images while preserving the spatial ordering of information. The input is convolved with a kernel to generate the output, similar to the response generated by a network of neurons in the visual cortex.

Kernel

The kernel (also known as a filter or a feature detector) samples the input image matrix with a pre-determined step size (known as the stride) in both the horizontal and vertical directions. As the kernel slides over the input image, each element of the kernel is multiplied by the overlapping element of the input image, and these products are summed to obtain the output for the current location. When the input image is composed of multiple channels (which is almost always the case), the kernel has the same depth as the number of channels in the input image. The per-channel results are then added together to arrive at a feature map composed of a single channel.

**Convolution** : The image I, represented as a tensor of dimension 7*7*1, is convolved with a 3*3 filter K to result in a 5*5 output image. Shown above is one such step of the matrix multiplication process.

If you are new to matrix multiplication, check out this YouTube video for a more detailed explanation.
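The sliding multiply-and-sum described in this section can be sketched in plain Python. This is a minimal illustration, not an optimised implementation: the function names, the 7*7 test image and the edge-detecting kernel are all assumptions made for the example. Like most deep learning libraries, it applies the kernel without flipping it.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image (lists of lists), no padding."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            total = 0
            # Multiply overlapping elements and sum them for this location.
            for a in range(k):
                for b in range(k):
                    total += image[top + a][left + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


def convolve_multichannel(channels, kernels, stride=1):
    """Convolve each channel with its own kernel slice, then add the maps."""
    maps = [convolve2d(c, k, stride) for c, k in zip(channels, kernels)]
    return [[sum(values) for values in zip(*rows)] for rows in zip(*maps)]


# Hypothetical 7x7 single-channel image whose brightness rises left to right.
image = [[r * 7 + c for c in range(7)] for r in range(7)]
# Hypothetical 3x3 kernel that responds to horizontal brightness changes.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

out = convolve2d(image, kernel)
print(len(out), len(out[0]))  # 5 5
print(out[0][0])              # -6

# Three identical channels, each with the same kernel slice.
rgb_out = convolve_multichannel([image] * 3, [kernel] * 3)
print(rgb_out[0][0])          # -18
```

The 7*7 input and 3*3 kernel reproduce the 5*5 output size mentioned in the caption; the multi-channel helper yields a single-channel feature map however many input channels there are.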

**Single Stride Convolution** : This animation shows how a kernel scans through an input image from left to right and from top to bottom to produce an output image. For a stride one convolution, the kernel moves a unit distance in each direction at every step.

While a CNN made of a single convolution layer would only be able to extract/learn low level features of the input image, adding successive convolution layers significantly improves the ability of the CNN to learn high level features.

**Double Stride Convolution** : This animation shows how a kernel scans through an input image from left to right and from top to bottom to produce an output image. For a stride two convolution, the kernel moves a distance of two units in each direction at every step.
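To make the difference between the two strides concrete, the sketch below lists the top-left corner of every kernel placement in scan order. The 7*7 input and 3*3 kernel sizes are assumed for illustration (the animations may use other sizes), and `kernel_positions` is a hypothetical helper.

```python
def kernel_positions(input_size, kernel_size, stride):
    """Top-left (row, col) of every kernel placement, in scan order."""
    starts = range(0, input_size - kernel_size + 1, stride)
    return [(r, c) for r in starts for c in starts]


stride_one = kernel_positions(7, 3, 1)
stride_two = kernel_positions(7, 3, 2)

print(len(stride_one), stride_one[:3])  # 25 [(0, 0), (0, 1), (0, 2)]
print(len(stride_two), stride_two[:3])  # 9 [(0, 0), (0, 2), (0, 4)]
```

Each placement produces one output value, so under these assumed sizes stride one gives a 5*5 output and stride two gives a 3*3 output.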

Rectifier

To introduce non-linearity into the system and improve its learning capacity, the output of the convolution operation is passed through an activation function such as the sigmoid or the rectified linear unit (ReLU). ReLU does not saturate for positive inputs, whereas the sigmoid saturates at both ends of its range. Check out this excellent article about these and several other commonly used activation functions.

**Rectifier** : Two widely used activation functions, sigmoid and ReLU.
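Both functions are short enough to write out directly. The sketch below uses their standard definitions and applies ReLU element-wise to a small feature map; the feature map values are made up for illustration.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Pass positive values through unchanged and clamp negatives to zero."""
    return max(0.0, x)


# Hypothetical convolution output, activated element by element.
feature_map = [[-6.0, 0.0], [2.5, 10.0]]
activated = [[relu(v) for v in row] for row in feature_map]

print(activated)     # [[0.0, 0.0], [2.5, 10.0]]
print(sigmoid(0.0))  # 0.5
```

For large positive inputs the sigmoid output flattens out close to 1, while ReLU keeps growing with its input.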

Padding

The feature map resulting from convolution is smaller than the input image. For an input image of size I*I convolved with a kernel of size K*K at stride S, the output will be [(I-K)/S + 1]*[(I-K)/S + 1]. This can result in a substantial reduction in image size in large ConvNets made of several convolution layers. At stride one, a zero padding of (K-1)/2 (for an odd K) all around the input image preserves its size in the convolution output. Alternatively, the padding size can be treated as one of the hyperparameters that is tuned while training the CNN.

For the most general case where an input image of size I*I is convolved with a filter of size K*K with a stride S and padding P, the output will have the dimension [(I+2P-K)/S +1]*[(I+2P-K)/S +1].

**Padding** : When a 5*5 image is convolved with a 3*3 kernel without padding, the resultant image is 3*3. A single layer of padding changes the input image dimensions to 7*7. This, when convolved with a 3*3 filter, results in a 5*5 output.
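The output size formula [(I+2P-K)/S + 1] is easy to turn into a helper and check against the figures quoted in this article. The function names are hypothetical, and rounding the division down when it is not exact is an assumption (it is the usual convention, but the article does not state it).

```python
def conv_output_size(i, k, s=1, p=0):
    """Side length of the output for input side i, kernel k, stride s, padding p."""
    return (i + 2 * p - k) // s + 1


def same_padding(k):
    """Padding that keeps the size unchanged at stride one (odd k assumed)."""
    return (k - 1) // 2


print(conv_output_size(7, 3))        # 5
print(conv_output_size(5, 3))        # 3
print(conv_output_size(5, 3, p=1))   # 5
print(conv_output_size(7, 3, s=2))   # 3
print(same_padding(3))               # 1
```

The first three calls reproduce the 7*7 to 5*5 convolution and the padded and unpadded 5*5 cases from the captions.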

Pooling

The convolution output is pooled so as to introduce spatial invariance, i.e. the ability to detect the same feature even when its position shifts between images. The idea here is to retain the key information corresponding to important features that the CNN must learn while reducing image size by getting rid of insignificant information. While there are several variations, max pooling is the most commonly used strategy. The convolution output is split into non-overlapping patches of size K*K, and only the maximum value of each patch is recorded in the output.

**Max-Pooling** : A 4*4 input image is max-pooled with a 2*2 kernel, resulting in a 2*2 output.
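A minimal sketch of max pooling over non-overlapping patches follows. The input values are invented for this example and `max_pool` is a hypothetical helper; it mirrors the 4*4 input and 2*2 kernel of the caption.

```python
def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size patch."""
    pooled = []
    for top in range(0, len(feature_map) - size + 1, size):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, size):
            patch = [
                feature_map[top + a][left + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(patch))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 convolution output.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 0, 8],
]

print(max_pool(feature_map))  # [[6, 4], [7, 9]]
```

Moving the 6 to any other cell of its top-left patch leaves the pooled output unchanged, which is the small-shift tolerance that pooling provides.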

Other less frequently used pooling strategies include average pooling, ‘mixed’ max-average pooling, stochastic pooling, spatial pyramid pooling etc.

Let us summarise the concepts discussed so far as they apply to the VGGNet 16 architecture. Shown below are the convolution layers of this network.

**VGGNet 16 :** The 13 convolution layers of VGGNet16.

This network accepts an image made of 224*224 pixels and 3 channels (corresponding to red, green and blue) as input. The image is then processed through a series of convolution layers (shown in black), not all of which are followed by a max pooling step. Five distinct convolution blocks are depicted in the image above. All convolution steps use 3*3 kernels and all max pooling steps use a 2*2 kernel. The number of kernels used in each convolution block gradually increases, from 64 in the first block to 512 in the fourth and fifth blocks. The first two blocks use two convolution layers each and the remaining three use three each. Stacking layers in this way is important for increasing the receptive field, since the kernel size is maintained constant throughout this architecture. The output of block five is passed through a max pooling layer at the end, resulting in a 7*7*512 output. This output is then fed into the fully connected layer discussed in the subsequent section. For a more detailed understanding of VGGNet, read the original paper.
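The shapes quoted above can be traced block by block. The sketch below assumes the standard VGG16 configuration where the article is silent: stride one and a padding of one for every 3*3 convolution, stride two for every 2*2 max pooling step, and 128 and 256 kernels in the second and third blocks. It only tracks shapes and does not build a trainable network.

```python
def conv_output_size(i, k, s=1, p=0):
    """Side length of the output for input side i, kernel k, stride s, padding p."""
    return (i + 2 * p - k) // s + 1


# (convolution layers, kernels per layer) for each of the five blocks.
# The 128 and 256 entries are assumed from the standard VGG16 configuration.
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

size, channels = 224, 3
conv_layers = 0
shapes = []
for n_convs, n_kernels in blocks:
    for _ in range(n_convs):
        # 3x3 convolution, stride 1, padding 1 (assumed): size is preserved.
        size = conv_output_size(size, k=3, s=1, p=1)
        channels = n_kernels
        conv_layers += 1
    # 2x2 max pooling with stride 2 (assumed): size is halved.
    size = conv_output_size(size, k=2, s=2)
    shapes.append((size, size, channels))

print(conv_layers)  # 13
for block_shape in shapes:
    print(block_shape)
# (112, 112, 64)
# (56, 56, 128)
# (28, 28, 256)
# (14, 14, 512)
# (7, 7, 512)
```

Under these assumptions the trace arrives at the 13 convolution layers and the 7*7*512 output stated in the text.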

Translated from: https://medium.com/@aseem.kash/a-comprehensive-guide-to-convolution-neural-networks-4bc10584cbac
