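1. Pooling Layer

Pooling operation: it "collects" and "summarizes" a signal, much like a pool collects water, which is where the name comes from.

Collect: many values become few;

Summarize: take the maximum or the average of each window.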
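To make "collect" and "summarize" concrete, here is a minimal pure-Python sketch that applies non-overlapping 2*2 windows to a 4*4 grid. The helper names `pool2d` and `mean` are hypothetical and only illustrate the idea; they are not part of PyTorch.

```python
def pool2d(image, size, summarize):
    """Slide a non-overlapping size*size window over a 2D list and summarize each window."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # "Collect": gather the size*size values under the window.
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            # "Summarize": reduce the window to a single value.
            pooled_row.append(summarize(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


image = [[0, 4, 4, 3],
         [3, 3, 1, 1],
         [4, 2, 3, 4],
         [1, 3, 3, 0]]

max_pooled = pool2d(image, 2, max)   # [[4, 4], [4, 4]]
avg_pooled = pool2d(image, 2, mean)  # [[2.5, 2.25], [2.5, 2.5]]
print(max_pooled)
print(avg_pooled)
```

Sixteen values are reduced to four in both cases; only the summary rule differs.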

1.1 nn.MaxPool2d

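Function: applies max pooling to a 2D signal (an image).

nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

kernel_size: size of the pooling window;
stride: step size of the window;
padding: number of padded pixels;
dilation: spacing between elements of the pooling window;
ceil_mode: computing the output size involves a division; when it does not divide evenly, True rounds the size up, and the default False rounds it down;
return_indices: records the position index of each maximum pixel, which is typically used later for max unpooling.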
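The division mentioned for ceil_mode can be sketched in plain Python. The helper `pool_output_size` below is hypothetical; it follows the output-size formula given in the PyTorch documentation for one spatial dimension, including the rule that in ceil mode a window starting entirely beyond the input plus left padding is dropped.

```python
import math


def pool_output_size(size, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False):
    """Output length along one spatial dimension of a pooling layer."""
    stride = stride or kernel_size  # stride defaults to the kernel size
    numer = size + 2 * padding - dilation * (kernel_size - 1) - 1
    if ceil_mode:
        out = math.ceil(numer / stride) + 1
        # The last window must start inside the input or the left padding.
        if (out - 1) * stride >= size + padding:
            out -= 1
    else:
        out = numer // stride + 1
    return out


print(pool_output_size(5, 2, 2))                  # 2: the last column is dropped
print(pool_output_size(5, 2, 2, ceil_mode=True))  # 3: a partial window is kept
print(pool_output_size(4, 2, 2, ceil_mode=True))  # 2: divides evenly, no difference
```

The two modes only differ when the input size does not divide evenly, as with the length-5 input above.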

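(Figure: max unpooling diagram)
Early autoencoders and image segmentation models include an upsampling step, often implemented with max unpooling. The left side of the figure shows ordinary max-pooling downsampling, which reduces a 4*4 image to 2*2. After a series of network layers, the decoder upsamples the small image back to a larger one, which raises the question of where each pixel value should be placed. This is where the indices recorded during max pooling are used.

The following code shows how nn.MaxPool2d is used: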

import os
import torch.nn as nn
from torchvision import transforms
from matplotlib import pyplot as plt
from PIL import Image
from common_tools import transform_invert, set_seed  # common_tools is the author's local helper module

set_seed(1)  # set the random seed

# load the image
path_img = os.path.join(os.path.dirname(os.path.abspath(__file__)), "lena.png")
img = Image.open(path_img).convert('RGB')  # pixel values 0~255

# convert to tensor
img_transform = transforms.Compose([transforms.ToTensor()])
img_tensor = img_transform(img)
img_tensor.unsqueeze_(dim=0)    # C*H*W to B*C*H*W

# 2*2 max pooling with stride 2 halves the height and width
maxpool_layer = nn.MaxPool2d((2, 2), stride=(2, 2))
img_pool = maxpool_layer(img_tensor)

print("size before pooling:{}\nsize after pooling:{}".format(img_tensor.shape, img_pool.shape))

# convert the tensors back to images and show them side by side
img_pool = transform_invert(img_pool[0, 0:3, ...], img_transform)
img_raw = transform_invert(img_tensor.squeeze(), img_transform)
plt.subplot(122).imshow(img_pool)
plt.subplot(121).imshow(img_raw)
plt.show()

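The output image is shown below:

The image shows that although the size has been reduced, the quality is still acceptable. Pooling removes redundant information and reduces the amount of computation in later layers.

Besides max pooling, the other commonly used pooling method is average pooling.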

1.2 nn.AvgPool2d

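Function: applies average pooling to a 2D signal (an image).

nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)

Main parameters:

kernel_size: size of the pooling window;
stride: step size of the window;
padding: number of padded pixels;
ceil_mode: round the output size up;
count_include_pad: if True, the padded values are included when computing the average;
divisor_override: when averaging, use this value as the divisor instead of the number of pixels in the window.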
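The effect of count_include_pad and divisor_override is easiest to see on a tiny input. The following is a minimal sketch on an all-ones 2*2 image; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

x = torch.ones(1, 1, 2, 2)  # a 2*2 image of ones, shaped B*C*H*W

# With padding=1, every 2*2 window covers one real pixel and three padded zeros.
with_pad = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=True)(x)
without_pad = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=False)(x)

# divisor_override=1 divides each window sum by 1, so the layer returns the sum.
window_sum = nn.AvgPool2d(2, stride=2, divisor_override=1)(x)

print(with_pad)     # each value is 1 / 4 = 0.25
print(without_pad)  # each value is 1 / 1 = 1.0
print(window_sum)   # the single value is 1 + 1 + 1 + 1 = 4.0
```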

The following code shows how nn.AvgPool2d is used:

import os
import torch.nn as nn
from torchvision import transforms
from matplotlib import pyplot as plt
from PIL import Image
from common_tools import transform_invert, set_seed  # common_tools is the author's local helper module

set_seed(1)  # set the random seed

# load the image
path_img = os.path.join(os.path.dirname(os.path.abspath(__file__)), "lena.png")
img = Image.open(path_img).convert('RGB')  # pixel values 0~255

# convert to tensor
img_transform = transforms.Compose([transforms.ToTensor()])
img_tensor = img_transform(img)
img_tensor.unsqueeze_(dim=0)    # C*H*W to B*C*H*W

# 2*2 average pooling with stride 2 halves the height and width
avgpoollayer = nn.AvgPool2d((2, 2), stride=(2, 2))
img_pool = avgpoollayer(img_tensor)

print("size before pooling:{}\nsize after pooling:{}".format(img_tensor.shape, img_pool.shape))

# convert the tensors back to images and show them side by side
img_pool = transform_invert(img_pool[0, 0:3, ...], img_transform)
img_raw = transform_invert(img_tensor.squeeze(), img_transform)
plt.subplot(122).imshow(img_pool)
plt.subplot(121).imshow(img_raw)
plt.show()

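The output image is shown below. Compared with max pooling, the average-pooled image is slightly darker, because the mean of a window can never exceed its maximum: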
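The brightness difference can be checked numerically. This is a small sketch on a random tensor standing in for an image (the actual lena.png is not needed): over identical windows, every max-pooled value is at least as large as the corresponding average-pooled value.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
img = torch.rand(1, 3, 8, 8)  # random stand-in for an RGB image, values in [0, 1)

max_out = nn.MaxPool2d(2, stride=2)(img)
avg_out = nn.AvgPool2d(2, stride=2)(img)

# Elementwise comparison over identical 2*2 windows.
brighter_or_equal = bool((max_out >= avg_out).all())
print(brighter_or_equal)
print(max_out.mean().item(), avg_out.mean().item())
```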

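1.3 nn.MaxUnpool2d

Function: applies max-unpooling upsampling to a 2D signal (an image), turning a small image into a larger one.

In the figure above, unpooling turns a 2*2 image into a 4*4 image. This raises the question of where each pixel value should go. The position is determined by the indices of the maximum pixels recorded during max pooling: these indices are passed to the unpooling layer, which places each value at the corresponding position and fills the rest with zeros. The max unpooling layer is therefore very similar to the pooling layer. The only difference is that forward() takes an additional indices argument, the indices required for unpooling.

nn.MaxUnpool2d(kernel_size, stride=None, padding=0)
forward(self, input, indices, output_size=None)

Main parameters:

kernel_size: size of the pooling window
stride: step size of the window
padding: number of padded pixels

The following code demonstrates max unpooling: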

import torch
import torch.nn as nn

# pooling
img_tensor = torch.randint(high=5, size=(1, 1, 4, 4), dtype=torch.float)  # randomly initialize a 4*4 image
maxpool_layer = nn.MaxPool2d((2, 2), stride=(2, 2), return_indices=True)  # max pooling that also returns indices
img_pool, indices = maxpool_layer(img_tensor)  # indices records where each maximum was found

# unpooling
img_reconstruct = torch.randn_like(img_pool, dtype=torch.float)  # input to unpooling, same size as the pooled output
maxunpool_layer = nn.MaxUnpool2d((2, 2), stride=(2, 2))  # build the unpooling layer
img_unpool = maxunpool_layer(img_reconstruct, indices)  # pass the input and the indices to the unpooling layer

print("raw_img:\n{}\nimg_pool:\n{}".format(img_tensor, img_pool))
print("img_reconstruct:\n{}\nimg_unpool:\n{}".format(img_reconstruct, img_unpool))

The output is as follows:

raw_img:
tensor([[[[0., 4., 4., 3.],
          [3., 3., 1., 1.],
          [4., 2., 3., 4.],
          [1., 3., 3., 0.]]]])
img_pool:
tensor([[[[4., 4.],
          [4., 4.]]]])
img_reconstruct:
tensor([[[[-1.0276, -0.5631],
          [-0.8923, -0.0583]]]])
img_unpool:
tensor([[[[ 0.0000, -1.0276, -0.5631,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000],
          [-0.8923,  0.0000,  0.0000, -0.0583],
          [ 0.0000,  0.0000,  0.0000,  0.0000]]]])
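The placement of the non-zero values in img_unpool follows directly from the recorded indices. The sketch below reuses the raw_img values from the output above and prints the indices, which are positions in the flattened 4*4 plane (row * 4 + col). The `values` tensor is made up for illustration so the placement is easy to follow.

```python
import torch
import torch.nn as nn

img = torch.tensor([[[[0., 4., 4., 3.],
                      [3., 3., 1., 1.],
                      [4., 2., 3., 4.],
                      [1., 3., 3., 0.]]]])

pooled, indices = nn.MaxPool2d(2, stride=2, return_indices=True)(img)
# Each index is a position in the flattened 4*4 plane: row * 4 + col.
print(indices)

# Illustrative values to place back; any tensor shaped like `pooled` works.
values = torch.tensor([[[[10., 20.],
                         [30., 40.]]]])
unpooled = nn.MaxUnpool2d(2, stride=2)(values, indices)
print(unpooled)
```

Index 1 is row 0, column 1; index 2 is row 0, column 2; index 8 is row 2, column 0; index 11 is row 2, column 3. These are exactly the positions that hold non-zero values in img_unpool.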

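2. Linear Layer

The linear layer is also called the fully connected layer. Each of its neurons is connected to every neuron in the previous layer, so it computes a linear combination of the previous layer's outputs.

2.1 nn.Linear

Function: applies a linear combination to a 1D signal (a vector).

nn.Linear(in_features, out_features, bias=True)

Main parameters:

in_features: number of input nodes;
out_features: number of output nodes;
bias: whether to add a bias term

Formula: $y = xW^T + bias$

The following code shows how to use nn.Linear: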

import torch
import torch.nn as nn

inputs = torch.tensor([[1., 2, 3]])  # build an input
linear_layer = nn.Linear(3, 4)  # build a linear layer
# initialize the weight matrix, shape (out_features, in_features)
linear_layer.weight.data = torch.tensor([[1., 1., 1.],
                                         [2., 2., 2.],
                                         [3., 3., 3.],
                                         [4., 4., 4.]])
linear_layer.bias.data.fill_(0.5)  # set every bias to 0.5
output = linear_layer(inputs)
print(inputs, inputs.shape)
print(linear_layer.weight.data, linear_layer.weight.data.shape)
print(output, output.shape)

The corresponding output is:

tensor([[1., 2., 3.]]) torch.Size([1, 3])
tensor([[1., 1., 1.],
        [2., 2., 2.],
        [3., 3., 3.],
        [4., 4., 4.]]) torch.Size([4, 3])
tensor([[ 6.5000, 12.5000, 18.5000, 24.5000]], grad_fn=<AddmmBackward>) torch.Size([1, 4])
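The four output values can be reproduced by hand from the formula for nn.Linear. With $x = (1, 2, 3)$, row $j$ of the weight matrix equal to $(j, j, j)$, and a bias of 0.5, each output is the dot product of $x$ with one weight row plus the bias:

```latex
\begin{aligned}
y_1 &= 1 \cdot 1 + 2 \cdot 1 + 3 \cdot 1 + 0.5 = 6.5 \\
y_2 &= 1 \cdot 2 + 2 \cdot 2 + 3 \cdot 2 + 0.5 = 12.5 \\
y_3 &= 1 \cdot 3 + 2 \cdot 3 + 3 \cdot 3 + 0.5 = 18.5 \\
y_4 &= 1 \cdot 4 + 2 \cdot 4 + 3 \cdot 4 + 0.5 = 24.5
\end{aligned}
```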

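3. Activation Layer

Activation functions apply a nonlinear transformation to the features. This is what makes the depth of a multi-layer network meaningful: without a nonlinearity, any stack of linear layers is equivalent to a single linear layer.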
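A short derivation shows why depth needs a nonlinearity. Using the same convention as nn.Linear, take two linear layers with no activation between them; the symbols $W_1, b_1, W_2, b_2$ are introduced here only for illustration:

```latex
\begin{aligned}
h &= x W_1^T + b_1 \\
y &= h W_2^T + b_2 \\
  &= (x W_1^T + b_1) W_2^T + b_2 \\
  &= x (W_2 W_1)^T + (b_1 W_2^T + b_2)
\end{aligned}
```

The result is again of the form $y = xW^T + bias$ with $W = W_2 W_1$ and $bias = b_1 W_2^T + b_2$, so the two layers collapse into one.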

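3.1 nn.Sigmoid

Formula: $y = \frac{1}{1+e^{-x}}$
Gradient: $y' = y(1-y)$

Properties:

The output lies in (0, 1), so it can be read as a probability;
The derivative lies in (0, 0.25], which easily leads to vanishing gradients;
The output is not zero-mean, which shifts the data distribution.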
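The gradient formula and its 0.25 ceiling can be checked with a few lines of plain Python. This is a minimal sketch; the helper names are hypothetical, and the numerical derivative is a standard central difference.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Analytic gradient y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)


def numeric_grad(f, x, eps=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x), numeric_grad(sigmoid, x))
```

The gradient peaks at 0.25 for x = 0 and is already below 1e-4 at x = 10. Multiplying several such factors during backpropagation shrinks the gradient quickly.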

3.2 nn.Tanh

Formula: $y = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{2}{1+e^{-2x}} - 1$
Gradient: $y' = 1 - y^2$

Properties:

The output lies in (-1, 1), so the data is zero-mean;
The derivative lies in (0, 1], which still tends to cause vanishing gradients.
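The equivalent forms of tanh and its gradient can be compared against Python's built-in `math.tanh`. This is a minimal sketch with hypothetical helper names.

```python
import math


def tanh_from_exp(x):
    """(e^x - e^-x) / (e^x + e^-x)"""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def tanh_from_sigmoid_form(x):
    """2 / (1 + e^(-2x)) - 1, a rescaled and shifted sigmoid."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def tanh_grad(x):
    """Analytic gradient 1 - y^2."""
    y = math.tanh(x)
    return 1.0 - y * y


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, math.tanh(x), tanh_from_exp(x), tanh_from_sigmoid_form(x), tanh_grad(x))
```

The gradient reaches its maximum of 1 at x = 0 and decays toward 0 as |x| grows.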

3.3 nn.ReLU

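Formula: $y = \max(0, x)$
Gradient: $y' = \begin{cases} 1, & x > 0 \\ \text{undefined}, & x = 0 \\ 0, & x < 0 \end{cases}$

Properties:

The output is never negative; the zero gradient on the negative half-axis can produce dead neurons
The derivative is 1 on the positive half-axis, which alleviates vanishing gradients but can make exploding gradients more likely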
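The dead-neuron behavior comes from the zero gradient on the negative half-axis, which autograd makes visible. This is a minimal sketch with made-up input values.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 1.0, 3.0], requires_grad=True)
y = nn.ReLU()(x)

# Summing gives a scalar, so backward() yields dy_i/dx_i for each element.
y.sum().backward()

print(y)       # negative inputs are clamped to 0
print(x.grad)  # 0 for negative inputs, 1 for positive inputs
```

An input that stays negative receives no gradient at all, so the weights that feed it stop updating.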

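Several improved variants have been proposed to alleviate the problems of ReLU:

nn.LeakyReLU

negative_slope: slope on the negative half-axis

nn.PReLU

init: initial value of the learnable slope

nn.RReLU

lower: lower bound of the uniform distribution
upper: upper bound of the uniform distribution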
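The three variants (LeakyReLU, PReLU, and the randomized RReLU that takes lower and upper) differ only in how the negative-side slope is chosen. The sketch below uses arbitrary slope values for illustration. RReLU is put in eval mode, where it uses the fixed slope (lower + upper) / 2 instead of sampling from the uniform distribution.

```python
import torch
import torch.nn as nn

x = torch.tensor([[-2.0, 0.0, 3.0]])  # a batch with one 3-element vector

leaky = nn.LeakyReLU(negative_slope=0.1)  # fixed slope chosen by the user
prelu = nn.PReLU(init=0.25)               # slope is a learnable parameter, initialized to 0.25
rrelu = nn.RReLU(lower=0.1, upper=0.3)    # slope sampled from U(lower, upper) during training
rrelu.eval()                              # eval mode: slope fixed to (0.1 + 0.3) / 2 = 0.2

with torch.no_grad():
    leaky_out = leaky(x)
    prelu_out = prelu(x)
    rrelu_out = rrelu(x)

print(leaky_out)  # the negative input is scaled by 0.1
print(prelu_out)  # the negative input is scaled by 0.25
print(rrelu_out)  # the negative input is scaled by 0.2
```

In all three cases the negative half-axis keeps a non-zero gradient, which is what lets a neuron recover instead of staying dead.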
