SoftPool Algorithm Explained

Refining activation downsampling with SoftPool (paper link, code link)

1. Motivation

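Pooling layers appear in virtually every kind of computer vision task. Ever since deep learning took off in 2012, pooling layers have been used alongside convolutional layers. Yet while convolutional layers are the subject of extensive discussion and many improved variants, few researchers and engineers pay attention to pooling layers.
Today the two most commonly used pooling layers are max pooling and average pooling: the former outputs the maximum value within each region, and the latter outputs the mean. Pooling layers serve two main purposes: (1) reducing computation and feature redundancy while preserving the dominant features, which helps prevent overfitting; (2) providing a degree of invariance to transformations such as translation, scale, and rotation.
Extensive experimental results show that both pooling operations discard a substantial part of the information in the feature map, which degrades the performance of the network as a whole. SoftPool was proposed to minimize this information loss during pooling.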
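To make the two baseline operations concrete, here is a minimal pure-Python sketch of max pooling and average pooling with a 2×2 window and stride 2 on a small, made-up 4×4 feature map. The helper name `pool2d` and the input values are hypothetical and not part of any library.

```python
def pool2d(x, k=2, s=2, mode="max"):
    """Naive 2D pooling (no padding) over a list-of-lists feature map.

    mode="max" keeps the largest value in each k x k window,
    mode="avg" keeps the mean of each window.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - k + 1, s):
        row = []
        for j in range(0, w - k + 1, s):
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 9],
]
print(pool2d(fmap, mode="max"))  # [[6, 2], [2, 9]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 1.0], [1.0, 6.0]]
```

Max pooling keeps a single value per window (6 out of 1, 3, 4, 6) and ignores the rest, while average pooling dilutes strong activations (the 9 in the bottom-right window ends up in a mean of 6.0).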

2. Introduction to SoftPool

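SoftPool is a pooling variant that preserves the function of a pooling layer while minimizing the information lost during pooling. As shown in the figure below, P1, P2, P3, and P4 denote a 2×2 region of the original image. First, the following formula converts P1, P2, P3, and P4 into the weights shown in the blue region; then the 2×2 matrix in the green region is multiplied element-wise with the 2×2 weights in the blue region and the products are summed to obtain the final output.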
$$w_i = \frac{e^{P_i}}{\sum_{j=1}^{4} e^{P_j}}, \qquad i = 1, 2, 3, 4$$
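As a minimal sketch (plain Python, no framework assumed), the two steps for a single 2×2 region look like this: compute the weights w_i from the formula, then take the weighted sum of P1 to P4. The input values 1, 2, 3, 4 are made up for illustration. Every value contributes to the output, but larger activations receive exponentially larger weights, so the result lands between the region's mean (2.5) and its maximum (4).

```python
import math


def softpool_region(p):
    """SoftPool over one pooling region: softmax weights, then a weighted sum."""
    exps = [math.exp(v) for v in p]
    total = sum(exps)
    weights = [e / total for e in exps]  # w_i = e^{P_i} / sum_j e^{P_j}
    output = sum(w * v for w, v in zip(weights, p))
    return weights, output


region = [1.0, 2.0, 3.0, 4.0]  # hypothetical values for P1, P2, P3, P4
weights, output = softpool_region(region)
print([round(w, 4) for w in weights])  # [0.0321, 0.0871, 0.2369, 0.6439]
print(round(output, 4))  # 3.4927
```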

3. SoftPool in Detail

3.1 Pooling Variants

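The figure above shows several pooling variants: Average Pooling, Max Pooling, Power Average Pooling, Stochastic Pooling, S3 Pooling, Local Importance Pooling, and SoftPool. From it we can observe that: (1) most of the other pooling operations are variants of max pooling or average pooling; (2) S3 pooling follows an idea similar to max pooling; (3) the remaining operations are mostly variants of average pooling; (4) Local Importance Pooling and SoftPool follow similar ideas: both compute a weight for each position in the input region and then accumulate the weighted values.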

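3.2 SoftPool Computation

The figure above shows the forward and backward passes of SoftPool, where the 6×6 region represents the activation map $a$.
Forward pass: (1) compute the weights $w$ for each candidate 3×3 region; (2) multiply the weights $w$ element-wise with the activations $a$ and sum the products to obtain $\tilde{a}$.
Backward pass: (1) compute the gradient $\nabla\tilde{a}$; (2) multiply $\nabla\tilde{a}$ by the weights $w$ to obtain $\nabla a$.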
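The forward and backward steps above can be sketched in plain Python for a 6×6 activation map with a 3×3 window. The stride of 3 (non-overlapping windows), the sample values of `a`, and the function names are assumptions for illustration. The backward function implements the rule stated above, ∇a = w · ∇ã, and accumulates contributions when windows overlap.

```python
import math


def softpool2d_forward(a, k=3, s=3):
    """Forward pass: for each k x k window, w = softmax(window), a_tilde = sum(w * a).

    Returns the pooled map and a cache of (window coordinates, weights) per output cell.
    """
    height, width = len(a), len(a[0])
    out, cache = [], {}
    for oi, i in enumerate(range(0, height - k + 1, s)):
        row = []
        for oj, j in enumerate(range(0, width - k + 1, s)):
            coords = [(i + di, j + dj) for di in range(k) for dj in range(k)]
            exps = [math.exp(a[r][c]) for r, c in coords]
            total = sum(exps)
            w = [e / total for e in exps]
            cache[(oi, oj)] = (coords, w)
            row.append(sum(wi * a[r][c] for wi, (r, c) in zip(w, coords)))
        out.append(row)
    return out, cache


def softpool2d_backward(grad_out, cache, height, width):
    """Backward pass as described: grad_a = w * grad_a_tilde, summed over windows."""
    grad_a = [[0.0] * width for _ in range(height)]
    for (oi, oj), (coords, w) in cache.items():
        for wi, (r, c) in zip(w, coords):
            grad_a[r][c] += wi * grad_out[oi][oj]
    return grad_a


# Hypothetical 6x6 activation map a
a = [[((r * 6 + c) % 5) * 0.5 for c in range(6)] for r in range(6)]
a_tilde, cache = softpool2d_forward(a)  # 2x2 output
grad_a = softpool2d_backward([[1.0, 1.0], [1.0, 1.0]], cache, 6, 6)
print(a_tilde)
print(sum(map(sum, grad_a)))  # each window's weights sum to 1, so this is about 4.0
```

The backward rule here follows the description in the text. For reference, differentiating ã = Σ w_i a_i exactly, with w = softmax(a), gives ∂ã/∂a_i = w_i(1 + a_i − ã), so automatic differentiation of the pure-PyTorch path in section 4 also propagates a term through the weights.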

4. SoftPool Code Implementation

The implementation of soft_pool1d is shown below:

```python
import torch
import torch.nn.functional as F
from torch.nn.modules.utils import _single

# CUDA_SOFTPOOL1d is the custom autograd Function built on the SoftPool CUDA extension
# (defined in the SoftPool repository, not shown here); it is only referenced for CUDA tensors.


def soft_pool1d(x, kernel_size=2, stride=None, force_inplace=False):
    """Downsample based on the exponential proportion of activations (soft pooling).

    If the tensor is on CUDA, the custom CUDA operation is used. Otherwise, the function
    uses standard (mostly in-place) PyTorch operations for speed and reduced memory use.
    Non-in-place operations can also be used to improve stability.

    Args:
        x: Tensor of shape [b x c x d], on CPU or CUDA.
        kernel_size: int or tuple, window size used for downsampling. Defaults to 2.
        stride: int or tuple, step between windows. If None, equals kernel_size.
        force_inplace: bool, use the PyTorch operations even for CUDA tensors
            (mostly useful for timing). Defaults to False.

    Returns:
        Tensor subsampled according to kernel_size and stride.
    """
    if x.is_cuda and not force_inplace:
        x = CUDA_SOFTPOOL1d.apply(x, kernel_size, stride)
        # Replace NaNs if found
        if torch.isnan(x).any():
            return torch.nan_to_num(x)
        return x
    kernel_size = _single(kernel_size)
    stride = kernel_size if stride is None else _single(stride)
    # Per-element exponentials: [b x c x d]
    e_x = torch.exp(x)
    # Pool x * e^x and e^x; avg_pool divides both by the window size, so their ratio is
    # sum(x * e^x) / sum(e^x) over each window: [b x c x d] -> [b x c x d']
    return F.avg_pool1d(x * e_x, kernel_size, stride=stride).div_(
        F.avg_pool1d(e_x, kernel_size, stride=stride)
    )
```
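The CPU path above never forms the softmax weights explicitly. Because average pooling divides both pooled terms by the same window size, their ratio equals Σ x·e^x / Σ e^x over each window, which is exactly the weighted sum with w_i = e^{x_i} / Σ_j e^{x_j}. The following dependency-free sketch checks that equivalence on a short made-up sequence, including overlapping windows; all helper names are hypothetical, and `avg_pool1d` here is a plain-list stand-in, not the PyTorch function.

```python
import math


def avg_pool1d(seq, k, s):
    """Plain 1D average pooling without padding, on a Python list."""
    return [sum(seq[i:i + k]) / k for i in range(0, len(seq) - k + 1, s)]


def soft_pool1d_ratio(x, k=2, s=None):
    """SoftPool as a ratio of two average pools, as in the CPU path of soft_pool1d."""
    s = k if s is None else s
    e_x = [math.exp(v) for v in x]
    num = avg_pool1d([v * e for v, e in zip(x, e_x)], k, s)
    den = avg_pool1d(e_x, k, s)
    return [n / d for n, d in zip(num, den)]


def soft_pool1d_weights(x, k=2, s=None):
    """SoftPool with explicit softmax weights per window, for comparison."""
    s = k if s is None else s
    out = []
    for i in range(0, len(x) - k + 1, s):
        window = x[i:i + k]
        z = sum(math.exp(v) for v in window)
        out.append(sum(math.exp(v) / z * v for v in window))
    return out


x = [0.5, -1.0, 2.0, 3.0, 1.0, 1.0]
for k, s in [(2, None), (3, 1)]:
    ratio = soft_pool1d_ratio(x, k, s)
    weighted = soft_pool1d_weights(x, k, s)
    print(k, s, all(abs(a - b) < 1e-12 for a, b in zip(ratio, weighted)))
```

This is also why the extra scaling by `sum(kernel_size)` found in some copies of this code is unnecessary: any common factor applied to both pooled terms cancels in the division.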

The implementation of soft_pool2d is shown below:

```python
import torch
import torch.nn.functional as F
from torch.nn.modules.utils import _pair

# CUDA_SOFTPOOL2d is the custom autograd Function built on the SoftPool CUDA extension
# (defined in the SoftPool repository, not shown here); it is only referenced for CUDA tensors.


def soft_pool2d(x, kernel_size=2, stride=None, force_inplace=False):
    """Downsample based on the exponential proportion of activations (soft pooling).

    If the tensor is on CUDA, the custom CUDA operation is used. Otherwise, the function
    uses standard (mostly in-place) PyTorch operations for speed and reduced memory use.
    Non-in-place operations can also be used to improve stability.

    Args:
        x: Tensor of shape [b x c x h x w], on CPU or CUDA.
        kernel_size: int or tuple, window size used for downsampling. If an int is given,
            it is expanded to a tuple for both spatial dimensions. Defaults to 2.
        stride: int or tuple, step between windows. If None, equals kernel_size.
        force_inplace: bool, use the PyTorch operations even for CUDA tensors
            (mostly useful for timing). Defaults to False.

    Returns:
        Tensor subsampled according to kernel_size and stride.
    """
    if x.is_cuda and not force_inplace:
        x = CUDA_SOFTPOOL2d.apply(x, kernel_size, stride)
        # Replace NaNs if found
        if torch.isnan(x).any():
            return torch.nan_to_num(x)
        return x
    kernel_size = _pair(kernel_size)
    stride = kernel_size if stride is None else _pair(stride)
    # Per-element exponentials: [b x c x h x w]
    e_x = torch.exp(x)
    # Pool x * e^x and e^x; avg_pool divides both by the window size, so their ratio is
    # sum(x * e^x) / sum(e^x) over each window: [b x c x h x w] -> [b x c x h' x w']
    return F.avg_pool2d(x * e_x, kernel_size, stride=stride).div_(
        F.avg_pool2d(e_x, kernel_size, stride=stride)
    )
```

The implementation of soft_pool3d is shown below:

```python
import torch
import torch.nn.functional as F
from torch.nn.modules.utils import _triple

# CUDA_SOFTPOOL3d is the custom autograd Function built on the SoftPool CUDA extension
# (defined in the SoftPool repository, not shown here); it is only referenced for CUDA tensors.


def soft_pool3d(x, kernel_size=2, stride=None, force_inplace=False):
    """Downsample based on the exponential proportion of activations (soft pooling).

    If the tensor is on CUDA, the custom CUDA operation is used. Otherwise, the function
    uses standard (mostly in-place) PyTorch operations for speed and reduced memory use.
    Non-in-place operations can also be used to improve stability.

    Args:
        x: Tensor of shape [b x c x d x h x w], on CPU or CUDA.
        kernel_size: int or tuple, window size used for downsampling. If an int is given,
            it is expanded to a tuple for all three dimensions. Defaults to 2.
        stride: int or tuple, step between windows. If None, equals kernel_size.
        force_inplace: bool, use the PyTorch operations even for CUDA tensors
            (mostly useful for timing). Defaults to False.

    Returns:
        Tensor subsampled according to kernel_size and stride.
    """
    if x.is_cuda and not force_inplace:
        x = CUDA_SOFTPOOL3d.apply(x, kernel_size, stride)
        # Replace NaNs if found
        if torch.isnan(x).any():
            return torch.nan_to_num(x)
        return x
    kernel_size = _triple(kernel_size)
    stride = kernel_size if stride is None else _triple(stride)
    # Per-element exponentials: [b x c x d x h x w]
    e_x = torch.exp(x)
    # Pool x * e^x and e^x; avg_pool divides both by the window size, so their ratio is
    # sum(x * e^x) / sum(e^x) over each window: [b x c x d x h x w] -> [b x c x d' x h' x w']
    return F.avg_pool3d(x * e_x, kernel_size, stride=stride).div_(
        F.avg_pool3d(e_x, kernel_size, stride=stride)
    )
```
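One practical caveat of the CPU path in these functions: `torch.exp(x)` is applied to raw activations, so in float32 any value above roughly 88.7 overflows to inf, and the ratio then becomes inf/inf = NaN. A common remedy, which is not part of the code above, is to subtract the maximum of each window before exponentiating; the weights are unchanged because e^{x_i − m} / Σ_j e^{x_j − m} = e^{x_i} / Σ_j e^{x_j}. The sketch below demonstrates the idea in plain Python, where float64 `math.exp` raises `OverflowError` above about 709.8. The helper names and inputs are hypothetical, and porting the trick to the tensor code is not shown.

```python
import math


def soft_pool1d_naive(x, k=2):
    """Non-overlapping 1D SoftPool computed with raw exponentials."""
    out = []
    for i in range(0, len(x) - k + 1, k):
        window = x[i:i + k]
        z = sum(math.exp(v) for v in window)
        out.append(sum(math.exp(v) * v for v in window) / z)
    return out


def soft_pool1d_stable(x, k=2):
    """Same result, but exponentials are taken relative to the window maximum."""
    out = []
    for i in range(0, len(x) - k + 1, k):
        window = x[i:i + k]
        m = max(window)  # the shift by m cancels between numerator and denominator
        e = [math.exp(v - m) for v in window]
        out.append(sum(ei * v for ei, v in zip(e, window)) / sum(e))
    return out


x = [800.0, 801.0, 0.5, -1.0]
try:
    soft_pool1d_naive(x)
except OverflowError:
    print("naive version overflows")  # math.exp(800.0) exceeds the float64 range
print(soft_pool1d_stable(x))  # first value is about 800.7311
```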

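5. SoftPool Results and Analysis

5.1 Qualitative Results and Analysis

The figure above shows SoftPool applied to several test images; for comparison, the authors also show the results of max pooling and average pooling (see the original figure for details). From it we can draw the following preliminary conclusions: (1) compared with the original image, SoftPool preserves the most detail, average pooling comes second, and max pooling loses the most information; (2) in terms of computational complexity, SoftPool is the most expensive, followed by average pooling, with max pooling the cheapest.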

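5.2 Quantitative Results and Analysis

The table above lists the forward and backward running times of average pooling, max pooling, and SoftPool for five different kernel sizes. From it we can draw the following preliminary conclusions: (1) on the CPU, average pooling is the fastest, SoftPool second, and max pooling the slowest; (2) on CUDA, the ordering is the same: average pooling is the fastest, SoftPool second, and max pooling the slowest; (3) in terms of memory usage, average pooling uses the least memory, SoftPool comes second, and max pooling uses the most.

The table above shows the classification accuracy of several models after their original pooling layers are replaced with SoftPool. From it we can draw the following preliminary conclusions: (1) across the different classification models, SoftPool improves both top-1 and top-5 accuracy over the original pooling operations; (2) for the ResNet family, GFLOPs increase correspondingly as the number of parameters grows.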

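6. Summary and Analysis

SoftPool is a pooling variant that preserves the function of a pooling layer while minimizing the information lost during pooling. Extensive experimental results show that it outperforms the original average and max pooling.
As designing neural networks becomes increasingly difficult and methods such as NAS can hardly deliver large performance gains any more, optimizing basic network layers is a reliable and effective way to break this bottleneck and improve accuracy; once such a layer is proposed, it can be extended to many different computer vision tasks.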


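Notes

[1] This is an original blog post. If you would like to repost it, please contact me first (QQ mail: 1575262785@qq.com); I will reply as soon as possible. Thank you for your interest.
[2] Given my limited expertise, this post may contain mistakes; suggestions for improvement are welcome.
[3] If anything in this post is unclear, please contact me; I will reply promptly and am happy to exchange ideas.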