A Detailed Look at tf.nn.conv2d() (How strides and padding Interact)

tf.nn.conv2d() is the TensorFlow function that computes a 2-D convolution, the core operation of a convolutional layer. Its signature is:

def conv2d(input: Any,filter: Any,strides: Any,padding: Any,use_cudnn_on_gpu: bool = True,data_format: str = "NHWC",dilations: List[int] = [1, 1, 1, 1],name: Any = None) -> Any

Of these, the most important parameters are input, filter, strides, and padding.

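input is the input data, in TensorFlow's standard layout: a 4-D tensor whose dimensions are batch_size (the number of images processed at once), height, width, and depth (also called channels; for example, 3 for an RGB image).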

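filter is TensorFlow's name for the convolution kernel, i.e. the weight tensor. Note its layout, which is also a 4-D tensor:
height (the height of the kernel),
width (the width of the kernel),
depth, or channels (this must match the depth of input),
the number of feature maps, i.e. the number of kernels, which is also the number of feature maps in the output.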

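strides is the step size. In keeping with the pattern above, it is also given as four values:
the stride along batch_size,
the stride along height,
the stride along width, and
the stride along depth,
one for each of the four dimensions of input. For image input you normally change only the two middle values and leave the first and last at 1. The stride partly determines the size of the output feature map.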
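The shape conventions for input, filter, and strides described above can be written down as a small check. This is only a sketch in plain Python: check_conv2d_shapes is a hypothetical helper, not part of TensorFlow, and the rule that the batch and depth strides must be 1 is treated here as an assumption reflecting the usual convention.

```python
def check_conv2d_shapes(input_shape, filter_shape, strides):
    """Check the 4-D conventions and return the number of output channels.

    Hypothetical helper for illustration; it is not a TensorFlow function.
    """
    if len(input_shape) != 4 or len(filter_shape) != 4 or len(strides) != 4:
        raise ValueError("input, filter and strides must each have four entries")

    # input:  [batch_size, height, width, depth]
    _, _, _, in_depth = input_shape
    # filter: [height, width, depth, number of kernels]
    _, _, filter_depth, num_kernels = filter_shape

    if filter_depth != in_depth:
        raise ValueError("filter depth must match input depth")
    # Assumption: only the height and width strides are changed.
    if strides[0] != 1 or strides[3] != 1:
        raise ValueError("batch and depth strides are expected to be 1")

    # The output has one feature map per kernel.
    return num_kernels


print(check_conv2d_shapes([64, 43, 43, 3], [3, 3, 3, 64], [1, 2, 2, 1]))
```

With the shapes used later in this article (a 43x43 RGB batch and 64 kernels of size 3x3), the check passes and reports 64 output channels.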

padding is the padding mode. The two modes discussed here are SAME and VALID.

The remaining parameters are options for things like GPU acceleration and are less important here.

strides and padding are best explained together.

1. padding set to 'SAME'

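The image size is not necessarily a multiple of the stride, so in SAME mode the input is zero-padded around its border to guarantee that every input value gets scanned. The size of the output feature map is determined by the stride alone; how many zeros are added then follows from the stride together with the kernel size. Let's look at some code first: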

import tensorflow as tf  # TensorFlow 1.x API

data = tf.Variable(tf.random_normal([64, 43, 43, 3]), dtype=tf.float32)  # 64 images, 43x43, 3 channels
weight = tf.Variable(tf.random_normal([3, 3, 3, 64]), dtype=tf.float32)  # 64 kernels of size 3x3x3
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Same input and kernel; only the height/width strides change
conv1 = tf.nn.conv2d(data, weight, strides=[1, 1, 1, 1], padding='SAME')
conv2 = tf.nn.conv2d(data, weight, strides=[1, 2, 2, 1], padding='SAME')
conv3 = tf.nn.conv2d(data, weight, strides=[1, 3, 3, 1], padding='SAME')
conv4 = tf.nn.conv2d(data, weight, strides=[1, 4, 4, 1], padding='SAME')
print(conv1)
print(conv2)
print(conv3)
print(conv4)

Output:
Tensor("Conv2D_8:0", shape=(64, 43, 43, 64), dtype=float32)
Tensor("Conv2D_9:0", shape=(64, 22, 22, 64), dtype=float32)
Tensor("Conv2D_10:0", shape=(64, 15, 15, 64), dtype=float32)
Tensor("Conv2D_11:0", shape=(64, 11, 11, 64), dtype=float32)

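The output size is the input size divided by the second and third values of strides, rounded up:
With strides=[1,1,1,1], the output size equals the input size.
With strides=[1,2,2,1], 43 is not a multiple of 2, so 43 is first rounded up to 44 and then divided by 2, giving 22.
With strides=[1,3,3,1], 43 is not a multiple of 3, so 43 is first rounded up to 45 and then divided by 3, giving 15.
With strides=[1,4,4,1], 43 is not a multiple of 4, so 43 is first rounded up to 44 and then divided by 4, giving 11.
And so on.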
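The "round up to the next multiple, then divide" rule is a ceiling division. A minimal sketch in plain Python follows; same_output_size is a hypothetical name used only for illustration, and the input size 43 is taken from the experiment above.

```python
def same_output_size(in_size, stride):
    """Output size along one dimension in SAME mode: ceil(in_size / stride)."""
    # Integer ceiling division, which avoids floating-point rounding.
    return (in_size + stride - 1) // stride


for stride in (1, 2, 3, 4):
    print(stride, same_output_size(43, stride))
# prints:
# 1 43
# 2 22
# 3 15
# 4 11
```

These values match the height and width in the printed tensor shapes.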

Next, let's check whether the output feature map size depends on the kernel size:

import tensorflow as tf  # TensorFlow 1.x API

data = tf.Variable(tf.random_normal([64, 43, 43, 3]), dtype=tf.float32)  # 64 images, 43x43, 3 channels
weight = tf.Variable(tf.random_normal([5, 5, 3, 64]), dtype=tf.float32)  # 64 kernels, now 5x5x3
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Same strides as before, with the larger kernel
conv1 = tf.nn.conv2d(data, weight, strides=[1, 1, 1, 1], padding='SAME')
conv2 = tf.nn.conv2d(data, weight, strides=[1, 2, 2, 1], padding='SAME')
conv3 = tf.nn.conv2d(data, weight, strides=[1, 3, 3, 1], padding='SAME')
conv4 = tf.nn.conv2d(data, weight, strides=[1, 4, 4, 1], padding='SAME')
print(conv1)
print(conv2)
print(conv3)
print(conv4)

Output:
Tensor("Conv2D_16:0", shape=(64, 43, 43, 64), dtype=float32)
Tensor("Conv2D_17:0", shape=(64, 22, 22, 64), dtype=float32)
Tensor("Conv2D_18:0", shape=(64, 15, 15, 64), dtype=float32)
Tensor("Conv2D_19:0", shape=(64, 11, 11, 64), dtype=float32)

This shows that in SAME mode the output size does not depend on the kernel size; it depends only on strides.
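The kernel size still matters in SAME mode, but only for how many zeros are added, not for the output size. The sketch below assumes the padding rule given in TensorFlow's notes on padding (total padding is whatever is needed for the last window to fit, with the extra zero going on the bottom/right when the total is odd). same_padding is a hypothetical helper, not a TensorFlow function.

```python
def same_padding(in_size, kernel_size, stride):
    """Return (out_size, pad_before, pad_after) along one dimension in SAME mode.

    Assumption: follows the rule from TensorFlow's padding notes.
    """
    out_size = (in_size + stride - 1) // stride  # ceil(in_size / stride)
    # Zeros needed so that the last window fits inside the padded input.
    total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    pad_before = total // 2
    pad_after = total - pad_before  # gets the extra zero when total is odd
    return out_size, pad_before, pad_after


for kernel_size in (3, 5):
    for stride in (1, 2, 3, 4):
        print(kernel_size, stride, same_padding(43, kernel_size, stride))
```

For a 43-wide input with stride 1, for example, a 3x3 kernel needs one zero on each side and a 5x5 kernel needs two, while the output size stays at 43 in both cases.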

2. padding set to 'VALID'

In VALID mode no zeros are padded; any data the kernel cannot reach is simply discarded.
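To make "discarded" concrete, the sketch below walks a window along one dimension and counts how many trailing positions no window ever covers. It is plain Python with hypothetical helper names, and the sizes (a 43-wide input, a 5-wide kernel) are chosen to match the experiments in this article.

```python
def valid_window_starts(in_size, kernel_size, stride):
    """Start indices of every window that fits entirely inside the input."""
    return list(range(0, in_size - kernel_size + 1, stride))


def dropped_trailing(in_size, kernel_size, stride):
    """Number of trailing input positions that no window covers."""
    starts = valid_window_starts(in_size, kernel_size, stride)
    if not starts:
        raise ValueError("kernel is larger than the input")
    last_covered = starts[-1] + kernel_size - 1
    return in_size - 1 - last_covered


for stride in (1, 2, 3, 4):
    n_windows = len(valid_window_starts(43, 5, stride))
    print(stride, n_windows, dropped_trailing(43, 5, stride))
# prints:
# 1 39 0
# 2 20 0
# 3 13 2
# 4 10 2
```

The number of windows is the output size along that dimension, so with stride 4 the last two rows and columns of the input never contribute to the result.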

import tensorflow as tf  # TensorFlow 1.x API

data = tf.Variable(tf.random_normal([64, 43, 43, 3]), dtype=tf.float32)  # 64 images, 43x43, 3 channels
weight = tf.Variable(tf.random_normal([5, 5, 3, 64]), dtype=tf.float32)  # 64 kernels of size 5x5x3
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# VALID mode with a 5x5 kernel, strides 1 through 4
conv1 = tf.nn.conv2d(data, weight, strides=[1, 1, 1, 1], padding='VALID')
conv2 = tf.nn.conv2d(data, weight, strides=[1, 2, 2, 1], padding='VALID')
conv3 = tf.nn.conv2d(data, weight, strides=[1, 3, 3, 1], padding='VALID')
conv4 = tf.nn.conv2d(data, weight, strides=[1, 4, 4, 1], padding='VALID')
print(conv1)
print(conv2)
print(conv3)
print(conv4)

Output:
Tensor("Conv2D_20:0", shape=(64, 39, 39, 64), dtype=float32)
Tensor("Conv2D_21:0", shape=(64, 20, 20, 64), dtype=float32)
Tensor("Conv2D_22:0", shape=(64, 13, 13, 64), dtype=float32)
Tensor("Conv2D_23:0", shape=(64, 10, 10, 64), dtype=float32)

import tensorflow as tf  # TensorFlow 1.x API

data = tf.Variable(tf.random_normal([64, 43, 43, 3]), dtype=tf.float32)  # 64 images, 43x43, 3 channels
weight = tf.Variable(tf.random_normal([3, 3, 3, 64]), dtype=tf.float32)  # 64 kernels of size 3x3x3
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# VALID mode again, this time with a 3x3 kernel
conv1 = tf.nn.conv2d(data, weight, strides=[1, 1, 1, 1], padding='VALID')
conv2 = tf.nn.conv2d(data, weight, strides=[1, 2, 2, 1], padding='VALID')
conv3 = tf.nn.conv2d(data, weight, strides=[1, 3, 3, 1], padding='VALID')
conv4 = tf.nn.conv2d(data, weight, strides=[1, 4, 4, 1], padding='VALID')
print(conv1)
print(conv2)
print(conv3)
print(conv4)

Output:
Tensor("Conv2D_28:0", shape=(64, 41, 41, 64), dtype=float32)
Tensor("Conv2D_29:0", shape=(64, 21, 21, 64), dtype=float32)
Tensor("Conv2D_30:0", shape=(64, 14, 14, 64), dtype=float32)
Tensor("Conv2D_31:0", shape=(64, 11, 11, 64), dtype=float32)

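Comparing the two sets of results shows that in VALID mode the output size depends on both the kernel size and the stride. The formula, with the division rounded down, is:
$height_{out} = \lfloor (height_{in} - height_{kernel} + 2 \cdot padding) / stride \rfloor + 1$
$width_{out} = \lfloor (width_{in} - width_{kernel} + 2 \cdot padding) / stride \rfloor + 1$
Because this is VALID mode, padding = 0.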
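The formula can be checked against the shapes printed in the two VALID experiments. In this plain-Python sketch, valid_output_size is a hypothetical helper, and integer (floor) division is assumed because the printed sizes are whole numbers even when the stride does not divide evenly.

```python
def valid_output_size(in_size, kernel_size, stride, padding=0):
    """Output size along one dimension: floor((in - kernel + 2*padding) / stride) + 1."""
    return (in_size - kernel_size + 2 * padding) // stride + 1


# Heights/widths taken from the printed shapes for the 5x5 and 3x3 kernels.
observed = {5: [39, 20, 13, 10], 3: [41, 21, 14, 11]}

for kernel_size, sizes in observed.items():
    computed = [valid_output_size(43, kernel_size, s) for s in (1, 2, 3, 4)]
    print(kernel_size, computed, computed == sizes)
# prints:
# 5 [39, 20, 13, 10] True
# 3 [41, 21, 14, 11] True
```

For example, with a 5x5 kernel and stride 3 the quotient (43 - 5) / 3 is about 12.67; it is rounded down to 12 before adding 1, which gives the observed 13.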