Calling caffe ssd from C++_[caffe tutorial 5] Convolution in caffe

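This article was first published on the WeChat public account 《有三AI》.

[caffe解读] caffe from math formulas to code implementation 5: convolution in caffe (mp.weixin.qq.com)

Today's topic is the layers related to convolution: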

im2col_layer.cpp

base_conv_layer.cpp

conv_layer.cpp

deconv_layer.cpp

inner_product_layer.cpp

1 im2col_layer.cpp

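This is one of the key operations in caffe, and it has a lot to do with why caffe uses so much GPU memory. The purpose of im2col is to store, up front and in one go, all the image patches that the sliding convolution will visit, and then perform a single matrix multiplication. Put simply, its input is a C*H*W blob, and im2col turns it into a K' x (H x W) matrix, where K' = C*kernel_r*kernel_r and kernel_r is the kernel size; only square kernels are considered here. Strictly speaking, the H x W of that matrix is the output height times the output width, which equals the input size only when the stride is 1 and the padding preserves it.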

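Without this trick, as Jia Yangqing once complained, for an input blob of size W*H with D channels and a filter bank of size M*K*K, a for-loop implementation looks like the following: six nested for loops, which is extremely inefficient.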

for w in 1..W
  for h in 1..H
    for x in 1..K
      for y in 1..K
        for m in 1..M
          for d in 1..D
            output(w, h, m) += input(w+x, h+y, d) * filter(m, x, y, d)
          end
        end
      end
    end
  end
end
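
The six-loop pseudocode can be turned into a small runnable sketch. The function name naive_conv, the row-major layouts (input D x H x W, filter M x D x K x K, output M x Ho x Wo) and the restriction to stride 1 without padding are assumptions made for this illustration; none of it is caffe code.

```cpp
#include <cassert>
#include <vector>

// Direct convolution with six nested loops (stride 1, no padding).
// Assumed layouts, all row-major:
//   input  : D x H x W
//   filter : M x D x K x K
//   output : M x Ho x Wo, with Ho = H - K + 1 and Wo = W - K + 1
std::vector<float> naive_conv(const std::vector<float>& input,
                              const std::vector<float>& filter,
                              int D, int H, int W, int M, int K) {
  assert(static_cast<int>(input.size()) == D * H * W);
  assert(static_cast<int>(filter.size()) == M * D * K * K);
  const int Ho = H - K + 1;
  const int Wo = W - K + 1;
  std::vector<float> output(M * Ho * Wo, 0.0f);
  for (int m = 0; m < M; ++m)                // output channel
    for (int h = 0; h < Ho; ++h)             // output row
      for (int w = 0; w < Wo; ++w)           // output column
        for (int d = 0; d < D; ++d)          // input channel
          for (int y = 0; y < K; ++y)        // kernel row
            for (int x = 0; x < K; ++x)      // kernel column
              output[(m * Ho + h) * Wo + w] +=
                  input[(d * H + h + y) * W + (w + x)] *
                  filter[((m * D + d) * K + y) * K + x];
  return output;
}
```

For a 3*3 single-channel input holding 1..9 and a 2*2 kernel of ones, this returns the four window sums 12, 16, 24 and 28. Every multiply-add goes through scalar index arithmetic, which is exactly the work im2col plus one matrix multiplication is meant to replace.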

So how exactly does im2col work? Here is Jia Yangqing's answer first.

https://www.zhihu.com/question/28385679

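As stated above, the C*H*W blob is turned into a K' x (H x W) or (H x W) x K' matrix, and the filters are likewise laid out as one big matrix, so that multiplying the two gives the result directly. Here is a small example.

The figure is borrowed from another blogger; it differs from caffe in the details but still helps with understanding. http://blog.csdn.net/mrhiuser/article/details/52672824

For 4*4 raw data and a 3*3 operation with stride=1, the im2col operation is the one shown in the figure of the post linked above.

In other words, after im2col the 4*4 matrix becomes a 9*4 matrix: 9 kernel positions by 4 output positions. The kernel can be flattened the same way, and the convolution becomes a product of two matrices.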
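
The 4*4 example can be checked with a short sketch. It is a simplified single-channel im2col (stride 1, no padding, no dilation); the names im2col_simple and kernel_times_col are made up for this illustration, and the row order (kernel offset) and column order (output position) are chosen to follow the loop order of caffe's im2col_cpu.

```cpp
#include <cassert>
#include <vector>

// Simplified im2col: one channel, stride 1, no padding, no dilation.
// Returns a (k*k) x (out_h*out_w) row-major matrix:
// one row per kernel offset, one column per output position.
std::vector<int> im2col_simple(const std::vector<int>& im,
                               int height, int width, int k) {
  assert(static_cast<int>(im.size()) == height * width);
  const int out_h = height - k + 1;
  const int out_w = width - k + 1;
  std::vector<int> col;
  col.reserve(k * k * out_h * out_w);
  for (int kr = 0; kr < k; ++kr)
    for (int kc = 0; kc < k; ++kc)
      for (int r = 0; r < out_h; ++r)
        for (int c = 0; c < out_w; ++c)
          col.push_back(im[(r + kr) * width + (c + kc)]);
  return col;
}

// Multiplies a flattened kernel (1 x rows) by the column matrix (rows x cols).
std::vector<int> kernel_times_col(const std::vector<int>& kernel,
                                  const std::vector<int>& col, int cols) {
  const int rows = static_cast<int>(kernel.size());
  assert(static_cast<int>(col.size()) == rows * cols);
  std::vector<int> out(cols, 0);
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
      out[c] += kernel[r] * col[r * cols + c];
  return out;
}
```

With a 4*4 image holding 0..15 and k = 3, im2col_simple returns 36 values, that is a 9*4 matrix whose first row is 0, 1, 4, 5. Multiplying a kernel of nine ones by it gives 45, 54, 81 and 90, the sums of the four 3*3 windows.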

Now the im2col code:

template <typename Dtype>
void im2col_cpu(const Dtype* data_im, const int channels,
    const int height, const int width, const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    const int dilation_h, const int dilation_w,
    Dtype* data_col) {
  // Inputs are data_im, kernel_h, kernel_w and the convolution parameters;
  // the output is data_col.
  // output_h and output_w are the size of the output image.
  const int output_h = (height + 2 * pad_h -
      (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1;
  const int output_w = (width + 2 * pad_w -
      (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1;
  const int channel_size = height * width;
  // The outer channel loop can be ignored for now.
  for (int channel = channels; channel--; data_im += channel_size) {
    // Two loops over kernel_row and kernel_col.
    for (int kernel_row = 0; kernel_row < kernel_h; kernel_row++) {
      for (int kernel_col = 0; kernel_col < kernel_w; kernel_col++) {
        int input_row = -pad_h + kernel_row * dilation_h;
        // Loops over output_h and output_w; together they produce
        // one row of the matrix in the example above.
        for (int output_rows = output_h; output_rows; output_rows--) {
          // Out-of-bounds rows (padding) are a special case worth
          // working through by hand.
          if (!is_a_ge_zero_and_a_lt_b(input_row, height)) {
            for (int output_cols = output_w; output_cols; output_cols--) {
              *(data_col++) = 0;
            }
          } else {
            int input_col = -pad_w + kernel_col * dilation_w;
            for (int output_col = output_w; output_col; output_col--) {
              if (is_a_ge_zero_and_a_lt_b(input_col, width)) {
                // The core assignment. From the loop order, segments of
                // output_h * output_w values are appended one after
                // another to build data_col.
                *(data_col++) = data_im[input_row * width + input_col];
              } else {
                *(data_col++) = 0;
              }
              input_col += stride_w;
            }
          }
          input_row += stride_h;
        }
      }
    }
  }
}

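The comments are inline above. col2im is very similar and can be read in the source; writing this code from scratch would probably take some debugging time.

With this core code in place, Forward only needs to call im2col, with bottom_data as input and top_data as output, and Backward only needs to call col2im, with top_diff as input and bottom_diff as output, so that code is not reproduced here.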
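
Since col2im is left to the source, here is a minimal sketch of the idea, under the same simplifications as before (one channel, stride 1, no padding, no dilation). The name col2im_simple is hypothetical and this is not caffe's implementation; the point it illustrates is that col2im adds every column entry back into the pixel it was copied from, so pixels covered by several windows accumulate several contributions.

```cpp
#include <cassert>
#include <vector>

// Simplified col2im: the reverse walk of a single-channel im2col
// (stride 1, no padding, no dilation). Each entry of the
// (k*k) x (out_h*out_w) column matrix is added back into its source pixel.
std::vector<int> col2im_simple(const std::vector<int>& col,
                               int height, int width, int k) {
  const int out_h = height - k + 1;
  const int out_w = width - k + 1;
  assert(static_cast<int>(col.size()) == k * k * out_h * out_w);
  std::vector<int> im(height * width, 0);
  int idx = 0;
  for (int kr = 0; kr < k; ++kr)
    for (int kc = 0; kc < k; ++kc)
      for (int r = 0; r < out_h; ++r)
        for (int c = 0; c < out_w; ++c)
          im[(r + kr) * width + (c + kc)] += col[idx++];  // accumulate
  return im;
}
```

Feeding it a 9*4 column matrix of ones for a 4*4 image gives, at each pixel, the number of 3*3 windows covering it: 1 at the corners, 2 on the edges and 4 in the centre. That accumulation is what makes col2im suitable for turning column gradients into bottom_diff.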

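2 conv_layer.cpp, base_conv_layer.cpp

The mathematical definition needs no introduction, so we go straight to the code, reading the two files together. Since conv_layer.cpp depends on base_conv_layer.cpp, let us first see what base_conv_layer.hpp contains, which is a lot.

Member variables in base_conv_layer.hpp: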

/// @brief The spatial dimensions of a filter kernel.
Blob<int> kernel_shape_;
/// @brief The spatial dimensions of the stride.
Blob<int> stride_;
/// @brief The spatial dimensions of the padding.
Blob<int> pad_;
/// @brief The spatial dimensions of the dilation.
Blob<int> dilation_;
/// @brief The spatial dimensions of the convolution input.
Blob<int> conv_input_shape_;
/// @brief The spatial dimensions of the col_buffer.
vector<int> col_buffer_shape_;
/// @brief The spatial dimensions of the output.
vector<int> output_shape_;
const vector<int>* bottom_shape_;

int num_spatial_axes_;
int bottom_dim_;
int top_dim_;

int channel_axis_;
int num_;
int channels_;
int group_;
int out_spatial_dim_;
int weight_offset_;
int num_output_;
bool bias_term_;
bool is_1x1_;
bool force_nd_im2col_;

int num_kernels_im2col_;
int num_kernels_col2im_;
int conv_out_channels_;
int conv_in_channels_;
int conv_out_spatial_dim_;
int kernel_dim_;
int col_offset_;
int output_offset_;

Blob<Dtype> col_buffer_;
Blob<Dtype> bias_multiplier_;

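That is a lot, because convolution has by now accumulated many parameters to control. They cannot all be explained here: stride_, pad_ and dilation_ are the stride, padding and dilation parameters, kernel_shape_ is the kernel size, conv_input_shape_ is the input size and output_shape_ is the output size. The rest we skip for now and explain when we meet them. For a more detailed explanation, see this blog post: http://blog.csdn.net/lanxuecc/article/details/53188738

Now let us look at conv_layer.cpp.

Since this is a convolution, the output size depends on many parameters, so the output size has to be computed first.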

template <typename Dtype>
void ConvolutionLayer<Dtype>::compute_output_shape() {
  const int* kernel_shape_data = this->kernel_shape_.cpu_data();
  const int* stride_data = this->stride_.cpu_data();
  const int* pad_data = this->pad_.cpu_data();
  const int* dilation_data = this->dilation_.cpu_data();
  this->output_shape_.clear();
  for (int i = 0; i < this->num_spatial_axes_; ++i) {
    // i + 1 to skip channel axis
    const int input_dim = this->input_shape(i + 1);
    const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
    const int output_dim = (input_dim + 2 * pad_data[i] - kernel_extent)
        / stride_data[i] + 1;
    this->output_shape_.push_back(output_dim);
  }
}
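
The per-axis formula in compute_output_shape can be tried out in isolation. The helper below is a sketch for one spatial axis; the name conv_output_dim and its parameter order are assumptions for this illustration, while the arithmetic is copied from the loop body above.

```cpp
#include <cassert>

// Output length along one spatial axis, following the formula in
// compute_output_shape: the dilated kernel covers kernel_extent input
// positions, and integer division by the stride rounds down.
int conv_output_dim(int input_dim, int kernel, int pad, int stride,
                    int dilation) {
  assert(kernel > 0 && stride > 0 && dilation > 0 && pad >= 0);
  const int kernel_extent = dilation * (kernel - 1) + 1;
  return (input_dim + 2 * pad - kernel_extent) / stride + 1;
}
```

For example, conv_output_dim(4, 3, 0, 1, 1) is 2, matching the 4*4 input and 3*3 kernel example earlier; conv_output_dim(224, 3, 1, 1, 1) is 224, so a 3*3 kernel with pad 1 keeps the size; and conv_output_dim(224, 7, 3, 2, 1) is 112.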

Then, in the forward function:

template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = top[i]->mutable_cpu_data();
    for (int n = 0; n < this->num_; ++n) {
      this->forward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
          top_data + n * this->top_dim_);
      if (this->bias_term_) {
        const Dtype* bias = this->blobs_[1]->cpu_data();
        this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
      }
    }
  }
}

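We know that the input of a convolution layer is a blob and the output is a blob. From the code above, the kernel weights are stored in this->blobs_[0]->cpu_data(), and this->blobs_[1]->cpu_data() is the bias, which may be absent: it is only read when bias_term_ is set. The outer loop runs over bottom.size(), so the layer can in fact take several inputs.

Let us look at the core function inside, this->forward_cpu_gemm.

This is where it is called; it takes input and works through the intermediate buffer col_buff. The post at https://tangxman.github.io/2015/12/07/caffe-conv/ explains this function in detail, so I will only summarize.

First, in call order: for ordinary convolutions such as 3*3, forward_cpu_gemm calls conv_im2col_cpu (in base_conv_layer.hpp). As the name suggests, it converts the image into one big matrix; the kernels are likewise laid out as one big matrix.

Then caffe_cpu_gemm multiplies the two matrices to produce the convolution result.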

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
    const Dtype* weights, Dtype* output, bool skip_im2col) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    if (!skip_im2col) {
      // If the kernel is not 1x1 and skip_im2col is not set,
      // conv_im2col_cpu turns every kernel-sized image patch visited by
      // the sliding kernel into a column, giving a matrix with
      // height = kernel_dim_ and
      // width = output image height * output image width.
      conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    }
    col_buff = col_buffer_.cpu_data();
  }
  // Compute with caffe's cpu_gemm.
  // Suppose the input has 20 feature maps, the output has 10 and
  // group_ = 2: the layer is then split into two 10->5 convolutions.
  // Because the two have identical structure, they can be trained on
  // multiple GPUs to speed up training. In this CPU path the groups are
  // simply processed one after another.
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
        group_, conv_out_spatial_dim_, kernel_dim_,
        (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)0., output + output_offset_ * g);
    // weights <--- blobs_[0]->cpu_data(). By analogy with a fully
    // connected layer, weights are the weights and col_buff plays the
    // role of the data; the product is weights x col_buff.
    // weights is (conv_out_channels_ / group_) x kernel_dim_,
    // col_buff is kernel_dim_ x conv_out_spatial_dim_,
    // output is (conv_out_channels_ / group_) x conv_out_spatial_dim_.
  }
}
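
The per-group matrix product can be sketched without caffe. In the code below, gemm_naive is a plain triple loop standing in for caffe_cpu_gemm with no transposes, alpha = 1 and beta = 0. The three offsets are computed so that each group gets its own contiguous slice of weights, col_buff and output; they mirror the roles of weight_offset_, col_offset_ and output_offset_, but how caffe actually sets those members is not shown in this article, so treat these formulas as an assumption of the sketch.

```cpp
#include <cassert>
#include <vector>

// C (M x N) = A (M x K) * B (K x N), all row-major.
void gemm_naive(int M, int N, int K,
                const float* A, const float* B, float* C) {
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) acc += A[m * K + k] * B[k * N + n];
      C[m * N + n] = acc;
    }
  }
}

// One GEMM per group. kernel_dim is the column-matrix height for a single
// group; spatial is output height * output width.
std::vector<float> grouped_conv_gemm(const std::vector<float>& weights,
                                     const std::vector<float>& col_buff,
                                     int out_channels, int kernel_dim,
                                     int spatial, int groups) {
  assert(groups > 0 && out_channels % groups == 0);
  const int m = out_channels / groups;         // output channels per group
  const int weight_offset = m * kernel_dim;    // assumed slice sizes
  const int col_offset = kernel_dim * spatial;
  const int output_offset = m * spatial;
  assert(static_cast<int>(weights.size()) == groups * weight_offset);
  assert(static_cast<int>(col_buff.size()) == groups * col_offset);
  std::vector<float> output(out_channels * spatial, 0.0f);
  for (int g = 0; g < groups; ++g) {
    gemm_naive(m, spatial, kernel_dim,
               weights.data() + weight_offset * g,
               col_buff.data() + col_offset * g,
               output.data() + output_offset * g);
  }
  return output;
}
```

With two groups, one output channel per group, kernel_dim = 2 and spatial = 2, weights {1, 2, 3, 4} and col_buff {1, 0, 0, 1, 1, 1, 1, 1} give the output {1, 2, 7, 7}: each group only ever sees its own slice of the weights and of the column buffer.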

Backward pass:

template <typename Dtype>
void ConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
    // Bias gradient, if necessary.
    if (this->bias_term_ && this->param_propagate_down_[1]) {
      Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
      for (int n = 0; n < this->num_; ++n) {
        this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
      }
    }
    if (this->param_propagate_down_[0] || propagate_down[i]) {
      for (int n = 0; n < this->num_; ++n) {
        // gradient w.r.t. weight. Note that we will accumulate diffs.
        if (this->param_propagate_down_[0]) {
          this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
              top_diff + n * this->top_dim_, weight_diff);
        }
        // gradient w.r.t. bottom data, if necessary.
        if (propagate_down[i]) {
          this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
              bottom_diff + n * this->bottom_dim_);
        }
      }
    }
  }
}

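Leaving the bias aside, the source above has two terms: this->weight_cpu_gemm and this->backward_cpu_gemm.

this->backward_cpu_gemm computes the gradient with respect to bottom_data, that is, backpropagation through the feature map.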

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::backward_cpu_gemm(const Dtype* output,
    const Dtype* weights, Dtype* input) {
  Dtype* col_buff = col_buffer_.mutable_cpu_data();
  if (is_1x1_) {
    col_buff = input;
  }
  for (int g = 0; g < group_; ++g) {
    // Note: the row count passed here is kernel_dim_ / group_, while
    // forward_cpu_gemm above uses kernel_dim_. The two agree when
    // group_ == 1; check this line against your own caffe source.
    caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, kernel_dim_ / group_,
        conv_out_spatial_dim_, conv_out_channels_ / group_,
        (Dtype)1., weights + weight_offset_ * g, output + output_offset_ * g,
        (Dtype)0., col_buff + col_offset_ * g);
  }
  if (!is_1x1_) {
    conv_col2im_cpu(col_buff, input);
  }
}

weight_cpu_gemm computes the gradient with respect to the weights:

template <typename Dtype>
void BaseConvolutionLayer<Dtype>::weight_cpu_gemm(const Dtype* input,
    const Dtype* output, Dtype* weights) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    col_buff = col_buffer_.cpu_data();
  }
  for (int g = 0; g < group_; ++g) {
    // beta is 1 here, so the result is accumulated into weights.
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, conv_out_channels_ / group_,
        kernel_dim_, conv_out_spatial_dim_,
        (Dtype)1., output + output_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)1., weights + weight_offset_ * g);
  }
}

There are many details here. If something is unclear, go back to the source, and read it as many times as it takes.

3 deconv_layer.cpp

https://www.zhihu.com/question/63890195

https://buptldy.github.io/2016/10/29/2016-10-29-deconv/

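Convolution maps the lower figure to the upper one (see the figures in the posts linked above): each output pixel depends on 9 input pixels. Deconvolution goes the other way. Each input pixel of the upper figure is multiplied by the kernel and the products are placed at the corresponding output positions in the lower figure; the input pixel is then moved, and finally all contributions that land on the same output position are summed.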
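
The multiply, place and sum description can be written as a direct scatter loop. This is a sketch for one channel without padding; the name deconv_scatter and the output size (in - 1) * stride + k are assumptions for this illustration. caffe itself reaches the same result through a matrix product followed by col2im rather than through a loop like this.

```cpp
#include <cassert>
#include <vector>

// Transposed convolution ("deconvolution") for one channel, no padding.
// Each input pixel is multiplied by the whole k x k kernel and added into
// the output window that starts at (r * stride, c * stride).
std::vector<float> deconv_scatter(const std::vector<float>& in,
                                  int in_h, int in_w,
                                  const std::vector<float>& kernel,
                                  int k, int stride) {
  assert(static_cast<int>(in.size()) == in_h * in_w);
  assert(static_cast<int>(kernel.size()) == k * k);
  assert(stride > 0);
  const int out_h = (in_h - 1) * stride + k;
  const int out_w = (in_w - 1) * stride + k;
  std::vector<float> out(out_h * out_w, 0.0f);
  for (int r = 0; r < in_h; ++r)
    for (int c = 0; c < in_w; ++c)
      for (int kr = 0; kr < k; ++kr)
        for (int kc = 0; kc < k; ++kc)
          out[(r * stride + kr) * out_w + (c * stride + kc)] +=
              in[r * in_w + c] * kernel[kr * k + kc];  // overlaps are summed
  return out;
}
```

A 2*2 input {1, 2, 3, 4} with a 3*3 kernel of ones and stride 1 produces a 4*4 output whose corner values are 1, 2, 3, 4 and whose four centre values are all 10, because every input pixel reaches the centre. With a 2*2 kernel and stride 2 the windows do not overlap and each input value is simply copied into its own 2*2 block.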

template <typename Dtype>
void DeconvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = top[i]->mutable_cpu_data();
    for (int n = 0; n < this->num_; ++n) {
      this->backward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
          top_data + n * this->top_dim_);
      if (this->bias_term_) {
        const Dtype* bias = this->blobs_[1]->cpu_data();
        this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
      }
    }
  }
}

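Forward calls backward_cpu_gemm directly, and the backward pass calls forward_cpu_gemm directly. This takes repeated reading to sink in; if it is not clear the first time, read it again.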

template <typename Dtype>
void DeconvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
    // Bias gradient, if necessary.
    if (this->bias_term_ && this->param_propagate_down_[1]) {
      Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
      for (int n = 0; n < this->num_; ++n) {
        this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
      }
    }
    if (this->param_propagate_down_[0] || propagate_down[i]) {
      for (int n = 0; n < this->num_; ++n) {
        // Gradient w.r.t. weight. Note that we will accumulate diffs.
        if (this->param_propagate_down_[0]) {
          this->weight_cpu_gemm(top_diff + n * this->top_dim_,
              bottom_data + n * this->bottom_dim_, weight_diff);
        }
        // Gradient w.r.t. bottom data, if necessary, reusing the column buffer
        // we might have just computed above.
        if (propagate_down[i]) {
          this->forward_cpu_gemm(top_diff + n * this->top_dim_, weight,
              bottom_diff + n * this->bottom_dim_,
              this->param_propagate_down_[0]);
        }
      }
    }
  }
}

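4 inner_product_layer.cpp

Now that the convolution layer has been read, it is time for the fully connected layer.

What distinguishes a fully connected layer from a convolution layer? There is no local connectivity: every output depends on all inputs. If the input feature map is H*W, the kernel convolving it has the same size, and the output is a single 1*1 value.

Its setup function has some work to do, the most important being to set the size of the weights; the key code is below. num_output is the number of output scalars. For the 1000 ImageNet classes, for example, the final output is a 1000-dimensional vector.

K_ is the size of one sample. With axis=1 it is C*H*W, and each output unit in effect compresses one input sample into a single number: C*H*W values become 1 number through the full connection.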

const int num_output = this->layer_param_.inner_product_param().num_output();
// ... (N_ is set to num_output, and axis is looked up, in lines not shown here)
K_ = bottom[0]->count(axis);
// Initialize the weights
vector<int> weight_shape(2);
if (transpose_) {
  weight_shape[0] = K_;
  weight_shape[1] = N_;
} else {
  weight_shape[0] = N_;
  weight_shape[1] = K_;
}

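So the weight has size N_*K_ (or K_*N_ when transpose_ is set).

With this in place, forward works the same way as in conv_layer: a matrix multiplication of the weights with the data.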
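
For a single sample, the fully connected forward pass reduces to a matrix-vector product plus bias. The sketch below assumes the non-transposed N x K row-major weight layout from the code above; the function name inner_product_forward and the use of std::vector are choices made for this illustration, whereas caffe performs the same computation for a whole batch with one caffe_cpu_gemm call.

```cpp
#include <cassert>
#include <vector>

// y = W x + b for one sample.
// W is N x K row-major (the transpose_ == false layout), x has K entries
// (K = C*H*W when axis = 1), b and y have N entries.
std::vector<float> inner_product_forward(const std::vector<float>& weight,
                                         const std::vector<float>& bias,
                                         const std::vector<float>& x,
                                         int N, int K) {
  assert(static_cast<int>(weight.size()) == N * K);
  assert(static_cast<int>(bias.size()) == N);
  assert(static_cast<int>(x.size()) == K);
  std::vector<float> y(N, 0.0f);
  for (int n = 0; n < N; ++n) {
    float acc = bias[n];
    // Each output unit reads all K inputs: no local connectivity.
    for (int k = 0; k < K; ++k) acc += weight[n * K + k] * x[k];
    y[n] = acc;
  }
  return y;
}
```

With N = 2, K = 3, weights {1, 0, 0, 1, 1, 1}, bias {0.5, -1} and input {2, 3, 4}, the result is {2.5, 8}: the first unit picks out the first input, the second sums all three.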

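That is it. This section has no complicated formulas, but there is plenty here to chew on, and it takes careful study to understand well. caffe_cpu_gemm is the computational core of the whole section; have a look if you are interested!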

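You are welcome to follow our tutorials on more than 12 open-source deep learning frameworks and the corresponding open-source projects:

[Complete] Quick-start projects for 12 major open-source deep learning frameworks, for beginners (mp.weixin.qq.com)

https://github.com/longpeng2008/yousan.ai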
