1. Basic layer definitions and parameters

To define a network with Caffe, you first need to understand its basic layer interfaces. The five categories of layers are introduced below.

Vision Layers

Vision layers are declared in the header ./include/caffe/vision_layers.hpp. They usually take images as input and produce images as output. These layers work on the 2D spatial structure of the image and process the input according to that structure; in particular, most vision layers apply an operation to local regions of the input and produce corresponding regions in the output. Other layers, by contrast, ignore the spatial structure and simply treat the input as one large one-dimensional vector.
Convolution:

Layer type: Convolution
CPU implementation: ./src/caffe/layers/convolution_layer.cpp

CUDA GPU implementation: ./src/caffe/layers/convolution_layer.cu
Parameters (ConvolutionParameter convolution_param)
Required:

num_output (c_o): the number of filters
// the number of filters, i.e. the number of output channels
kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
// the height and width of each filter
Strongly Recommended:
weight_filler [default type: 'constant' value: 0]
Optional:
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
// the additive bias term
pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
// padding added to each side of the input image
stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
// the step between successive applications of the filter
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.
// restricts connectivity: the input channels are split into g groups, and the ith group of output channels is connected only to the ith group of input channels

Each filter produces one feature map.
Input size: $n \times c_i \times h_i \times w_i$ (batch size, channels, height, width)
Output size: $n \times c_o \times h_o \times w_o$, where $h_o = (h_i + 2 \cdot pad_h - kernel_h) / stride_h + 1$ and $w_o$ likewise.
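
A minimal prototxt sketch of a convolution layer, modeled on CaffeNet's conv1; the blob names (data, conv1) and the filler settings are only illustrative:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate multipliers for the weights and biases
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 96     # 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # apply the filters every 4 pixels
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}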

Pooling:
The pooling layer reduces the dimensionality of the features by collapsing each neighbouring region into a single value. The currently available types are max, average, and stochastic pooling.
Parameters:
kernel_size: the height and width of each pooling region
pool: the pooling type (MAX, AVE, or STOCHASTIC)
pad: the padding added to each side of the input
stride: the interval at which to apply the pooling regions to the input
Input size:
$n \times c \times h_i \times w_i$
Output size:
$n \times c \times h_o \times w_o$, where $h_o$ and $w_o$ are computed in the same way as for convolution.
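
For example, a max-pooling layer over 3x3 regions with stride 2 could be written as follows (blob names are illustrative):

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX        # MAX, AVE, or STOCHASTIC
    kernel_size: 3   # pool over 3x3 regions
    stride: 2        # step two pixels between pooling regions
  }
}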

Local Response Normalization (LRN):
Layer type: LRN
CPU Implementation: ./src/caffe/layers/lrn_layer.cpp
CUDA GPU Implementation: ./src/caffe/layers/lrn_layer.cu
Parameters (LRNParameter lrn_param)
Optional
local_size [default 5]: the number of channels to sum over (for cross channel LRN) or the side length of the square region to sum over (for within channel LRN)
alpha [default 1]: the scaling parameter (see below)
beta [default 0.75]: the exponent (see below)
norm_region [default ACROSS_CHANNELS]: whether to sum over adjacent channels (ACROSS_CHANNELS) or nearby spatial locations (WITHIN_CHANNEL)
The local response normalization layer performs a kind of "lateral inhibition" by normalizing over local input regions. In ACROSS_CHANNELS mode, the local regions extend across nearby channels, but have no spatial extent (i.e., they have shape local_size x 1 x 1). In WITHIN_CHANNEL mode, the local regions extend spatially, but are in separate channels (i.e., they have shape 1 x local_size x local_size). Each input value is divided by $(1+(\alpha/n)\sum_i x_i^2)^{\beta}$, where $n$ is the size of each local region, and the sum is taken over the region centered at that value (zero padding is added where necessary).
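
A sketch of an LRN layer in prototxt; the alpha and beta values shown here are the ones commonly used in the CaffeNet/AlexNet reference model, and the blob names are illustrative:

layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}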

im2col
Converts an image into column vectors by laying out local patches as columns; Caffe's convolution implementation uses it to turn the convolution into a matrix multiplication.

Loss Layers

Loss layers drive learning: the network is trained by minimizing a loss function, which is computed in the forward pass and whose gradients are propagated in the backward pass.
Softmax:
This layer computes the multinomial logistic loss of its input, $l(\theta) = -\log(o_y)$, where $o_y$ is the predicted probability of the true class $y$.
Note the difference from softmax-loss (SoftmaxWithLoss), which expands $o_y$:

$\tilde{l}(y,z) = -\log\left(\frac{e^{z_y}}{\sum_{j=1}^{m} e^{z_j}}\right) = \log\left(\sum_{j=1}^{m} e^{z_j}\right) - z_y$

where $z_i = \omega_i^T x + b_i$ is the linear prediction for class $i$, and $z_y$ is the one for the true class $y$.
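
In a prototxt, the softmax loss layer takes the class scores and the labels as its two bottoms; a minimal sketch, assuming a score blob named fc8 and a label blob named label:

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"    # raw class scores z
  bottom: "label"  # ground-truth labels y
  top: "loss"
}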

Sum-of-Squares / Euclidean:
Layer type: EuclideanLoss
The Euclidean loss layer computes the loss between its two input vectors:

$\frac{1}{2N}\sum_{i=1}^{N} \lVert x_i^1 - x_i^2 \rVert_2^2$
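
A minimal sketch of its use in a prototxt; the blob names pred and target are illustrative:

layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"    # predictions x^1
  bottom: "target"  # targets x^2
  top: "loss"
}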
Hinge / Margin:
Layer type: HingeLoss
Options: L1 or L2 norm
Input: n*c*h*w predictions, n*1*1*1 labels
Output: 1*1*1*1 computed loss
Example:

# L1 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
}

# L2 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  hinge_loss_param {
    norm: L2
  }
}

The hinge loss layer computes a one-vs-all hinge loss (L1) or squared hinge loss (L2).
Sigmoid Cross-Entropy:
Layer type: SigmoidCrossEntropyLoss

template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the sigmoid outputs.
  sigmoid_bottom_vec_[0] = bottom[0];
  sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
  // Compute the loss (negative log likelihood)
  const int count = bottom[0]->count();
  const int num = bottom[0]->num();
  // Stable version of loss computation from input data
  const Dtype* input_data = bottom[0]->cpu_data();
  const Dtype* target = bottom[1]->cpu_data();
  Dtype loss = 0;
  for (int i = 0; i < count; ++i) {
    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}
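
For reference, a minimal sketch of how this loss is declared in a prototxt; the blob names pred and label are illustrative:

layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "pred"    # raw scores; the sigmoid is applied inside the layer
  bottom: "label"   # targets in [0, 1]
  top: "loss"
}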

Infogain:

template <typename Dtype>
void InfogainLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  const Dtype* infogain_mat = NULL;
  if (bottom.size() < 3) {
    infogain_mat = infogain_.cpu_data();
  } else {
    infogain_mat = bottom[2]->cpu_data();
  }
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  Dtype loss = 0;
  for (int i = 0; i < num; ++i) {
    int label = static_cast<int>(bottom_label[i]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
      loss -= infogain_mat[label * dim + j] * log(prob);
    }
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}

template <typename Dtype>
void InfogainLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down.size() > 2 && propagate_down[2]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to infogain inputs.";
  }
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* bottom_label = bottom[1]->cpu_data();
    const Dtype* infogain_mat = NULL;
    if (bottom.size() < 3) {
      infogain_mat = infogain_.cpu_data();
    } else {
      infogain_mat = bottom[2]->cpu_data();
    }
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();
    const Dtype scale = - top[0]->cpu_diff()[0] / num;
    for (int i = 0; i < num; ++i) {
      const int label = static_cast<int>(bottom_label[i]);
      for (int j = 0; j < dim; ++j) {
        Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
        bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
      }
    }
  }
}

INSTANTIATE_CLASS(InfogainLossLayer);
REGISTER_LAYER_CLASS(InfogainLoss);
}  // namespace caffe

Accuracy and Top-k:

This layer scores the accuracy of the output with respect to the target. It is only a measurement, not a loss to backpropagate: no backward pass is performed.
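
A sketch of a top-5 accuracy layer, typically restricted to the TEST phase; the blob names are illustrative:

layer {
  name: "accuracy-top5"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy-top5"
  accuracy_param {
    top_k: 5   # count a hit if the true label is among the 5 highest scores
  }
  include { phase: TEST }
}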

Activation / Neuron Layers

In general, activation / neuron layers are element-wise operators: they take one bottom blob and produce one top blob of the same size. In the layers below, we omit the input and output sizes, since they are identical:
Input: $n \times c \times h \times w$
Output: $n \times c \times h \times w$

ReLU / Rectified-Linear and Leaky-ReLU:
Parameters (ReLUParameter relu_param)
Optional
negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.

layer {name: "relu1"type: "ReLU"bottom: "conv1"top: "conv1"
}

The ReLU function is defined as follows, for an input value $x$:

$f(x) = \begin{cases} x & \text{if } x > 0, \\ \text{negative\_slope} \cdot x & \text{otherwise.} \end{cases}$

When negative_slope is left at its default of 0, this is equivalent to the standard $\max(0, x)$; for details see my other short post:
http://blog.csdn.net/swfa1/article/details/45601789

Sigmoid:
Layer type: Sigmoid
Example:

layer {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: "Sigmoid"
}

Formula:

$f(x) = \mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$
TanH / Hyperbolic Tangent:
Layer type: TanH
Example:

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "TanH"
}

Formula:

$f(x) = \tanh(x)$
Absolute Value:
Layer type: AbsVal

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "AbsVal"
}

Formula:

$f(x) = \mathrm{abs}(x) = |x|$
Power:
Layer type: Power
Parameters:
power [default 1]
scale [default 1]
shift [default 0]
Example:

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "Power"
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}

Formula:

$f(x) = (\text{shift} + \text{scale} \cdot x)^{\text{power}}$

BNLL:
Layer type: BNLL

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "BNLL"
}

Formula:
The BNLL (binomial normal log likelihood) layer computes the output as

$f(x) = \log(1 + e^{x})$

Data Layers

Data layers bring input data into the network; they read from databases (LevelDB or LMDB), from HDF5 files, or directly from image files on disk.

Common Layers

InnerProduct:
Layer type: InnerProduct
Parameters:
Required:
num_output (c_o): the number of filters
Strongly recommended:
weight_filler [default type: 'constant' value: 0]
Optional:
bias_filler [default type: 'constant' value: 0]
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
Example:

layer {
  name: "fc8"
  type: "InnerProduct"
  # learning rate and decay multipliers for the weights
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}

Purpose:
The inner product layer (also known as the fully connected layer) treats the input as a single vector and produces its output in the form of a single vector, i.e. the height and width of the output blob are both 1.

After studying for a while, I found that some of the layers above are not described in much detail, so the Slice, ArgMax, and Eltwise layers are explained in more detail below.
Slice layer
Splits an input blob into several output blobs along a given dimension, so that each piece can be processed separately in the rest of the network.
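
A sketch of a Slice layer that splits a 10-channel blob into two blobs of 3 and 7 channels along the channel axis; the blob names and sizes are illustrative:

layer {
  name: "slice"
  type: "Slice"
  bottom: "input"
  top: "part1"   # channels 0-2
  top: "part2"   # channels 3-9
  slice_param {
    axis: 1          # slice along the channel dimension
    slice_point: 3   # the first output gets the channels before index 3
  }
}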
ArgMaxLayer
Compute the index of the $K$ max values for each datum across all dimensions ($C \times H \times W$).

Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to true, output is a vector of pairs (max_ind, max_val) for each image. The axis parameter specifies an axis along which to maximise.
NOTE: does not implement Backwards operation.
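A sketch of an ArgMax layer that turns softmax probabilities into a predicted class index; the blob names are illustrative:

layer {
  name: "argmax"
  type: "ArgMax"
  bottom: "prob"
  top: "pred_label"
  argmax_param {
    top_k: 1             # keep only the single best class
    out_max_val: false   # output indices only, not (index, value) pairs
  }
}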
Eltwise layer
Compute element-wise operations, such as product and sum, across multiple input blobs.
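
For example, an element-wise sum of two equally-sized blobs (blob names are illustrative; the operation can also be PROD or MAX):

layer {
  name: "fuse"
  type: "Eltwise"
  bottom: "branch1"
  bottom: "branch2"
  top: "fuse"
  eltwise_param {
    operation: SUM
  }
}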

2. AlexNet network definition

3. How to add a new layer

Add a class declaration for your layer to the appropriate one of common_layers.hpp, data_layers.hpp, loss_layers.hpp, neuron_layers.hpp, or vision_layers.hpp. Include an inline implementation of type and the *Blobs() methods to specify blob number requirements. Omit the *_gpu declarations if you'll only be implementing CPU code.

Implement your layer in layers/your_layer.cpp.

SetUp for initialization: reading parameters, allocating buffers, etc.

Forward_cpu for the function your layer computes

Backward_cpu for its gradient

(Optional) Implement the GPU versions Forward_gpu and Backward_gpu in layers/your_layer.cu.

Add your layer to proto/caffe.proto, updating the next available ID. Also declare parameters, if needed, in this file.

Make your layer createable by adding it to layer_factory.cpp.

Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.

The above is the answer from someone on GitHub, and the steps are quite clear. To be concrete, suppose we want to add a new vision layer called Aaa_Layer:

1. Open the hpp file for the category the layer belongs to; here that is vision_layers.hpp. Add the declaration of the layer yourself, or copy the ConvolutionLayer code and change the class name and constructor name to Aaa_Layer. If you are not using the GPU, remove all the *_gpu declarations.

2. Implement your layer: write Aaa_Layer.cpp and add it to src/caffe/layers, implementing mainly SetUp, Forward_cpu, and Backward_cpu.

3. If a GPU implementation is needed, implement Forward_gpu and Backward_gpu in Aaa_Layer.cu.

4. Modify src/caffe/proto/caffe.proto: find LayerType, add Aaa, and update the next available ID. If the layer has parameters, add an AaaParameter message.

5. Add the corresponding code in src/caffe/layer_factory.cpp.

6. Write test_Aaa_layer.cpp in src/caffe/test, and use include/caffe/test/test_gradient_check_util.hpp to check that the forward and backward passes agree numerically. A usage sketch for the new layer follows below.
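
Once the hypothetical Aaa layer is registered, it could be used in a net prototxt like any other layer. Everything in this sketch (the layer name, the aaa_param message and its contents) is made up for illustration:

layer {
  name: "aaa1"
  type: "Aaa"          # the type string registered in layer_factory.cpp
  bottom: "data"
  top: "aaa1"
  aaa_param {          # only if an AaaParameter message was added to caffe.proto
    # hypothetical fields defined in AaaParameter
  }
}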
