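Introduction

inner_product_layer is Caffe's fully connected layer. As the diagram shows, every output is connected to every input.

Main text

Member variables

First, a look at the member variables.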

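protected:
  int M_;  // number of samples
  int K_;  // length of a single input feature vector
  int N_;  // number of output neurons
  bool bias_term_;  // whether to add a bias, the (+1) in the diagram above
  Blob<Dtype> bias_multiplier_;  // multiplier for the bias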

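Member functions

The constructor and similar member functions are mostly inherited from the parent class and are not covered in detail. The focus here is on four member functions: LayerSetUp, Forward_cpu, Backward_cpu, and Reshape.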

Reshape

template <typename Dtype>
void InnerProductLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // Figure out the dimensions
  const int axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.inner_product_param().axis());
  const int new_K = bottom[0]->count(axis);
  /*
   * Blob::CanonicalAxisIndex normalizes the axis index: a negative index
   * (counted from the end) is converted to its non-negative equivalent.
   * Blob::count(int) returns the total number of elements from the given
   * axis through the last one. The leading dimension is the number of
   * samples, i.e. M_, which is independent of the fully connected layer;
   * the dimensions after it give the number of input features.
   */
  CHECK_EQ(K_, new_K)
      << "Input size incompatible with inner product parameters.";
  // The first "axis" dimensions are independent inner products; the total
  // number of these is M_, the product over these dimensions.
  M_ = bottom[0]->count(0, axis);
  // The top shape will be the bottom shape with the flattened axes dropped,
  // and replaced by a single axis with dimension num_output (N_).
  vector<int> top_shape = bottom[0]->shape();
  top_shape.resize(axis + 1);
  top_shape[axis] = N_;
  top[0]->Reshape(top_shape);
  /*
   * Reshape the output according to the input. The output shape is
   * determined by the number of samples and the number of output neurons.
   */
  // Set up the bias multiplier
  if (bias_term_) {
    vector<int> bias_shape(1, M_);
    bias_multiplier_.Reshape(bias_shape);
    caffe_set(M_, Dtype(1), bias_multiplier_.mutable_cpu_data());
  }
  /*
   * caffe_set(const int N, const Dtype alpha, Dtype* Y) fills the N
   * elements starting at Y with the value alpha.
   */
}
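
To make the dimension bookkeeping concrete, here is a minimal standalone sketch of how M_ and K_ fall out of the bottom shape. It works on a plain std::vector<int> rather than a real Blob, and the names canonical_axis, count_range, InnerProductDims, and inner_product_dims are hypothetical stand-ins chosen for illustration, not Caffe functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Blob::CanonicalAxisIndex: maps a negative axis
// (counted from the end) to its non-negative equivalent.
int canonical_axis(const std::vector<int>& shape, int axis) {
  const int num_axes = static_cast<int>(shape.size());
  assert(axis >= -num_axes && axis < num_axes);
  return axis < 0 ? axis + num_axes : axis;
}

// Hypothetical stand-in for Blob::count(start, end): product of the
// dimensions in the half-open range [start, end).
int count_range(const std::vector<int>& shape, int start, int end) {
  int total = 1;
  for (int i = start; i < end; ++i) total *= shape[i];
  return total;
}

// Illustrative only: the (M, K) pair derived from a bottom shape.
struct InnerProductDims {
  int M;  // number of independent inner products (samples)
  int K;  // flattened input feature length
};

InnerProductDims inner_product_dims(const std::vector<int>& shape, int axis) {
  const int a = canonical_axis(shape, axis);
  const int num_axes = static_cast<int>(shape.size());
  return {count_range(shape, 0, a), count_range(shape, a, num_axes)};
}
```

For a bottom shape of 64 x 3 x 28 x 28 and axis 1, this gives M = 64 and K = 3 * 28 * 28 = 2352, so the top shape becomes 64 x N_.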

LayerSetUp

template <typename Dtype>
void InnerProductLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const int num_output = this->layer_param_.inner_product_param().num_output();
  bias_term_ = this->layer_param_.inner_product_param().bias_term();
  N_ = num_output;
  const int axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.inner_product_param().axis());
  K_ = bottom[0]->count(axis);
  /* Initialize N_ and K_; see the explanation in Reshape(). */
  if (this->blobs_.size() > 0) {
    LOG(INFO) << "Skipping parameter initialization";
  } else {
    if (bias_term_) {
      this->blobs_.resize(2);
    } else {
      this->blobs_.resize(1);
    }
    /*
     * Initialize the weights W and the bias b. Caffe offers several
     * initialization schemes, implemented in Filler, which will be covered
     * in a separate post. Within blobs_, blobs_[0] holds W and blobs_[1]
     * holds the bias b.
     */
    vector<int> weight_shape(2);
    weight_shape[0] = N_;
    weight_shape[1] = K_;
    this->blobs_[0].reset(new Blob<Dtype>(weight_shape));
    // fill the weights
    shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(
        this->layer_param_.inner_product_param().weight_filler()));
    weight_filler->Fill(this->blobs_[0].get());
    // If necessary, initialize and fill the bias term
    if (bias_term_) {
      vector<int> bias_shape(1, N_);
      this->blobs_[1].reset(new Blob<Dtype>(bias_shape));
      shared_ptr<Filler<Dtype> > bias_filler(GetFiller<Dtype>(
          this->layer_param_.inner_product_param().bias_filler()));
      bias_filler->Fill(this->blobs_[1].get());
    }
  }  // parameter initialization
  this->param_propagate_down_.resize(this->blobs_.size(), true);
}

Forward_cpu

template <typename Dtype>
void InnerProductLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, M_, N_, K_, (Dtype)1.,
      bottom_data, weight, (Dtype)0., top_data);
  if (bias_term_) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, 1, (Dtype)1.,
        bias_multiplier_.cpu_data(),
        this->blobs_[1]->cpu_data(), (Dtype)1., top_data);
  }
}

The main function used here is caffe_cpu_gemm from math_functions.hpp, declared as

template <typename Dtype>
void caffe_cpu_gemm(const CBLAS_TRANSPOSE TransA,
    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
    const Dtype alpha, const Dtype* A, const Dtype* B, const Dtype beta,
    Dtype* C);

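Its behavior is straightforward: C←αA×B+βC, where TransA and TransB indicate whether A or B is transposed before the multiplication. After any transposition, A is an M*K matrix, B is a K*N matrix, and C is an M*N matrix.
Forward_cpu first computes y←Wx, or equivalently y←xW' in the row-per-sample layout the code uses (W' being the transpose of W), and then y←y+b. Taken together, this is y=Wx+b. The bias is added by a second gemm call that multiplies the all-ones bias_multiplier_ (M_*1) by b (1*N_), which copies b into every row of the output.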
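
The two gemm calls in Forward_cpu can be spelled out as plain loops. The following is a minimal reference sketch, not Caffe code: inner_product_forward is a hypothetical name, float stands in for Dtype, and row-major storage is assumed for bottom (M x K), weight (N x K), and top (M x N).

```cpp
#include <cassert>
#include <vector>

// Naive reference for the forward pass: top = bottom * W^T + ones * b.
// bottom is M x K, weight is N x K (both row-major), bias has N entries
// (or is empty when there is no bias term), and the result is M x N.
std::vector<float> inner_product_forward(const std::vector<float>& bottom,
                                         const std::vector<float>& weight,
                                         const std::vector<float>& bias,
                                         int M, int N, int K) {
  assert(static_cast<int>(bottom.size()) == M * K);
  assert(static_cast<int>(weight.size()) == N * K);
  assert(bias.empty() || static_cast<int>(bias.size()) == N);
  std::vector<float> top(M * N, 0.0f);
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      // First gemm: dot product of sample m with row n of W.
      float sum = 0.0f;
      for (int k = 0; k < K; ++k) {
        sum += bottom[m * K + k] * weight[n * K + k];
      }
      // Second gemm: the all-ones multiplier adds b[n] to every sample.
      if (!bias.empty()) sum += bias[n];
      top[m * N + n] = sum;
    }
  }
  return top;
}
```

Because the weight matrix is stored as N x K, the first call passes CblasTrans for B; the loop over weight[n * K + k] above is the same access pattern written out by hand.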

Backward_cpu

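Backpropagation exists mainly to update W and b, and the key step is computing the partial derivatives. This function therefore does three things: it computes the gradient with respect to the weights, the gradient with respect to the bias, and the gradient with respect to the bottom data. Three formulas come first.

The first two use the residual to update W and b, and the last one computes the gradient that is passed down. Caffe's fully connected layer has no activation function, so the derivative term in the last formula is 1.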
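
In standard backpropagation notation, the three formulas presumably take the following form. This is a reconstruction under that assumption: J is the loss, \delta^{(l+1)} the residual arriving from the layer above (top_diff), a^{(l)} the activation of the previous layer (bottom_data), and f the activation function, which is absent here so that f' = 1.

```latex
\begin{align}
\frac{\partial J}{\partial W^{(l)}} &= \delta^{(l+1)} \left(a^{(l)}\right)^{T} \\
\frac{\partial J}{\partial b^{(l)}} &= \delta^{(l+1)} \\
\delta^{(l)} &= \left( \left(W^{(l)}\right)^{T} \delta^{(l+1)} \right) \odot f'\!\left(z^{(l)}\right)
\end{align}
```

For a batch of M_ samples stored one per row, these become top_diff' × bottom_data for W (an N_*K_ matrix), top_diff' × 1 for b (a vector of length N_), and top_diff × W for the bottom diff (an M_*K_ matrix), which matches the three BLAS calls in Backward_cpu.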

template <typename Dtype>
void InnerProductLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (this->param_propagate_down_[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    const Dtype* bottom_data = bottom[0]->cpu_data();
    // Gradient with respect to weight
    /*
     * This is the gradient used to update W. top_diff is the residual
     * \delta, and bottom_data is the activation a of the previous layer.
     * With beta = 1 the result is accumulated into the weight diff.
     */
    caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_, (Dtype)1.,
        top_diff, bottom_data, (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
  }
  if (bias_term_ && this->param_propagate_down_[1]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    // Gradient with respect to bias
    /*
     * This is the gradient used to update b. caffe_cpu_gemv is similar to
     * caffe_cpu_gemm, except that it multiplies a matrix by a vector.
     */
    caffe_cpu_gemv<Dtype>(CblasTrans, M_, N_, (Dtype)1., top_diff,
        bias_multiplier_.cpu_data(), (Dtype)1.,
        this->blobs_[1]->mutable_cpu_diff());
  }
  if (propagate_down[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    // Gradient with respect to bottom data
    /* Compute the gradient; see the formulas for the meaning of each variable. */
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, K_, N_, (Dtype)1.,
        top_diff, this->blobs_[0]->cpu_data(), (Dtype)0.,
        bottom[0]->mutable_cpu_diff());
  }
}
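
The three BLAS calls in Backward_cpu can likewise be written as one set of plain loops. This is a minimal reference sketch rather than Caffe code: InnerProductGrads and inner_product_backward are hypothetical names, float stands in for Dtype, and all matrices are assumed row-major. Unlike Caffe, which accumulates into whatever the weight and bias diffs already hold (beta = 1), the sketch starts every gradient from zero.

```cpp
#include <cassert>
#include <vector>

// Illustrative container for the three gradients of the layer.
struct InnerProductGrads {
  std::vector<float> weight_diff;  // N x K, top_diff^T * bottom
  std::vector<float> bias_diff;    // N,     top_diff^T * ones
  std::vector<float> bottom_diff;  // M x K, top_diff * W
};

// Naive reference for the backward pass. top_diff is M x N, bottom is
// M x K, weight is N x K, all row-major.
InnerProductGrads inner_product_backward(const std::vector<float>& top_diff,
                                         const std::vector<float>& bottom,
                                         const std::vector<float>& weight,
                                         int M, int N, int K) {
  assert(static_cast<int>(top_diff.size()) == M * N);
  assert(static_cast<int>(bottom.size()) == M * K);
  assert(static_cast<int>(weight.size()) == N * K);
  InnerProductGrads g{std::vector<float>(N * K, 0.0f),
                      std::vector<float>(N, 0.0f),
                      std::vector<float>(M * K, 0.0f)};
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      const float d = top_diff[m * N + n];
      // Bias gradient: sum of the residual over all samples.
      g.bias_diff[n] += d;
      for (int k = 0; k < K; ++k) {
        // Weight gradient: residual times the input activation.
        g.weight_diff[n * K + k] += d * bottom[m * K + k];
        // Bottom gradient: residual propagated back through W.
        g.bottom_diff[m * K + k] += d * weight[n * K + k];
      }
    }
  }
  return g;
}
```

No activation derivative appears anywhere in the loops, which is the point made above: the derivative term is 1 because the activation lives in a separate layer.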

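Summary

The part that deserves the most attention is Backward_cpu. Backpropagation is usually taught with the activation function folded into the layer. In Caffe the fully connected layer and the activation function are kept separate so that layers are easier to combine, and that is the only small difference here.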
