c++重写卷积网络的前向计算过程，完美复现theano的测试结果

本人的需求是：

通过theano的cnn训练神经网络，将最终稳定的网络权值保存下来。c++实现cnn的前向计算过程，读取theano的权值，复现theano的测试结果

本人最终的成果是：

1、卷积神经网络的前向计算过程
2、mlp网络的前向与后向计算，也就是可以用来训练样本
需要注意的是：
如果为了复现theano的测试结果，那么隐藏层的激活函数要选用tanh；
否则，为了mlp的训练过程，激活函数要选择sigmoid

成果的展现：

下图是theano的训练以及测试结果，验证样本错误率为9.23%

下面是我的c++程序，验证错误率也是9.23%，完美复现theano的结果

简单讲难点有两个：

1.theano的权值以及测试样本与c++如何互通？

2.theano的卷积的时候，上层输入的featuremap如何组合，映射到本层的每个像素点上？

在解决上述两点的过程中，走了很多的弯路：

为了用c++重现theano的测试结果，必须让c++能够读取theano保存的权值以及测试样本。

思考分析如下：

1.theano的权值是numpy格式，而它直接与c++交互，很困难，numpy的格式不好解析，网上资料很少

2.采用python做中间转换，实现1)的要求。后看theano代码，发现读入python的训练样本，不用转换成numpy数组，用本来python就可以了。但是python经过cPickle的dump文件，加了很多格式，不适合同c++交互。

3. 用json转换，由于python和cpp都有json的接口，都转成json的格式，然后再交互。可是theano训练之后权值是numpy格式的，需要转成python数组，json才可以存到文件中。现在的问题是怎么把numpy转成python的list？

4.为了解决3，找了一天，终于找到了numpy数组的tolist接口，可以将numpy数组转换成python的list。

5.现在python和c++都可以用json了。研究jsoncpp库的使用，将python的json文件读取。通过测试发现，库 jsoncpp不适合读取大文件，很容易造成内存不足，效率极低，故不可取。

6.用c++写函数，自己解析json文件。并且通过pot文件生成训练与测试样本的时候，也直接用c++来生成，不需要转换成numpy数组的格式。

经过上述分析，解决了难点1。通过json格式实现c++与theano权值与测试样本的互通，并且自己写函数解析json文件。

对于难点2，看一个典型的cnn网络图

难点2的详细描述如下：

Theano从S2到C3的时候，如何选择S2的featuremap进行组合？每次固定选取还是根据一定的算法动态组合？
Theano从C3到S4的pooling过程，令poolsize是(2*2),如何将C3的每4个像素变成S4的一个像素？

通过大量的分析，对比验证，发现以下结论：

Theano从S2到C3的时候，选择S2的所有featuremap进行组合
Theano从C3到S4的pooling过程，令poolsize是(2*2),，对于C3的每4个像素，选取最大值作为S4的一个像素

通过以上的分析，理论上初步已经弄清楚了。下面就是要根据理论编写代码，真正耗时的是代码的调试过程，总是复现不了theano的测试结果。

曾经不止一次的认为这是不可能复现的，鬼知道theano怎么搞的。

今天终于将代码调通，很是兴奋，于是有了这篇博客。

阻碍我实现结果的bug主要有两个，一个是理论上的不足，对theano卷积的细节把握不准确；一个是自己写代码时粗心，变量初始化错误。如下：

S2到C3卷积时，theano会对卷积核旋转180度之后，才会像下图这样进行卷积（本人刚接触这块，实在是不知道啊。。。）

C3到S4取像素最大值的时候，想当然认为像素都是正的，变量初始化为0，导致最终找最大值错误（这个bug找的时间最久，血淋淋的教训。。。）

Python CNN代码：Here。

然后加入下面代码。

theano对写权值的函数，注意它保存的是卷积核旋转180度后的权值，如果权值是二维的，那么行列互换（与c++的权值表示法统一）

[python] view plain copy

def getDataJson(layers):
data = []
i = 0
for layer in layers:
w, b = layer.params
# print '..layer is', i
w, b = w.get_value(), b.get_value()
wshape = w.shape
# print '...the shape of w is', wshape
if len(wshape) == 2:
w = w.transpose()
else:
for k in xrange(wshape[0]):
for j in xrange(wshape[1]):
w[k][j] = numpy.rot90(w[k][j], 2)
w = w.reshape((wshape[0], numpy.prod(wshape[1:])))
w = w.tolist()
b = b.tolist()
data.append([w, b])
i += 1
return data
def writefile(data, name = '../../tmp/src/data/theanocnn.json'):
print ('writefile is ' + name)
f = open(name, "wb")
json.dump(data,f)
f.close()

theano读权值

[python] view plain copy

def readfile(layers, nkerns, name = '../../tmp/src/data/theanocnn.json'):
# Load the dataset
print ('readfile is ' + name)
f = open(name, 'rb')
data = json.load(f)
f.close()
readwb(data, layers, nkerns)
def readwb(data, layers, nkerns):
i = 0
kernSize = len(nkerns)
inputnum = 1
for layer in layers:
w, b = data[i]
w = numpy.array(w, dtype='float32')
b = numpy.array(b, dtype='float32')
# print '..layer is', i
# print w.shape
if i >= kernSize:
w = w.transpose()
else:
w = w.reshape((nkerns[i], inputnum, 5, 5))
for k in xrange(nkerns[i]):
for j in xrange(inputnum):
c = w[k][j]
w[k][j] = numpy.rot90(c, 2)
inputnum = nkerns[i]
# print '..readwb ，transpose and rot180'
# print w.shape
layer.W.set_value(w, borrow=True)
layer.b.set_value(b, borrow=True)
i += 1

测试样本由手写数字库mnist生成，核心代码如下：

[python] view plain copy

def mnist2json_small(cnnName = 'mnist_small.json', validNumber = 10):
dataset = '../../data/mnist.pkl'
print '... loading data', dataset
# Load the dataset
f = open(dataset, 'rb')
train_set, valid_set, test_set = cPickle.load(f)
#print test_set
f.close()
def np2listSmall(train_set, number):
trainfile = []
trains, labels = train_set
trainfile = []
#如果注释掉下面，将生成number个验证样本
number = len(labels)
for one in trains[:number]:
one = one.tolist()
trainfile.append(one)
labelfile = labels[:number].tolist()
datafile = [trainfile, labelfile]
return datafile
smallData = valid_set
print len(smallData)
valid, validlabel = np2listSmall(smallData, validNumber)
datafile = [valid, validlabel]
basedir = '../../tmp/src/data/'
# basedir = './'
json.dump(datafile, open(basedir + cnnName, 'wb'))

个人收获：

面对较难的任务，逐步分解，各个击破
解决问题的过程中，如果此路不通，要马上寻找其它思路，就像当年做数学证明题一样
态度要积极，不要轻言放弃，尽全力完成任务
代码调试时，应该首先构造较为全面的测试用例，这样可以迅速定位bug

本人的需求以及实现时的困难已经基本描述清楚，如果还有别的小问题，我相信大家花点比俺少很多很多的时间就可以解决，下面开始贴代码

如果不想自己建工程，这里有vs2008的c++代码，自己按照theano生成一下权值就可以读入运行了

C++代码

main.cpp

[cpp] view plain copy

#include <iostream>
#include "mlp.h"
#include "util.h"
#include "testinherit.h"
#include "neuralNetwork.h"
using namespace std;
/************************************************************************/
/* 本程序实现了
1、卷积神经网络的前向计算过程
2、mlp网络的前向与后向计算，也就是可以用来训练样本
需要注意的是：
如果为了复现theano的测试结果，那么隐藏层的激活函数要选用tanh；
否则，为了mlp的训练过程，激活函数要选择sigmoid
*/
/************************************************************************/
int main()
{
cout << "****cnn****" << endl;
TestCnnTheano(28 * 28, 10);
// TestMlpMnist对mlp训练样本进行测试
//TestMlpMnist(28 * 28, 500, 10);
return 0;
}

neuralNetwork.h

[cpp] view plain copy

#ifndef NEURALNETWORK_H
#define NEURALNETWORK_H
#include "mlp.h"
#include "cnn.h"
#include <vector>
using std::vector;
/************************************************************************/
/* 这是一个卷积神经网络 */
/************************************************************************/
class NeuralNetWork
{
public:
NeuralNetWork(int iInput, int iOut);
~NeuralNetWork();
void Predict(double** in_data, int n);
double CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel);
void Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb);
void SetTrainNum(int iNum);
int Predict(double *pInputData);
// void Forward_propagation(double** ppdata, int n);
double* Forward_propagation(double *);
private:
int m_iSampleNum; //样本数量
int m_iInput; //输入维数
int m_iOut; //输出维数
vector<CnnLayer *> vecCnns;
Mlp *m_pMlp;
};
void TestCnnTheano(const int iInput, const int iOut);
#endif

neuralNetwork.cpp

[cpp] view plain copy

#include "neuralNetwork.h"
#include <iostream>
#include "util.h"
#include <iomanip>
using namespace std;
NeuralNetWork::NeuralNetWork(int iInput, int iOut):m_iSampleNum(0), m_iInput(iInput), m_iOut(iOut), m_pMlp(NULL)
{
int iFeatureMapNumber = 20, iPoolWidth = 2, iInputImageWidth = 28, iKernelWidth = 5, iInputImageNumber = 1;
CnnLayer *pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
vecCnns.push_back(pCnnLayer);
iInputImageNumber = 20;
iInputImageWidth = 12;
iFeatureMapNumber = 50;
pCnnLayer = new CnnLayer(m_iSampleNum, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
vecCnns.push_back(pCnnLayer);
const int ihiddenSize = 1;
int phidden[ihiddenSize] = {500};
// construct LogisticRegression
m_pMlp = new Mlp(m_iSampleNum, iFeatureMapNumber * 4 * 4, m_iOut, ihiddenSize, phidden);
}
NeuralNetWork::~NeuralNetWork()
{
for (vector<CnnLayer*>::iterator it = vecCnns.begin(); it != vecCnns.end(); ++it)
{
delete *it;
}
delete m_pMlp;
}
void NeuralNetWork::SetTrainNum(int iNum)
{
m_iSampleNum = iNum;
for (size_t i = 0; i < vecCnns.size(); ++i)
{
vecCnns[i]->SetTrainNum(iNum);
}
m_pMlp->SetTrainNum(iNum);
}
int NeuralNetWork::Predict(double *pdInputdata)
{
double *pdPredictData = NULL;
pdPredictData = Forward_propagation(pdInputdata);
int iResult = -1;
iResult = m_pMlp->m_pLogisticLayer->Predict(pdPredictData);
return iResult;
}
double* NeuralNetWork::Forward_propagation(double *pdInputData)
{
double *pdPredictData = pdInputData;
vector<CnnLayer*>::iterator it;
CnnLayer *pCnnLayer;
for (it = vecCnns.begin(); it != vecCnns.end(); ++it)
{
pCnnLayer = *it;
pCnnLayer->Forward_propagation(pdPredictData);
pdPredictData = pCnnLayer->GetOutputData();
}
//此时pCnnLayer指向最后一个卷积层,pdInputData是卷积层的最后输出
//暂时忽略mlp的前向计算，以后加上
pdPredictData = m_pMlp->Forward_propagation(pdPredictData);
return pdPredictData;
}
void NeuralNetWork::Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb)
{
for (size_t i = 0; i < vecCnns.size(); ++i)
{
vecCnns[i]->Setwb(vvAllw[i], vvAllb[i]);
}
size_t iLayerNum = vvAllw.size();
for (size_t i = vecCnns.size(); i < iLayerNum - 1; ++i)
{
int iHiddenIndex = 0;
m_pMlp->m_ppHiddenLayer[iHiddenIndex]->Setwb(vvAllw[i], vvAllb[i]);
++iHiddenIndex;
}
m_pMlp->m_pLogisticLayer->Setwb(vvAllw[iLayerNum - 1], vvAllb[iLayerNum - 1]);
}
double NeuralNetWork::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)
{
cout << "Predict------------" << endl;
int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
//iValidNumber = 1;
for (int i = 0; i < iValidNumber; ++i)
{
int iResult = Predict(vecvalid[i]);
//cout << i << ",valid is " << iResult << " label is " << vecValidlabel[i] << endl;
if (iResult != vecValidlabel[i])
{
++iErrorNumber;
}
}
cout << "the num of error is " << iErrorNumber << endl;
double dErrorRate = (double)iErrorNumber / iValidNumber;
cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
return dErrorRate;
}
/************************************************************************/
/*
测试样本采用mnist库，此cnn的结构与theano教程上的一致，即
输入是28*28图像，接下来是2个卷积层（卷积+pooling），featuremap个数分别是20和50，
然后是全连接层（500个神经元），最后输出层10个神经元
*/
/************************************************************************/
void TestCnnTheano(const int iInput, const int iOut)
{
//构建卷积神经网络
NeuralNetWork neural(iInput, iOut);
//存取theano的权值
vector< vector<double*> > vvAllw;
vector< vector<double> > vvAllb;
//存取测试样本与标签
vector<double*> vecValid;
vector<WORD> vecLabel;
//保存theano权值与测试样本的文件
const char *szTheanoWeigh = "../../data/theanocnn.json", *szTheanoTest = "../../data/mnist_validall.json";
//将每次权值的第二维（列宽）保存到vector中，用于读取json文件
vector<int> vecSecondDimOfWeigh;
vecSecondDimOfWeigh.push_back(5 * 5);
vecSecondDimOfWeigh.push_back(20 * 5 * 5);
vecSecondDimOfWeigh.push_back(50 * 4 * 4);
vecSecondDimOfWeigh.push_back(500);
cout << "loadwb ---------" << endl;
//读取权值
LoadWeighFromJson(vvAllw, vvAllb, szTheanoWeigh, vecSecondDimOfWeigh);
//将权值设置到cnn中
neural.Setwb(vvAllw, vvAllb);
//读取测试文件
LoadTestSampleFromJson(vecValid, vecLabel, szTheanoTest, iInput);
//设置测试样本的总量
int iVaildNum = vecValid.size();
neural.SetTrainNum(iVaildNum);
//前向计算cnn的错误率，输出结果
neural.CalErrorRate(vecValid, vecLabel);
//释放测试文件所申请的空间
for (vector<double*>::iterator cit = vecValid.begin(); cit != vecValid.end(); ++cit)
{
delete [](*cit);
}
}

cnn.h

[cpp] view plain copy

#ifndef CNN_H
#define CNN_H
#include "featuremap.h"
#include "poollayer.h"
#include <vector>
using std::vector;
typedef unsigned short WORD;
/**
*本卷积模拟theano的测试过程
*当输入层是num个featuremap时，本层卷积层假设有featureNum个featuremap。
*对于本层每个像素点选取，上一层num个featuremap一起组合，并且没有bias
*然后本层输出到pooling层，pooling只对poolsize内的像素取最大值，然后加上bias，总共有featuremap个bias值
*/
class CnnLayer
{
public:
CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,
int iKernelWidth, int iPoolWidth);
~CnnLayer();
void Forward_propagation(double *pdInputData);
void Back_propagation(double* , double* , double );
void Train(double *x, WORD y, double dLr);
int Predict(double *);
void Setwb(vector<double*> &vpdw, vector<double> &vdb);
void SetInputAllData(double **ppInputAllData, int iInputNum);
void SetTrainNum(int iSampleNumber);
void PrintOutputData();
double* GetOutputData();
private:
int m_iSampleNum;
FeatureMap *m_pFeatureMap;
PoolLayer *m_pPoolLayer;
//反向传播时所需值
double **m_ppdDelta;
double *m_pdInputData;
double *m_pdOutputData;
};
void TestCnn();
#endif // CNN_H

cnn.cpp

[cpp] view plain copy

#include "cnn.h"
#include "util.h"
#include <cassert>
CnnLayer::CnnLayer(int iSampleNum, int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber,
int iKernelWidth, int iPoolWidth):
m_iSampleNum(iSampleNum), m_pdInputData(NULL), m_pdOutputData(NULL)
{
m_pFeatureMap = new FeatureMap(iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth);
int iFeatureMapWidth = iInputImageWidth - iKernelWidth + 1;
m_pPoolLayer = new PoolLayer(iFeatureMapNumber, iPoolWidth, iFeatureMapWidth);
}
CnnLayer::~CnnLayer()
{
delete m_pFeatureMap;
delete m_pPoolLayer;
}
void CnnLayer::Forward_propagation(double *pdInputData)
{
m_pFeatureMap->Convolute(pdInputData);
m_pPoolLayer->Convolute(m_pFeatureMap->GetFeatureMapValue());
m_pdOutputData = m_pPoolLayer->GetOutputData();
/************************************************************************/
/* 调试卷积过程的各阶段结果，调通后删除 */
/************************************************************************/
/*m_pFeatureMap->PrintOutputData();
m_pPoolLayer->PrintOutputData();*/
}
void CnnLayer::SetInputAllData(double **ppInputAllData, int iInputNum)
{
}
double* CnnLayer::GetOutputData()
{
assert(NULL != m_pdOutputData);
return m_pdOutputData;
}
void CnnLayer::Setwb(vector<double*> &vpdw, vector<double> &vdb)
{
m_pFeatureMap->SetWeigh(vpdw);
m_pPoolLayer->SetBias(vdb);
}
void CnnLayer::SetTrainNum( int iSampleNumber )
{
m_iSampleNum = iSampleNumber;
}
void CnnLayer::PrintOutputData()
{
m_pFeatureMap->PrintOutputData();
m_pPoolLayer->PrintOutputData();
}
void TestCnn()
{
const int iFeatureMapNumber = 2, iPoolWidth = 2, iInputImageWidth = 8, iKernelWidth = 3, iInputImageNumber = 2;
double *pdImage = new double[iInputImageWidth * iInputImageWidth * iInputImageNumber];
double arrInput[iInputImageNumber][iInputImageWidth * iInputImageWidth];
MakeCnnSample(arrInput, pdImage, iInputImageWidth, iInputImageNumber);
double *pdKernel = new double[3 * 3 * iInputImageNumber];
double arrKernel[3 * 3 * iInputImageNumber];
MakeCnnWeigh(pdKernel, iInputImageNumber) ;
CnnLayer cnn(3, iInputImageNumber, iInputImageWidth, iFeatureMapNumber, iKernelWidth, iPoolWidth);
vector <double*> vecWeigh;
vector <double> vecBias;
for (int i = 0; i < iFeatureMapNumber; ++i)
{
vecBias.push_back(1.0);
}
vecWeigh.push_back(pdKernel);
for (int i = 0; i < 3 * 3 * 2; ++i)
{
arrKernel[i] = i;
}
vecWeigh.push_back(arrKernel);
cnn.Setwb(vecWeigh, vecBias);
cnn.Forward_propagation(pdImage);
cnn.PrintOutputData();
delete []pdKernel;
delete []pdImage;
}

featuremap.h

[cpp] view plain copy

#ifndef FEATUREMAP_H
#define FEATUREMAP_H
#include <cassert>
#include <vector>
using std::vector;
class FeatureMap
{
public:
FeatureMap(int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber, int iKernelWidth);
~FeatureMap();
void Forward_propagation(double* );
void Back_propagation(double* , double* , double );
void Convolute(double *pdInputData);
int GetFeatureMapSize()
{
return m_iFeatureMapSize;
}
int GetFeatureMapWidth()
{
return m_iFeatureMapWidth;
}
double* GetFeatureMapValue()
{
assert(m_pdOutputValue != NULL);
return m_pdOutputValue;
}
void SetWeigh(const vector<double *> &vecWeigh);
void PrintOutputData();
double **m_ppdWeigh;
double *m_pdBias;
private:
int m_iInputImageNumber;
int m_iInputImageWidth;
int m_iInputImageSize;
int m_iFeatureMapNumber;
int m_iFeatureMapWidth;
int m_iFeatureMapSize;
int m_iKernelWidth;
// double m_dBias;
double *m_pdOutputValue;
};
#endif // FEATUREMAP_H

featuremap.cpp

[cpp] view plain copy

#include "featuremap.h"
#include "util.h"
#include <cassert>
FeatureMap::FeatureMap(int iInputImageNumber, int iInputImageWidth, int iFeatureMapNumber, int iKernelWidth):
m_iInputImageNumber(iInputImageNumber),
m_iInputImageWidth(iInputImageWidth),
m_iFeatureMapNumber(iFeatureMapNumber),
m_iKernelWidth(iKernelWidth)
{
m_iFeatureMapWidth = m_iInputImageWidth - m_iKernelWidth + 1;
m_iInputImageSize = m_iInputImageWidth * m_iInputImageWidth;
m_iFeatureMapSize = m_iFeatureMapWidth * m_iFeatureMapWidth;
int iKernelSize;
iKernelSize = m_iKernelWidth * m_iKernelWidth;
double dbase = 1.0 / m_iInputImageSize;
srand((unsigned)time(NULL));
m_ppdWeigh = new double*[m_iFeatureMapNumber];
m_pdBias = new double[m_iFeatureMapNumber];
for (int i = 0; i < m_iFeatureMapNumber; ++i)
{
m_ppdWeigh[i] = new double[m_iInputImageNumber * iKernelSize];
for (int j = 0; j < m_iInputImageNumber * iKernelSize; ++j)
{
m_ppdWeigh[i][j] = uniform(-dbase, dbase);
}
//m_pdBias[i] = uniform(-dbase, dbase);
//theano的卷积层貌似没有用到bias，它在pooling层使用
m_pdBias[i] = 0;
}
m_pdOutputValue = new double[m_iFeatureMapNumber * m_iFeatureMapSize];
// m_dBias = uniform(-dbase, dbase);
}
FeatureMap::~FeatureMap()
{
delete []m_pdOutputValue;
delete []m_pdBias;
for (int i = 0; i < m_iFeatureMapNumber; ++i)
{
delete []m_ppdWeigh[i];
}
delete []m_ppdWeigh;
}
void FeatureMap::SetWeigh(const vector<double *> &vecWeigh)
{
assert(vecWeigh.size() == (DWORD)m_iFeatureMapNumber);
for (int i = 0; i < m_iFeatureMapNumber; ++i)
{
delete []m_ppdWeigh[i];
m_ppdWeigh[i] = vecWeigh[i];
}
}
/*
卷积计算
pdInputData:一维向量，包含若干个输入图像
*/
void FeatureMap::Convolute(double *pdInputData)
{
for (int iMapIndex = 0; iMapIndex < m_iFeatureMapNumber; ++iMapIndex)
{
double dBias = m_pdBias[iMapIndex];
//每一个featuremap
for (int i = 0; i < m_iFeatureMapWidth; ++i)
{
for (int j = 0; j < m_iFeatureMapWidth; ++j)
{
double dSum = 0.0;
int iInputIndex, iKernelIndex, iInputIndexStart, iKernelStart, iOutIndex;
//输出向量的索引计算
iOutIndex = iMapIndex * m_iFeatureMapSize + i * m_iFeatureMapWidth + j;
//分别计算每一个输入图像
for (int k = 0; k < m_iInputImageNumber; ++k)
{
//与kernel对应的输入图像的起始位置
//iInputIndexStart = k * m_iInputImageSize + j * m_iInputImageWidth + i;
iInputIndexStart = k * m_iInputImageSize + i * m_iInputImageWidth + j;
//kernel的起始位置
iKernelStart = k * m_iKernelWidth * m_iKernelWidth;
for (int m = 0; m < m_iKernelWidth; ++m)
{
for (int n = 0; n < m_iKernelWidth; ++n)
{
//iKernelIndex = iKernelStart + n * m_iKernelWidth + m;
iKernelIndex = iKernelStart + m * m_iKernelWidth + n;
//i am not sure, is the expression of below correct?
iInputIndex = iInputIndexStart + m * m_iInputImageWidth + n;
dSum += pdInputData[iInputIndex] * m_ppdWeigh[iMapIndex][iKernelIndex];
}//end n
}//end m
}//end k
//加上偏置
//dSum += dBias;
m_pdOutputValue[iOutIndex] = dSum;
}//end j
}//end i
}//end iMapIndex
}
void FeatureMap::PrintOutputData()
{
for (int i = 0; i < m_iFeatureMapNumber; ++i)
{
cout << "featuremap " << i <<endl;
for (int m = 0; m < m_iFeatureMapWidth; ++m)
{
for (int n = 0; n < m_iFeatureMapWidth; ++n)
{
cout << m_pdOutputValue[i * m_iFeatureMapSize +m * m_iFeatureMapWidth +n] << ' ';
}
cout << endl;
}
cout <<endl;
}
}

poollayer.h

[cpp] view plain copy

#ifndef POOLLAYER_H
#define POOLLAYER_H
#include <vector>
using std::vector;
class PoolLayer
{
public:
PoolLayer(int iOutImageNumber, int iPoolWidth, int iFeatureMapWidth);
~PoolLayer();
void Convolute(double *pdInputData);
void SetBias(const vector<double> &vecBias);
double* GetOutputData();
void PrintOutputData();
private:
int m_iOutImageNumber;
int m_iPoolWidth;
int m_iFeatureMapWidth;
int m_iPoolSize;
int m_iOutImageEdge;
int m_iOutImageSize;
double *m_pdOutData;
double *m_pdBias;
};
#endif // POOLLAYER_H

poollayer.cpp

[cpp] view plain copy

#include "poollayer.h"
#include "util.h"
#include <cassert>
PoolLayer::PoolLayer(int iOutImageNumber, int iPoolWidth, int iFeatureMapWidth):
m_iOutImageNumber(iOutImageNumber),
m_iPoolWidth(iPoolWidth),
m_iFeatureMapWidth(iFeatureMapWidth)
{
m_iPoolSize = m_iPoolWidth * m_iPoolWidth;
m_iOutImageEdge = m_iFeatureMapWidth / m_iPoolWidth;
m_iOutImageSize = m_iOutImageEdge * m_iOutImageEdge;
m_pdOutData = new double[m_iOutImageNumber * m_iOutImageSize];
m_pdBias = new double[m_iOutImageNumber];
/*for (int i = 0; i < m_iOutImageNumber; ++i)
{
m_pdBias[i] = 1;
}*/
}
PoolLayer::~PoolLayer()
{
delete []m_pdOutData;
delete []m_pdBias;
}
void PoolLayer::Convolute(double *pdInputData)
{
int iFeatureMapSize = m_iFeatureMapWidth * m_iFeatureMapWidth;
for (int iOutImageIndex = 0; iOutImageIndex < m_iOutImageNumber; ++iOutImageIndex)
{
double dBias = m_pdBias[iOutImageIndex];
for (int i = 0; i < m_iOutImageEdge; ++i)
{
for (int j = 0; j < m_iOutImageEdge; ++j)
{
double dValue = 0.0;
int iInputIndex, iInputIndexStart, iOutIndex;
/************************************************************************/
/* 这里是最大的bug，dMaxPixel初始值设置为0，然后找最大值
** 问题在于像素值有负数，导致后面一系列计算错误，实在是太难找了
/************************************************************************/
double dMaxPixel = INT_MIN ;
iOutIndex = iOutImageIndex * m_iOutImageSize + i * m_iOutImageEdge + j;
iInputIndexStart = iOutImageIndex * iFeatureMapSize + (i * m_iFeatureMapWidth + j) * m_iPoolWidth;
for (int m = 0; m < m_iPoolWidth; ++m)
{
for (int n = 0; n < m_iPoolWidth; ++n)
{
// int iPoolIndex = m * m_iPoolWidth + n;
//i am not sure, the expression of below is correct?
iInputIndex = iInputIndexStart + m * m_iFeatureMapWidth + n;
if (pdInputData[iInputIndex] > dMaxPixel)
{
dMaxPixel = pdInputData[iInputIndex];
}
}//end n
}//end m
dValue = dMaxPixel + dBias;
assert(iOutIndex < m_iOutImageNumber * m_iOutImageSize);
//m_pdOutData[iOutIndex] = (dMaxPixel);
m_pdOutData[iOutIndex] = mytanh(dValue);
}//end j
}//end i
}//end iOutImageIndex
}
void PoolLayer::SetBias(const vector<double> &vecBias)
{
assert(vecBias.size() == (DWORD)m_iOutImageNumber);
for (int i = 0; i < m_iOutImageNumber; ++i)
{
m_pdBias[i] = vecBias[i];
}
}
double* PoolLayer::GetOutputData()
{
assert(NULL != m_pdOutData);
return m_pdOutData;
}
void PoolLayer::PrintOutputData()
{
for (int i = 0; i < m_iOutImageNumber; ++i)
{
cout << "pool image " << i <<endl;
for (int m = 0; m < m_iOutImageEdge; ++m)
{
for (int n = 0; n < m_iOutImageEdge; ++n)
{
cout << m_pdOutData[i * m_iOutImageSize + m * m_iOutImageEdge + n] << ' ';
}
cout << endl;
}
cout <<endl;
}
}

mlp.h

[cpp] view plain copy

#ifndef MLP_H
#define MLP_H
#include "hiddenLayer.h"
#include "logisticRegression.h"
class Mlp
{
public:
Mlp(int n, int n_i, int n_o, int nhl, int *hls);
~Mlp();
// void Train(double** in_data, double** in_label, double dLr, int epochs);
void Predict(double** in_data, int n);
void Train(double *x, WORD y, double dLr);
void TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr);
double CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel);
void Writewb(const char *szName);
void Readwb(const char *szName);
void Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb);
void SetTrainNum(int iNum);
int Predict(double *pInputData);
// void Forward_propagation(double** ppdata, int n);
double* Forward_propagation(double *);
int* GetHiddenSize();
int GetHiddenNumber();
double *GetHiddenOutputData();
HiddenLayer **m_ppHiddenLayer;
LogisticRegression *m_pLogisticLayer;
private:
int m_iSampleNum; //样本数量
int m_iInput; //输入维数
int m_iOut; //输出维数
int m_iHiddenLayerNum; //隐层数目
int* m_piHiddenLayerSize; //中间隐层的大小 e.g. {3,4}表示有两个隐层，第一个有三个节点，第二个有4个节点
};
void mlp();
void TestMlpTheano(const int m_iInput, const int ihidden, const int m_iOut);
void TestMlpMnist(const int m_iInput, const int ihidden, const int m_iOut);
#endif

mlp.cpp

[cpp] view plain copy

#include <iostream>
#include "mlp.h"
#include "util.h"
#include <cassert>
#include <iomanip>
using namespace std;
const int m_iSamplenum = 8, innode = 3, outnode = 8;
Mlp::Mlp(int n, int n_i, int n_o, int nhl, int *hls)
{
m_iSampleNum = n;
m_iInput = n_i;
m_iOut = n_o;
m_iHiddenLayerNum = nhl;
m_piHiddenLayerSize = hls;
//构造网络结构
m_ppHiddenLayer = new HiddenLayer* [m_iHiddenLayerNum];
for(int i = 0; i < m_iHiddenLayerNum; ++i)
{
if(i == 0)
{
m_ppHiddenLayer[i] = new HiddenLayer(m_iInput, m_piHiddenLayerSize[i]);//第一个隐层
}
else
{
m_ppHiddenLayer[i] = new HiddenLayer(m_piHiddenLayerSize[i-1], m_piHiddenLayerSize[i]);//其他隐层
}
}
if (m_iHiddenLayerNum > 0)
{
m_pLogisticLayer = new LogisticRegression(m_piHiddenLayerSize[m_iHiddenLayerNum - 1], m_iOut, m_iSampleNum);//最后的softmax层
}
else
{
m_pLogisticLayer = new LogisticRegression(m_iInput, m_iOut, m_iSampleNum);//最后的softmax层
}
}
Mlp::~Mlp()
{
//二维指针分配的对象不一定是二维数组
for(int i = 0; i < m_iHiddenLayerNum; ++i)
delete m_ppHiddenLayer[i]; //删除的时候不能加[]
delete[] m_ppHiddenLayer;
//log_layer只是一个普通的对象指针，不能作为数组delete
delete m_pLogisticLayer;//删除的时候不能加[]
}
void Mlp::TrainAllSample(const vector<double *> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)
{
cout << "Mlp::TrainAllSample" << endl;
for (int j = 0; j < m_iSampleNum; ++j)
{
Train(vecTrain[j], vectrainlabel[j], dLr);
}
}
void Mlp::Train(double *pdTrain, WORD usLabel, double dLr)
{
// cout << "******pdLabel****" << endl;
// printArrDouble(ppdinLabel, m_iSampleNum, m_iOut);
double *pdLabel = new double[m_iOut];
MakeOneLabel(usLabel, pdLabel, m_iOut);
//前向传播阶段
for(int n = 0; n < m_iHiddenLayerNum; ++ n)
{
if(n == 0) //第一个隐层直接输入数据
{
m_ppHiddenLayer[n]->Forward_propagation(pdTrain);
}
else //其他隐层用前一层的输出作为输入数据
{
m_ppHiddenLayer[n]->Forward_propagation(m_ppHiddenLayer[n-1]->m_pdOutdata);
}
}
//softmax层使用最后一个隐层的输出作为输入数据
m_pLogisticLayer->Forward_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
//反向传播阶段
m_pLogisticLayer->Back_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata, pdLabel, dLr);
for(int n = m_iHiddenLayerNum-1; n >= 1; --n)
{
if(n == m_iHiddenLayerNum-1)
{
m_ppHiddenLayer[n]->Back_propagation(m_ppHiddenLayer[n-1]->m_pdOutdata,
m_pLogisticLayer->m_pdDelta, m_pLogisticLayer->m_ppdW, m_pLogisticLayer->m_iOut, dLr);
}
else
{
double *pdInputData;
pdInputData = m_ppHiddenLayer[n-1]->m_pdOutdata;
m_ppHiddenLayer[n]->Back_propagation(pdInputData,
m_ppHiddenLayer[n+1]->m_pdDelta, m_ppHiddenLayer[n+1]->m_ppdW, m_ppHiddenLayer[n+1]->m_iOut, dLr);
}
}
//这里该怎么写？
if (m_iHiddenLayerNum > 1)
m_ppHiddenLayer[0]->Back_propagation(pdTrain,
m_ppHiddenLayer[1]->m_pdDelta, m_ppHiddenLayer[1]->m_ppdW, m_ppHiddenLayer[1]->m_iOut, dLr);
else
m_ppHiddenLayer[0]->Back_propagation(pdTrain,
m_pLogisticLayer->m_pdDelta, m_pLogisticLayer->m_ppdW, m_pLogisticLayer->m_iOut, dLr);
delete []pdLabel;
}
void Mlp::SetTrainNum(int iNum)
{
m_iSampleNum = iNum;
}
double* Mlp::Forward_propagation(double* pData)
{
double *pdForwardValue = pData;
for(int n = 0; n < m_iHiddenLayerNum; ++ n)
{
if(n == 0) //第一个隐层直接输入数据
{
pdForwardValue = m_ppHiddenLayer[n]->Forward_propagation(pData);
}
else //其他隐层用前一层的输出作为输入数据
{
pdForwardValue = m_ppHiddenLayer[n]->Forward_propagation(pdForwardValue);
}
}
return pdForwardValue;
//softmax层使用最后一个隐层的输出作为输入数据
// m_pLogisticLayer->Forward_propagation(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
// m_pLogisticLayer->Predict(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
}
int Mlp::Predict(double *pInputData)
{
Forward_propagation(pInputData);
int iResult = m_pLogisticLayer->Predict(m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata);
return iResult;
}
void Mlp::Setwb(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb)
{
for (int i = 0; i < m_iHiddenLayerNum; ++i)
{
m_ppHiddenLayer[i]->Setwb(vvAllw[i], vvAllb[i]);
}
m_pLogisticLayer->Setwb(vvAllw[m_iHiddenLayerNum], vvAllb[m_iHiddenLayerNum]);
}
void Mlp::Writewb(const char *szName)
{
for(int i = 0; i < m_iHiddenLayerNum; ++i)
{
m_ppHiddenLayer[i]->Writewb(szName);
}
m_pLogisticLayer->Writewb(szName);
}
double Mlp::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)
{
int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
for (int i = 0; i < iValidNumber; ++i)
{
int iResult = Predict(vecvalid[i]);
if (iResult != vecValidlabel[i])
{
++iErrorNumber;
}
}
cout << "the num of error is " << iErrorNumber << endl;
double dErrorRate = (double)iErrorNumber / iValidNumber;
cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
return dErrorRate;
}
void Mlp::Readwb(const char *szName)
{
long dcurpos = 0, dreadsize = 0;
for(int i = 0; i < m_iHiddenLayerNum; ++i)
{
dreadsize = m_ppHiddenLayer[i]->Readwb(szName, dcurpos);
cout << "hiddenlayer " << i + 1 << " read bytes: " << dreadsize << endl;
if (-1 != dreadsize)
dcurpos += dreadsize;
else
{
cout << "read wb error from HiddenLayer" << endl;
return;
}
}
dreadsize = m_pLogisticLayer->Readwb(szName, dcurpos);
if (-1 != dreadsize)
dcurpos += dreadsize;
else
{
cout << "read wb error from sofmaxLayer" << endl;
return;
}
}
int* Mlp::GetHiddenSize()
{
return m_piHiddenLayerSize;
}
double* Mlp::GetHiddenOutputData()
{
assert(m_iHiddenLayerNum > 0);
return m_ppHiddenLayer[m_iHiddenLayerNum-1]->m_pdOutdata;
}
int Mlp::GetHiddenNumber()
{
return m_iHiddenLayerNum;
}
//double **makeLabelSample(double **label_x)
double **makeLabelSample(double label_x[][outnode])
{
double **pplabelSample;
pplabelSample = new double*[m_iSamplenum];
for (int i = 0; i < m_iSamplenum; ++i)
{
pplabelSample[i] = new double[outnode];
}
for (int i = 0; i < m_iSamplenum; ++i)
{
for (int j = 0; j < outnode; ++j)
pplabelSample[i][j] = label_x[i][j];
}
return pplabelSample;
}
double **maken_train(double train_x[][innode])
{
double **ppn_train;
ppn_train = new double*[m_iSamplenum];
for (int i = 0; i < m_iSamplenum; ++i)
{
ppn_train[i] = new double[innode];
}
for (int i = 0; i < m_iSamplenum; ++i)
{
for (int j = 0; j < innode; ++j)
ppn_train[i][j] = train_x[i][j];
}
return ppn_train;
}
void TestMlpMnist(const int m_iInput, const int ihidden, const int m_iOut)
{
const int ihiddenSize = 1;
int phidden[ihiddenSize] = {ihidden};
// construct LogisticRegression
Mlp neural(m_iSamplenum, m_iInput, m_iOut, ihiddenSize, phidden);
vector<double*> vecTrain, vecvalid;
vector<WORD> vecValidlabel, vectrainlabel;
LoadTestSampleFromJson(vecvalid, vecValidlabel, "../../data/mnist.json", m_iInput);
LoadTestSampleFromJson(vecTrain, vectrainlabel, "../../data/mnisttrain.json", m_iInput);
// test
int itrainnum = vecTrain.size();
neural.SetTrainNum(itrainnum);
const int iepochs = 1;
const double dLr = 0.1;
neural.CalErrorRate(vecvalid, vecValidlabel);
for (int i = 0; i < iepochs; ++i)
{
neural.TrainAllSample(vecTrain, vectrainlabel, dLr);
neural.CalErrorRate(vecvalid, vecValidlabel);
}
for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)
{
delete [](*cit);
}
for (vector<double*>::iterator cit = vecvalid.begin(); cit != vecvalid.end(); ++cit)
{
delete [](*cit);
}
}
void TestMlpTheano(const int m_iInput, const int ihidden, const int m_iOut)
{
const int ihiddenSize = 1;
int phidden[ihiddenSize] = {ihidden};
// construct LogisticRegression
Mlp neural(m_iSamplenum, m_iInput, m_iOut, ihiddenSize, phidden);
vector<double*> vecTrain, vecw;
vector<double> vecb;
vector<WORD> vecLabel;
vector< vector<double*> > vvAllw;
vector< vector<double> > vvAllb;
const char *pcfilename = "../../data/theanomlp.json";
vector<int> vecSecondDimOfWeigh;
vecSecondDimOfWeigh.push_back(m_iInput);
vecSecondDimOfWeigh.push_back(ihidden);
LoadWeighFromJson(vvAllw, vvAllb, pcfilename, vecSecondDimOfWeigh);
LoadTestSampleFromJson(vecTrain, vecLabel, "../../data/mnist_validall.json", m_iInput);
cout << "loadwb ---------" << endl;
int itrainnum = vecTrain.size();
neural.SetTrainNum(itrainnum);
neural.Setwb(vvAllw, vvAllb);
cout << "Predict------------" << endl;
neural.CalErrorRate(vecTrain, vecLabel);
for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)
{
delete [](*cit);
}
}
void mlp()
{
//输入样本
double X[m_iSamplenum][innode]= {
{0,0,0},{0,0,1},{0,1,0},{0,1,1},{1,0,0},{1,0,1},{1,1,0},{1,1,1}
};
double Y[m_iSamplenum][outnode]={
{1, 0, 0, 0, 0, 0, 0, 0},
{0, 1, 0, 0, 0, 0, 0, 0},
{0, 0, 1, 0, 0, 0, 0, 0},
{0, 0, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0},
{0, 0, 0, 0, 0, 0, 1, 0},
{0, 0, 0, 0, 0, 0, 0, 1},
};
WORD pdLabel[outnode] = {0, 1, 2, 3, 4, 5, 6, 7};
const int ihiddenSize = 2;
int phidden[ihiddenSize] = {5, 5};
//printArr(phidden, 1);
Mlp neural(m_iSamplenum, innode, outnode, ihiddenSize, phidden);
double **train_x, **ppdLabel;
train_x = maken_train(X);
//printArrDouble(train_x, m_iSamplenum, innode);
ppdLabel = makeLabelSample(Y);
for (int i = 0; i < 3500; ++i)
{
for (int j = 0; j < m_iSamplenum; ++j)
{
neural.Train(train_x[j], pdLabel[j], 0.1);
}
}
cout<<"trainning complete..."<<endl;
for (int i = 0; i < m_iSamplenum; ++i)
neural.Predict(train_x[i]);
//szName存放权值
// const char *szName = "mlp55new.wb";
// neural.Writewb(szName);
// Mlp neural2(m_iSamplenum, innode, outnode, ihiddenSize, phidden);
// cout<<"Readwb start..."<<endl;
// neural2.Readwb(szName);
// cout<<"Readwb end..."<<endl;
// cout << "----------after readwb________" << endl;
// for (int i = 0; i < m_iSamplenum; ++i)
// neural2.Forward_propagation(train_x[i]);
for (int i = 0; i != m_iSamplenum; ++i)
{
delete []train_x[i];
delete []ppdLabel[i];
}
delete []train_x;
delete []ppdLabel;
cout<<endl;
}

hiddenLayer.h

[cpp] view plain copy

#ifndef HIDDENLAYER_H
#define HIDDENLAYER_H
#include "neuralbase.h"
class HiddenLayer: public NeuralBase
{
public:
HiddenLayer(int n_i, int n_o);
~HiddenLayer();
double* Forward_propagation(double* input_data);
void Back_propagation(double *pdInputData, double *pdNextLayerDelta,
double** ppdnextLayerW, int iNextLayerOutNum, double dLr);
};
#endif

hiddenLayer.cpp

[cpp] view plain copy

#include <cmath>
#include <cassert>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include "hiddenLayer.h"
#include "util.h"
using namespace std;
HiddenLayer::HiddenLayer(int n_i, int n_o): NeuralBase(n_i, n_o, 0)
{
}
HiddenLayer::~HiddenLayer()
{
}
/************************************************************************/
/* 需要注意的是：
如果为了复现theano的测试结果，那么隐藏层的激活函数要选用tanh；
否则，为了mlp的训练过程，激活函数要选择sigmoid */
/************************************************************************/
double* HiddenLayer::Forward_propagation(double* pdInputData)
{
NeuralBase::Forward_propagation(pdInputData);
for(int i = 0; i < m_iOut; ++i)
{
// m_pdOutdata[i] = sigmoid(m_pdOutdata[i]);
m_pdOutdata[i] = mytanh(m_pdOutdata[i]);
}
return m_pdOutdata;
}
void HiddenLayer::Back_propagation(double *pdInputData, double *pdNextLayerDelta,
double** ppdnextLayerW, int iNextLayerOutNum, double dLr)
{
/*
pdInputData 为输入数据
*pdNextLayerDelta 为下一层的残差值delta,是一个大小为iNextLayerOutNum的数组
**ppdnextLayerW 为此层到下一层的权值
iNextLayerOutNum 实际上就是下一层的n_out
dLr 为学习率learning rate
m_iSampleNum 为训练样本总数
*/
//sigma元素个数应与本层单元个数一致，而网上代码有误
//作者是没有自己测试啊，测试啊
//double* sigma = new double[iNextLayerOutNum];
double* sigma = new double[m_iOut];
//double sigma[10];
for(int i = 0; i < m_iOut; ++i)
sigma[i] = 0.0;
for(int i = 0; i < iNextLayerOutNum; ++i)
{
for(int j = 0; j < m_iOut; ++j)
{
sigma[j] += ppdnextLayerW[i][j] * pdNextLayerDelta[i];
}
}
//计算得到本层的残差delta
for(int i = 0; i < m_iOut; ++i)
{
m_pdDelta[i] = sigma[i] * m_pdOutdata[i] * (1 - m_pdOutdata[i]);
}
//调整本层的权值w
for(int i = 0; i < m_iOut; ++i)
{
for(int j = 0; j < m_iInput; ++j)
{
m_ppdW[i][j] += dLr * m_pdDelta[i] * pdInputData[j];
}
m_pdBias[i] += dLr * m_pdDelta[i];
}
delete[] sigma;
}

logisticRegression.h

[cpp] view plain copy

#ifndef LOGISTICREGRESSIONLAYER
#define LOGISTICREGRESSIONLAYER
#include "neuralbase.h"
typedef unsigned short WORD;
class LogisticRegression: public NeuralBase
{
public:
LogisticRegression(int n_i, int i_o, int);
~LogisticRegression();
double* Forward_propagation(double* input_data);
void Softmax(double* x);
void Train(double *pdTrain, WORD usLabel, double dLr);
void SetOldWb(double ppdWeigh[][3], double arriBias[8]);
int Predict(double *);
void MakeLabels(int* piMax, double (*pplabels)[8]);
};
void Test_lr();
void Testwb();
void Test_theano(const int m_iInput, const int m_iOut);
#endif

logisticRegression.cpp

[cpp] view plain copy

#include <cmath>
#include <cassert>
#include <iomanip>
#include <ctime>
#include <iostream>
#include "logisticRegression.h"
#include "util.h"
using namespace std;
LogisticRegression::LogisticRegression(int n_i, int n_o, int n_t): NeuralBase(n_i, n_o, n_t)
{
}
LogisticRegression::~LogisticRegression()
{
}
void LogisticRegression::Softmax(double* x)
{
double _max = 0.0;
double _sum = 0.0;
for(int i = 0; i < m_iOut; ++i)
{
if(_max < x[i])
_max = x[i];
}
for(int i = 0; i < m_iOut; ++i)
{
x[i] = exp(x[i]-_max);
_sum += x[i];
}
for(int i = 0; i < m_iOut; ++i)
{
x[i] /= _sum;
}
}
double* LogisticRegression::Forward_propagation(double* pdinputdata)
{
NeuralBase::Forward_propagation(pdinputdata);
/************************************************************************/
/* 调试 */
/************************************************************************/
//cout << "Forward_propagation from LogisticRegression" << endl;
//PrintOutputData();
//cout << "over\n";
Softmax(m_pdOutdata);
return m_pdOutdata;
}
int LogisticRegression::Predict(double *pdtest)
{
Forward_propagation(pdtest);
/************************************************************************/
/* 调试使用 */
/************************************************************************/
//PrintOutputData();
int iResult = getMaxIndex(m_pdOutdata, m_iOut);
return iResult;
}
void LogisticRegression::Train(double *pdTrain, WORD usLabel, double dLr)
{
Forward_propagation(pdTrain);
double *pdLabel = new double[m_iOut];
MakeOneLabel(usLabel, pdLabel);
Back_propagation(pdTrain, pdLabel, dLr);
delete []pdLabel;
}
//double LogisticRegression::CalErrorRate(const vector<double*> &vecvalid, const vector<WORD> &vecValidlabel)
//{
// int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
// for (int i = 0; i < iValidNumber; ++i)
// {
// int iResult = Predict(vecvalid[i]);
// if (iResult != vecValidlabel[i])
// {
// ++iErrorNumber;
// }
// }
// cout << "the num of error is " << iErrorNumber << endl;
// double dErrorRate = (double)iErrorNumber / iValidNumber;
// cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
// return dErrorRate;
//}
void LogisticRegression::SetOldWb(double ppdWeigh[][3], double arriBias[8])
{
for (int i = 0; i < m_iOut; ++i)
{
for (int j = 0; j < m_iInput; ++j)
m_ppdW[i][j] = ppdWeigh[i][j];
m_pdBias[i] = arriBias[i];
}
cout << "Setwb----------" << endl;
printArrDouble(m_ppdW, m_iOut, m_iInput);
printArr(m_pdBias, m_iOut);
}
//void LogisticRegression::TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)
//{
// for (int j = 0; j < m_iSamplenum; ++j)
// {
// Train(vecTrain[j], vectrainlabel[j], dLr);
// }
//}
void LogisticRegression::MakeLabels(int* piMax, double (*pplabels)[8])
{
for (int i = 0; i < m_iSamplenum; ++i)
{
for (int j = 0; j < m_iOut; ++j)
pplabels[i][j] = 0;
int k = piMax[i];
pplabels[i][k] = 1.0;
}
}
void Test_theano(const int m_iInput, const int m_iOut)
{
// construct LogisticRegression
LogisticRegression classifier(m_iInput, m_iOut, 0);
vector<double*> vecTrain, vecvalid, vecw;
vector<double> vecb;
vector<WORD> vecValidlabel, vectrainlabel;
LoadTestSampleFromJson(vecvalid, vecValidlabel, "../.../../data/mnist.json", m_iInput);
LoadTestSampleFromJson(vecTrain, vectrainlabel, "../.../../data/mnisttrain.json", m_iInput);
// test
int itrainnum = vecTrain.size();
classifier.m_iSamplenum = itrainnum;
const int iepochs = 5;
const double dLr = 0.1;
for (int i = 0; i < iepochs; ++i)
{
classifier.TrainAllSample(vecTrain, vectrainlabel, dLr);
if (i % 2 == 0)
{
cout << "Predict------------" << i + 1 << endl;
classifier.CalErrorRate(vecvalid, vecValidlabel);
}
}
for (vector<double*>::iterator cit = vecTrain.begin(); cit != vecTrain.end(); ++cit)
{
delete [](*cit);
}
for (vector<double*>::iterator cit = vecvalid.begin(); cit != vecvalid.end(); ++cit)
{
delete [](*cit);
}
}
void Test_lr()
{
srand(0);
double learning_rate = 0.1;
double n_epochs = 200;
int test_N = 2;
const int trainNum = 8, m_iInput = 3, m_iOut = 8;
//int m_iOut = 2;
double train_X[trainNum][m_iInput] = {
{1, 1, 1},
{1, 1, 0},
{1, 0, 1},
{1, 0, 0},
{0, 1, 1},
{0, 1, 0},
{0, 0, 1},
{0, 0, 0}
};
//sziMax存储的是最大值的下标
int sziMax[trainNum];
for (int i = 0; i < trainNum; ++i)
sziMax[i] = trainNum - i - 1;
// construct LogisticRegression
LogisticRegression classifier(m_iInput, m_iOut, trainNum);
// Train online
for(int epoch=0; epoch<n_epochs; epoch++) {
for(int i=0; i<trainNum; i++) {
//classifier.trainEfficient(train_X[i], train_Y[i], learning_rate);
classifier.Train(train_X[i], sziMax[i], learning_rate);
}
}
const char *pcfile = "test.wb";
classifier.Writewb(pcfile);
LogisticRegression logistic(m_iInput, m_iOut, trainNum);
logistic.Readwb(pcfile, 0);
// test data
double test_X[2][m_iOut] = {
{1, 0, 1},
{0, 0, 1}
};
// test
cout << "before Readwb ---------" << endl;
for(int i=0; i<test_N; i++) {
classifier.Predict(test_X[i]);
cout << endl;
}
cout << "after Readwb ---------" << endl;
for(int i=0; i<trainNum; i++) {
logistic.Predict(train_X[i]);
cout << endl;
}
cout << "*********\n";
}
void Testwb()
{
// int test_N = 2;
const int trainNum = 8, m_iInput = 3, m_iOut = 8;
//int m_iOut = 2;
double train_X[trainNum][m_iInput] = {
{1, 1, 1},
{1, 1, 0},
{1, 0, 1},
{1, 0, 0},
{0, 1, 1},
{0, 1, 0},
{0, 0, 1},
{0, 0, 0}
};
double arriBias[m_iOut] = {1, 2, 3, 3, 3, 3, 2, 1};
// construct LogisticRegression
LogisticRegression classifier(m_iInput, m_iOut, trainNum);
classifier.SetOldWb(train_X, arriBias);
const char *pcfile = "test.wb";
classifier.Writewb(pcfile);
LogisticRegression logistic(m_iInput, m_iOut, trainNum);
logistic.Readwb(pcfile, 0);
}

neuralbase.h

[cpp] view plain copy

#ifndef NEURALBASE_H
#define NEURALBASE_H
#include <vector>
using std::vector;
typedef unsigned short WORD;
class NeuralBase
{
public:
NeuralBase(int , int , int);
virtual ~NeuralBase();
virtual double* Forward_propagation(double* );
virtual void Back_propagation(double* , double* , double );
virtual void Train(double *x, WORD y, double dLr);
virtual int Predict(double *);
void Callbackwb();
void MakeOneLabel(int iMax, double *pdLabel);
void TrainAllSample(const vector<double*> &vecTrain, const vector<WORD> &vectrainlabel, double dLr);
double CalErrorRate(const vector<double*> &vecvalid, const vector<WORD> &vecValidlabel);
void Printwb();
void Writewb(const char *szName);
long Readwb(const char *szName, long);
void Setwb(vector<double*> &vpdw, vector<double> &vdb);
void PrintOutputData();
int m_iInput;
int m_iOut;
int m_iSamplenum;
double** m_ppdW;
double* m_pdBias;
//本层前向传播的输出值，也是最终的预测值
double* m_pdOutdata;
//反向传播时所需值
double* m_pdDelta;
private:
void _callbackwb();
};
#endif // NEURALBASE_H

neuralbase.cpp

[cpp] view plain copy

#include "neuralbase.h"
#include <cmath>
#include <cassert>
#include <ctime>
#include <iomanip>
#include <iostream>
#include "util.h"
using namespace std;
NeuralBase::NeuralBase(int n_i, int n_o, int n_t):m_iInput(n_i), m_iOut(n_o), m_iSamplenum(n_t)
{
m_ppdW = new double* [m_iOut];
for(int i = 0; i < m_iOut; ++i)
{
m_ppdW[i] = new double [m_iInput];
}
m_pdBias = new double [m_iOut];
double a = 1.0 / m_iInput;
srand((unsigned)time(NULL));
for(int i = 0; i < m_iOut; ++i)
{
for(int j = 0; j < m_iInput; ++j)
m_ppdW[i][j] = uniform(-a, a);
m_pdBias[i] = uniform(-a, a);
}
m_pdDelta = new double [m_iOut];
m_pdOutdata = new double [m_iOut];
}
NeuralBase::~NeuralBase()
{
Callbackwb();
delete[] m_pdOutdata;
delete[] m_pdDelta;
}
void NeuralBase::Callbackwb()
{
_callbackwb();
}
double NeuralBase::CalErrorRate(const vector<double *> &vecvalid, const vector<WORD> &vecValidlabel)
{
int iErrorNumber = 0, iValidNumber = vecValidlabel.size();
for (int i = 0; i < iValidNumber; ++i)
{
int iResult = Predict(vecvalid[i]);
if (iResult != vecValidlabel[i])
{
++iErrorNumber;
}
}
cout << "the num of error is " << iErrorNumber << endl;
double dErrorRate = (double)iErrorNumber / iValidNumber;
cout << "the error rate of Train sample by softmax is " << setprecision(10) << dErrorRate * 100 << "%" << endl;
return dErrorRate;
}
int NeuralBase::Predict(double *)
{
cout << "NeuralBase::Predict(double *)" << endl;
return 0;
}
void NeuralBase::_callbackwb()
{
for(int i=0; i < m_iOut; i++)
delete []m_ppdW[i];
delete[] m_ppdW;
delete[] m_pdBias;
}
void NeuralBase::Printwb()
{
cout << "'****m_ppdW****\n";
for(int i = 0; i < m_iOut; ++i)
{
for(int j = 0; j < m_iInput; ++j)
cout << m_ppdW[i][j] << ' ';
cout << endl;
}
cout << "'****m_pdBias****\n";
for(int i = 0; i < m_iOut; ++i)
{
cout << m_pdBias[i] << ' ';
}
cout << endl;
cout << "'****output****\n";
for(int i = 0; i < m_iOut; ++i)
{
cout << m_pdOutdata[i] << ' ';
}
cout << endl;
}
double* NeuralBase::Forward_propagation(double* input_data)
{
for(int i = 0; i < m_iOut; ++i)
{
m_pdOutdata[i] = 0.0;
for(int j = 0; j < m_iInput; ++j)
{
m_pdOutdata[i] += m_ppdW[i][j]*input_data[j];
}
m_pdOutdata[i] += m_pdBias[i];
}
return m_pdOutdata;
}
void NeuralBase::Back_propagation(double* input_data, double* pdlabel, double dLr)
{
for(int i = 0; i < m_iOut; ++i)
{
m_pdDelta[i] = pdlabel[i] - m_pdOutdata[i] ;
for(int j = 0; j < m_iInput; ++j)
{
m_ppdW[i][j] += dLr * m_pdDelta[i] * input_data[j] / m_iSamplenum;
}
m_pdBias[i] += dLr * m_pdDelta[i] / m_iSamplenum;
}
}
void NeuralBase::MakeOneLabel(int imax, double *pdlabel)
{
for (int j = 0; j < m_iOut; ++j)
pdlabel[j] = 0;
pdlabel[imax] = 1.0;
}
void NeuralBase::Writewb(const char *szName)
{
savewb(szName, m_ppdW, m_pdBias, m_iOut, m_iInput);
}
long NeuralBase::Readwb(const char *szName, long dstartpos)
{
return loadwb(szName, m_ppdW, m_pdBias, m_iOut, m_iInput, dstartpos);
}
void NeuralBase::Setwb(vector<double*> &vpdw, vector<double> &vdb)
{
assert(vpdw.size() == (DWORD)m_iOut);
for (int i = 0; i < m_iOut; ++i)
{
delete []m_ppdW[i];
m_ppdW[i] = vpdw[i];
m_pdBias[i] = vdb[i];
}
}
void NeuralBase::TrainAllSample(const vector<double *> &vecTrain, const vector<WORD> &vectrainlabel, double dLr)
{
for (int j = 0; j < m_iSamplenum; ++j)
{
Train(vecTrain[j], vectrainlabel[j], dLr);
}
}
void NeuralBase::Train(double *x, WORD y, double dLr)
{
(void)x;
(void)y;
(void)dLr;
cout << "NeuralBase::Train(double *x, WORD y, double dLr)" << endl;
}
void NeuralBase::PrintOutputData()
{
for (int i = 0; i < m_iOut; ++i)
{
cout << m_pdOutdata[i] << ' ';
}
cout << endl;
}

util.h

[cpp] view plain copy

#ifndef UTIL_H
#define UTIL_H
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>
using namespace std;
typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned int DWORD;
double sigmoid(double x);
double mytanh(double dx);
typedef struct stShapeWb
{
stShapeWb(int w, int h):width(w), height(h){}
int width;
int height;
}ShapeWb_S;
void MakeOneLabel(int iMax, double *pdLabel, int m_iOut);
double uniform(double _min, double _max);
//void printArr(T *parr, int num);
//void printArrDouble(double **pparr, int row, int col);
void initArr(double *parr, int num);
int getMaxIndex(double *pdarr, int num);
void Printivec(const vector<int> &ivec);
void savewb(const char *szName, double **m_ppdW, double *m_pdBias,
int irow, int icol);
long loadwb(const char *szName, double **m_ppdW, double *m_pdBias,
int irow, int icol, long dstartpos);
void TestLoadJson(const char *pcfilename);
bool LoadvtFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput);
bool LoadwbFromJson(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput);
bool LoadTestSampleFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput);
bool LoadwbByByte(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput);
bool LoadallwbByByte(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb, const char *filename,
const int m_iInput, const int ihidden, const int m_iOut);
bool LoadWeighFromJson(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb,
const char *filename, const vector<int> &vecSecondDimOfWeigh);
void MakeCnnSample(double arr[2][64], double *pdImage, int iImageWidth, int iNumOfImage );
void MakeCnnWeigh(double *, int iNumOfKernel);
template <typename T>
void printArr(T *parr, int num)
{
cout << "****printArr****" << endl;
for (int i = 0; i < num; ++i)
cout << parr[i] << ' ';
cout << endl;
}
template <typename T>
void printArrDouble(T **pparr, int row, int col)
{
cout << "****printArrDouble****" << endl;
for (int i = 0; i < row; ++i)
{
for (int j = 0; j < col; ++j)
{
cout << pparr[i][j] << ' ';
}
cout << endl;
}
}
#endif

util.cpp

[cpp] view plain copy

#include "util.h"
#include <iostream>
#include <ctime>
#include <cmath>
#include <cassert>
#include <fstream>
#include <cstring>
#include <stack>
#include <iomanip>
using namespace std;
int getMaxIndex(double *pdarr, int num)
{
double dmax = -1;
int iMax = -1;
for(int i = 0; i < num; ++i)
{
if (pdarr[i] > dmax)
{
dmax = pdarr[i];
iMax = i;
}
}
return iMax;
}
double sigmoid(double dx)
{
return 1.0/(1.0+exp(-dx));
}
double mytanh(double dx)
{
double e2x = exp(2 * dx);
return (e2x - 1) / (e2x + 1);
}
double uniform(double _min, double _max)
{
return rand()/(RAND_MAX + 1.0) * (_max - _min) + _min;
}
void initArr(double *parr, int num)
{
for (int i = 0; i < num; ++i)
parr[i] = 0.0;
}
void savewb(const char *szName, double **m_ppdW, double *m_pdBias,
int irow, int icol)
{
FILE *pf;
if( (pf = fopen(szName, "ab" )) == NULL )
{
printf( "File coulkd not be opened " );
return;
}
int isizeofelem = sizeof(double);
for (int i = 0; i < irow; ++i)
{
if (fwrite((const void*)m_ppdW[i], isizeofelem, icol, pf) != icol)
{
fputs ("Writing m_ppdW error",stderr);
return;
}
}
if (fwrite((const void*)m_pdBias, isizeofelem, irow, pf) != irow)
{
fputs ("Writing m_ppdW error",stderr);
return;
}
fclose(pf);
}
long loadwb(const char *szName, double **m_ppdW, double *m_pdBias,
int irow, int icol, long dstartpos)
{
FILE *pf;
long dtotalbyte = 0, dreadsize;
if( (pf = fopen(szName, "rb" )) == NULL )
{
printf( "File coulkd not be opened " );
return -1;
}
//让文件指针偏移到正确位置
fseek(pf, dstartpos , SEEK_SET);
int isizeofelem = sizeof(double);
for (int i = 0; i < irow; ++i)
{
dreadsize = fread((void*)m_ppdW[i], isizeofelem, icol, pf);
if (dreadsize != icol)
{
fputs ("Reading m_ppdW error",stderr);
return -1;
}
//每次成功读取，都要加到dtotalbyte中，最后返回
dtotalbyte += dreadsize;
}
dreadsize = fread(m_pdBias, isizeofelem, irow, pf);
if (dreadsize != irow)
{
fputs ("Reading m_pdBias error",stderr);
return -1;
}
dtotalbyte += dreadsize;
dtotalbyte *= isizeofelem;
fclose(pf);
return dtotalbyte;
}
void Printivec(const vector<int> &ivec)
{
for (vector<int>::const_iterator it = ivec.begin(); it != ivec.end(); ++it)
{
cout << *it << ' ';
}
cout << endl;
}
void TestLoadJson(const char *pcfilename)
{
vector<double *> vpdw;
vector<double> vdb;
vector< vector<double*> > vvAllw;
vector< vector<double> > vvAllb;
int m_iInput = 28 * 28, ihidden = 500, m_iOut = 10;
LoadallwbByByte(vvAllw, vvAllb, pcfilename, m_iInput, ihidden, m_iOut );
}
//read vt from mnist, format is [[[], [],..., []],[1, 3, 5,..., 7]]
bool LoadvtFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput)
{
cout << "loadvtFromJson" << endl;
const int ciStackSize = 10;
const int ciFeaturesize = m_iInput;
int arriStack[ciStackSize], iTop = -1;
ifstream ifs;
ifs.open(filename, ios::in);
assert(ifs.is_open());
BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma, ucSpace;
ucLeftbrace = '[';
ucRightbrace = ']';
ucComma = ',';
ucSpace = '0';
ifs >> ucRead;
assert(ucRead == ucLeftbrace);
//栈中全部存放左括号，用1代表,0说明清除
arriStack[++iTop] = 1;
//样本train开始
ifs >> ucRead;
assert(ucRead == ucLeftbrace);
arriStack[++iTop] = 1;//iTop is 1
int iIndex;
bool isdigit = false;
double dread, *pdvt;
//load vt sample
while (iTop > 0)
{
if (isdigit == false)
{
ifs >> ucRead;
isdigit = true;
if (ucRead == ucComma)
{
//next char is space or leftbrace
// ifs >> ucRead;
isdigit = false;
continue;
}
if (ucRead == ucSpace)
{
//if pdvt is null, next char is leftbrace;
//else next char is double value
if (pdvt == NULL)
isdigit = false;
continue;
}
if (ucRead == ucLeftbrace)
{
pdvt = new double[ciFeaturesize];
memset(pdvt, 0, ciFeaturesize * sizeof(double));
//iIndex数组下标
iIndex = 0;
arriStack[++iTop] = 1;
continue;
}
if (ucRead == ucRightbrace)
{
if (pdvt != NULL)
{
assert(iIndex == ciFeaturesize);
vecTrain.push_back(pdvt);
pdvt = NULL;
}
isdigit = false;
arriStack[iTop--] = 0;
continue;
}
}
else
{
ifs >> dread;
pdvt[iIndex++] = dread;
isdigit = false;
}
};
//next char is dot
ifs >> ucRead;
assert(ucRead == ucComma);
cout << vecTrain.size() << endl;
//读取label
WORD usread;
isdigit = false;
while (iTop > -1 && ifs.eof() == false)
{
if (isdigit == false)
{
ifs >> ucRead;
isdigit = true;
if (ucRead == ucComma)
{
//next char is space or leftbrace
// ifs >> ucRead;
// isdigit = false;
continue;
}
if (ucRead == ucSpace)
{
//if pdvt is null, next char is leftbrace;
//else next char is double value
if (pdvt == NULL)
isdigit = false;
continue;
}
if (ucRead == ucLeftbrace)
{
arriStack[++iTop] = 1;
continue;
}
//右括号的下一个字符是右括号（最后一个字符）
if (ucRead == ucRightbrace)
{
isdigit = false;
arriStack[iTop--] = 0;
continue;
}
}
else
{
ifs >> usread;
vecLabel.push_back(usread);
isdigit = false;
}
};
assert(vecLabel.size() == vecTrain.size());
assert(iTop == -1);
ifs.close();
return true;
}
bool testjsonfloat(const char *filename)
{
vector<double> vecTrain;
cout << "testjsondouble" << endl;
const int ciStackSize = 10;
int arriStack[ciStackSize], iTop = -1;
ifstream ifs;
ifs.open(filename, ios::in);
assert(ifs.is_open());
BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma;
ucLeftbrace = '[';
ucRightbrace = ']';
ucComma = ',';
ifs >> ucRead;
assert(ucRead == ucLeftbrace);
//栈中全部存放左括号，用1代表,0说明清除
arriStack[++iTop] = 1;
//样本train开始
ifs >> ucRead;
assert(ucRead == ucLeftbrace);
arriStack[++iTop] = 1;//iTop is 1
double fread;
bool isdigit = false;
while (iTop > -1)
{
if (isdigit == false)
{
ifs >> ucRead;
isdigit = true;
if (ucRead == ucComma)
{
//next char is space or leftbrace
// ifs >> ucRead;
isdigit = false;
continue;
}
if (ucRead == ' ')
continue;
if (ucRead == ucLeftbrace)
{
arriStack[++iTop] = 1;
continue;
}
if (ucRead == ucRightbrace)
{
isdigit = false;
//右括号的下一个字符是右括号（最后一个字符）
arriStack[iTop--] = 0;
continue;
}
}
else
{
ifs >> fread;
vecTrain.push_back(fread);
isdigit = false;
}
}
ifs.close();
return true;
}
bool LoadwbFromJson(vector<double*> &vecTrain, vector<double> &vecLabel, const char *filename, const int m_iInput)
{
cout << "loadvtFromJson" << endl;
const int ciStackSize = 10;
const int ciFeaturesize = m_iInput;
int arriStack[ciStackSize], iTop = -1;
ifstream ifs;
ifs.open(filename, ios::in);
assert(ifs.is_open());
BYTE ucRead, ucLeftbrace, ucRightbrace, ucComma, ucSpace;
ucLeftbrace = '[';
ucRightbrace = ']';
ucComma = ',';
ucSpace = '0';
ifs >> ucRead;
assert(ucRead == ucLeftbrace);
//栈中全部存放左括号，用1代表,0说明清除
arriStack[++iTop] = 1;
//样本train开始
ifs >> ucRead;
assert(ucRead == ucLeftbrace);
arriStack[++iTop] = 1;//iTop is 1
int iIndex;
bool isdigit = false;
double dread, *pdvt;
//load vt sample
while (iTop > 0)
{
if (isdigit == false)
{
ifs >> ucRead;
isdigit = true;
if (ucRead == ucComma)
{
//next char is space or leftbrace
// ifs >> ucRead;
isdigit = false;
continue;
}
if (ucRead == ucSpace)
{
//if pdvt is null, next char is leftbrace;
//else next char is double value
if (pdvt == NULL)
isdigit = false;
continue;
}
if (ucRead == ucLeftbrace)
{
pdvt = new double[ciFeaturesize];
memset(pdvt, 0, ciFeaturesize * sizeof(double));
//iIndex数组下标
iIndex = 0;
arriStack[++iTop] = 1;
continue;
}
if (ucRead == ucRightbrace)
{
if (pdvt != NULL)
{
assert(iIndex == ciFeaturesize);
vecTrain.push_back(pdvt);
pdvt = NULL;
}
isdigit = false;
arriStack[iTop--] = 0;
continue;
}
}
else
{
ifs >> dread;
pdvt[iIndex++] = dread;
isdigit = false;
}
};
//next char is dot
ifs >> ucRead;
assert(ucRead == ucComma);
cout << vecTrain.size() << endl;
//读取label
double usread;
isdigit = false;
while (iTop > -1 && ifs.eof() == false)
{
if (isdigit == false)
{
ifs >> ucRead;
isdigit = true;
if (ucRead == ucComma)
{
//next char is space or leftbrace
// ifs >> ucRead;
// isdigit = false;
continue;
}
if (ucRead == ucSpace)
{
//if pdvt is null, next char is leftbrace;
//else next char is double value
if (pdvt == NULL)
isdigit = false;
continue;
}
if (ucRead == ucLeftbrace)
{
arriStack[++iTop] = 1;
continue;
}
//右括号的下一个字符是右括号（最后一个字符）
if (ucRead == ucRightbrace)
{
isdigit = false;
arriStack[iTop--] = 0;
continue;
}
}
else
{
ifs >> usread;
vecLabel.push_back(usread);
isdigit = false;
}
};
assert(vecLabel.size() == vecTrain.size());
assert(iTop == -1);
ifs.close();
return true;
}
bool vec2double(vector<BYTE> &vecDigit, double &dvalue)
{
if (vecDigit.empty())
return false;
int ivecsize = vecDigit.size();
const int iMaxlen = 50;
char szdigit[iMaxlen];
assert(iMaxlen > ivecsize);
memset(szdigit, 0, iMaxlen);
int i;
for (i = 0; i < ivecsize; ++i)
szdigit[i] = vecDigit[i];
szdigit[i++] = '\0';
vecDigit.clear();
dvalue = atof(szdigit);
return true;
}
bool vec2short(vector<BYTE> &vecDigit, WORD &usvalue)
{
if (vecDigit.empty())
return false;
int ivecsize = vecDigit.size();
const int iMaxlen = 50;
char szdigit[iMaxlen];
assert(iMaxlen > ivecsize);
memset(szdigit, 0, iMaxlen);
int i;
for (i = 0; i < ivecsize; ++i)
szdigit[i] = vecDigit[i];
szdigit[i++] = '\0';
vecDigit.clear();
usvalue = atoi(szdigit);
return true;
}
void readDigitFromJson(ifstream &ifs, vector<double*> &vecTrain, vector<WORD> &vecLabel,
vector<BYTE> &vecDigit, double *&pdvt, int &iIndex,
const int ciFeaturesize, int *arrStack, int &iTop, bool bFirstlist)
{
BYTE ucRead;
WORD usvalue;
double dvalue;
const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
ifs.read((char*)(&ucRead), 1);
switch (ucRead)
{
case ucLeftbrace:
{
if (bFirstlist)
{
pdvt = new double[ciFeaturesize];
memset(pdvt, 0, ciFeaturesize * sizeof(double));
iIndex = 0;
}
arrStack[++iTop] = 1;
break;
}
case ucComma:
{
//next char is space or leftbrace
if (bFirstlist)
{
if (vecDigit.empty() == false)
{
vec2double(vecDigit, dvalue);
pdvt[iIndex++] = dvalue;
}
}
else
{
if(vec2short(vecDigit, usvalue))
vecLabel.push_back(usvalue);
}
break;
}
case ucSpace:
break;
case ucRightbrace:
{
if (bFirstlist)
{
if (pdvt != NULL)
{
vec2double(vecDigit, dvalue);
pdvt[iIndex++] = dvalue;
vecTrain.push_back(pdvt);
pdvt = NULL;
}
assert(iIndex == ciFeaturesize);
}
else
{
if(vec2short(vecDigit, usvalue))
vecLabel.push_back(usvalue);
}
arrStack[iTop--] = 0;
break;
}
default:
{
vecDigit.push_back(ucRead);
break;
}
}
}
void readDoubleFromJson(ifstream &ifs, vector<double*> &vecTrain, vector<double> &vecLabel,
vector<BYTE> &vecDigit, double *&pdvt, int &iIndex,
const int ciFeaturesize, int *arrStack, int &iTop, bool bFirstlist)
{
BYTE ucRead;
double dvalue;
const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
ifs.read((char*)(&ucRead), 1);
switch (ucRead)
{
case ucLeftbrace:
{
if (bFirstlist)
{
pdvt = new double[ciFeaturesize];
memset(pdvt, 0, ciFeaturesize * sizeof(double));
iIndex = 0;
}
arrStack[++iTop] = 1;
break;
}
case ucComma:
{
//next char is space or leftbrace
if (bFirstlist)
{
if (vecDigit.empty() == false)
{
vec2double(vecDigit, dvalue);
pdvt[iIndex++] = dvalue;
}
}
else
{
if(vec2double(vecDigit, dvalue))
vecLabel.push_back(dvalue);
}
break;
}
case ucSpace:
break;
case ucRightbrace:
{
if (bFirstlist)
{
if (pdvt != NULL)
{
vec2double(vecDigit, dvalue);
pdvt[iIndex++] = dvalue;
vecTrain.push_back(pdvt);
pdvt = NULL;
}
assert(iIndex == ciFeaturesize);
}
else
{
if(vec2double(vecDigit, dvalue))
vecLabel.push_back(dvalue);
}
arrStack[iTop--] = 0;
break;
}
default:
{
vecDigit.push_back(ucRead);
break;
}
}
}
bool LoadallwbByByte(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb, const char *filename,
const int m_iInput, const int ihidden, const int m_iOut)
{
cout << "LoadallwbByByte" << endl;
const int szistsize = 10;
int ciFeaturesize = m_iInput;
const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
int arrStack[szistsize], iTop = -1, iIndex = 0;
ifstream ifs;
ifs.open(filename, ios::in | ios::binary);
assert(ifs.is_open());
double *pdvt;
BYTE ucRead;
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
//栈中全部存放左括号，用1代表,0说明清除
arrStack[++iTop] = 1;
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 1
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 2
vector<BYTE> vecDigit;
vector<double *> vpdw;
vector<double> vdb;
while (iTop > 1 && ifs.eof() == false)
{
readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, m_iInput, arrStack, iTop, true);
};
//next char is dot
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucComma);
cout << vpdw.size() << endl;
//next char is space
while (iTop > 0 && ifs.eof() == false)
{
readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, m_iInput, arrStack, iTop, false);
};
assert(vpdw.size() == vdb.size());
assert(iTop == 0);
vvAllw.push_back(vpdw);
vvAllb.push_back(vdb);
//clear vpdw and pdb 's contents
vpdw.clear();
vdb.clear();
//next char is comma
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucComma);
//next char is space
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucSpace);
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 1
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 2
while (iTop > 1 && ifs.eof() == false)
{
readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, ihidden, arrStack, iTop, true);
};
//next char is dot
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucComma);
cout << vpdw.size() << endl;
//next char is space
while (iTop > -1 && ifs.eof() == false)
{
readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, ihidden, arrStack, iTop, false);
};
assert(vpdw.size() == vdb.size());
assert(iTop == -1);
vvAllw.push_back(vpdw);
vvAllb.push_back(vdb);
//clear vpdw and pdb 's contents
vpdw.clear();
vdb.clear();
//close file
ifs.close();
return true;
}
bool LoadWeighFromJson(vector< vector<double*> > &vvAllw, vector< vector<double> > &vvAllb,
const char *filename, const vector<int> &vecSecondDimOfWeigh)
{
cout << "LoadWeighFromJson" << endl;
const int szistsize = 10;
const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
int arrStack[szistsize], iTop = -1, iIndex = 0;
ifstream ifs;
ifs.open(filename, ios::in | ios::binary);
assert(ifs.is_open());
double *pdvt;
BYTE ucRead;
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
//栈中全部存放左括号，用1代表,0说明清除
arrStack[++iTop] = 1;
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 1
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 2
int iVecWeighSize = vecSecondDimOfWeigh.size();
vector<BYTE> vecDigit;
vector<double *> vpdw;
vector<double> vdb;
//读取iVecWeighSize个[w,b]
for (int i = 0; i < iVecWeighSize; ++i)
{
int iDimesionOfWeigh = vecSecondDimOfWeigh[i];
while (iTop > 1 && ifs.eof() == false)
{
readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, iDimesionOfWeigh, arrStack, iTop, true);
};
//next char is dot
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucComma);
cout << vpdw.size() << endl;
//next char is space
while (iTop > 0 && ifs.eof() == false)
{
readDoubleFromJson(ifs, vpdw, vdb, vecDigit, pdvt, iIndex, iDimesionOfWeigh, arrStack, iTop, false);
};
assert(vpdw.size() == vdb.size());
assert(iTop == 0);
vvAllw.push_back(vpdw);
vvAllb.push_back(vdb);
//clear vpdw and pdb 's contents
vpdw.clear();
vdb.clear();
//如果最后一对[w,b]读取完毕，就退出，下一个字符是右括号']'
if (i >= iVecWeighSize - 1)
{
break;
}
//next char is comma
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucComma);
//next char is space
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucSpace);
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 1
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 2
}
ifs.read((char*)(&ucRead), 1);;
assert(ucRead == ucRightbrace);
--iTop;
assert(iTop == -1);
//close file
ifs.close();
return true;
}
//read vt from mnszist, format is [[[], [],..., []],[1, 3, 5,..., 7]]
bool LoadTestSampleFromJson(vector<double*> &vecTrain, vector<WORD> &vecLabel, const char *filename, const int m_iInput)
{
cout << "LoadTestSampleFromJson" << endl;
const int szistsize = 10;
const int ciFeaturesize = m_iInput;
const BYTE ucLeftbrace = '[', ucRightbrace = ']', ucComma = ',', ucSpace = ' ';
int arrStack[szistsize], iTop = -1, iIndex = 0;
ifstream ifs;
ifs.open(filename, ios::in | ios::binary);
assert(ifs.is_open());
double *pdvt;
BYTE ucRead;
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
//栈中全部存放左括号，用1代表,0说明清除
arrStack[++iTop] = 1;
ifs.read((char*)(&ucRead), 1);
assert(ucRead == ucLeftbrace);
arrStack[++iTop] = 1;//iTop is 1
vector<BYTE> vecDigit;
while (iTop > 0 && ifs.eof() == false)
{
readDigitFromJson(ifs, vecTrain, vecLabel, vecDigit, pdvt, iIndex, ciFeaturesize, arrStack, iTop, true);
};
//next char is dot
ifs >> ucRead;
assert(ucRead == ucComma);
cout << vecTrain.size() << endl;
//next char is space
// ifs.read((char*)(&ucRead), 1);
// ifs.read((char*)(&ucRead), 1);
// assert(ucRead == ucLeftbrace);
while (iTop > -1 && ifs.eof() == false)
{
readDigitFromJson(ifs, vecTrain, vecLabel, vecDigit, pdvt, iIndex, ciFeaturesize, arrStack, iTop, false);
};
assert(vecLabel.size() == vecTrain.size());
assert(iTop == -1);
ifs.close();
return true;
}
void MakeOneLabel(int iMax, double *pdLabel, int m_iOut)
{
for (int j = 0; j < m_iOut; ++j)
pdLabel[j] = 0;
pdLabel[iMax] = 1.0;
}
void MakeCnnSample(double arrInput[2][64], double *pdImage, int iImageWidth, int iNumOfImage)
{
int iImageSize = iImageWidth * iImageWidth;
for (int k = 0; k < iNumOfImage; ++k)
{
int iStart = k *iImageSize;
for (int i = 0; i < iImageWidth; ++i)
{
for (int j = 0; j < iImageWidth; ++j)
{
int iIndex = iStart + i * iImageWidth + j;
pdImage[iIndex] = 1;
pdImage[iIndex] += i + j;
if (k > 0)
pdImage[iIndex] -= 1;
arrInput[k][i * iImageWidth +j] = pdImage[iIndex];
//pdImage[iIndex] /= 15.0 ;
}
}
}
cout << "input image is\n";
for (int k = 0; k < iNumOfImage; ++k)
{
int iStart = k *iImageSize;
cout << "k is " << k <<endl;
for (int i = 0; i < iImageWidth; ++i)
{
for (int j = 0; j < iImageWidth; ++j)
{
int iIndex = i * iImageWidth + j;
double dValue = arrInput[k][iIndex];
cout << dValue << ' ';
}
cout << endl;
}
cout << endl;
}
cout << endl;
}
void MakeCnnWeigh(double *pdKernel, int iNumOfKernel)
{
const int iKernelWidth = 3;
double iSum = 0;
double arrKernel[iKernelWidth][iKernelWidth] = {{4, 7, 1},
{3, 8, 5},
{3, 2, 3}};
double arr2[iKernelWidth][iKernelWidth] = {{6, 5, 4},
{5, 4, 3},
{4, 3, 2}};
for (int k = 0; k < iNumOfKernel; ++k)
{
int iStart = k * iKernelWidth * iKernelWidth;
for (int i = 0; i < iKernelWidth; ++i)
{
for (int j = 0; j < iKernelWidth; ++j)
{
int iIndex = i * iKernelWidth + j + iStart;
pdKernel[iIndex] = i + j + 2;
if (k > 0)
pdKernel[iIndex] = arrKernel[i][j];
iSum += pdKernel[iIndex];
}
}
}
cout << "sum is " << iSum << endl;
for (int k = 0; k < iNumOfKernel; ++k)
{
cout << "kernel :" << k << endl;
int iStart = k * iKernelWidth * iKernelWidth;
for (int i = 0; i < iKernelWidth; ++i)
{
for (int j = 0; j < iKernelWidth; ++j)
{
int iIndex = i * iKernelWidth + j + iStart;
//pdKernel[iIndex] /= (double)iSum;
cout << pdKernel[iIndex] << ' ';
}
cout << endl;
}
cout << endl;
}
cout << endl;
}

训练两轮，生成theano权值的代码

cnn_mlp_theano.py

[python] view plain copy

#coding=utf-8
import cPickle
import gzip
import os
import sys
import time
import json
import numpy
import theano
import theano.tensor as T
from theano.tensor.signal import downsample
from theano.tensor.nnet import conv
from logistic_sgd import LogisticRegression, load_data
from mlp import HiddenLayer
class LeNetConvPoolLayer(object):
"""Pool Layer of a convolutional network """
def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
"""
Allocate a LeNetConvPoolLayer with shared variable internal parameters.
:type rng: numpy.random.RandomState
:param rng: a random number generator used to initialize weights
:type input: theano.tensor.dtensor4
:param input: symbolic image tensor, of shape image_shape
:type filter_shape: tuple or list of length 4
:param filter_shape: (number of filters, num input feature maps,
filter height,filter width)
:type image_shape: tuple or list of length 4
:param image_shape: (batch size, num input feature maps,
image height, image width)
:type poolsize: tuple or list of length 2
:param poolsize: the downsampling (pooling) factor (#rows,#cols)
"""
assert image_shape[1] == filter_shape[1]
self.input = input
# there are "num input feature maps * filter height * filter width"
# inputs to each hidden unit
fan_in = numpy.prod(filter_shape[1:])
# each unit in the lower layer receives a gradient from:
# "num output feature maps * filter height * filter width" /
# pooling size
fan_out = (filter_shape[0] * numpy.prod(filter_shape[2:]) /
numpy.prod(poolsize))
# initialize weights with random weights
W_bound = numpy.sqrt(6. / (fan_in + fan_out))
self.W = theano.shared(numpy.asarray(
rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
dtype=theano.config.floatX),
borrow=True)
# the bias is a 1D tensor -- one bias per output feature map
b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
self.b = theano.shared(value=b_values, borrow=True)
# convolve input feature maps with filters
conv_out = conv.conv2d(input=input, filters=self.W,
filter_shape=filter_shape, image_shape=image_shape)
# downsample each feature map individually, using maxpooling
pooled_out = downsample.max_pool_2d(input=conv_out,
ds=poolsize, ignore_border=True)
# add the bias term. Since the bias is a vector (1D array), we first
# reshape it to a tensor of shape (1,n_filters,1,1). Each bias will
# thus be broadcasted across mini-batches and feature map
# width & height
self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
# store parameters of this layer
self.params = [self.W, self.b]
def getDataNumpy(layers):
data = []
for layer in layers:
wb = layer.params
w, b = wb[0].get_value(), wb[1].get_value()
data.append([w, b])
return data
def getDataJson(layers):
data = []
i = 0
for layer in layers:
w, b = layer.params
# print '..layer is', i
w, b = w.get_value(), b.get_value()
wshape = w.shape
# print '...the shape of w is', wshape
if len(wshape) == 2:
w = w.transpose()
else:
for k in xrange(wshape[0]):
for j in xrange(wshape[1]):
w[k][j] = numpy.rot90(w[k][j], 2)
w = w.reshape((wshape[0], numpy.prod(wshape[1:])))
w = w.tolist()
b = b.tolist()
data.append([w, b])
i += 1
return data
def writefile(data, name = '../../tmp/src/data/theanocnn.json'):
print ('writefile is ' + name)
f = open(name, "wb")
json.dump(data,f)
f.close()
def readfile(layers, nkerns, name = '../../tmp/src/data/theanocnn.json'):
# Load the dataset
print ('readfile is ' + name)
f = open(name, 'rb')
data = json.load(f)
f.close()
readwb(data, layers, nkerns)
def readwb(data, layers, nkerns):
i = 0
kernSize = len(nkerns)
inputnum = 1
for layer in layers:
w, b = data[i]
w = numpy.array(w, dtype='float32')
b = numpy.array(b, dtype='float32')
# print '..layer is', i
# print w.shape
if i >= kernSize:
w = w.transpose()
else:
w = w.reshape((nkerns[i], inputnum, 5, 5))
for k in xrange(nkerns[i]):
for j in xrange(inputnum):
c = w[k][j]
w[k][j] = numpy.rot90(c, 2)
inputnum = nkerns[i]
# print '..readwb ，transpose and rot180'
# print w.shape
layer.W.set_value(w, borrow=True)
layer.b.set_value(b, borrow=True)
i += 1
def loadwb(classifier, name='theanocnn.json'):
data = json.load(open(name, 'rb'))
w, b = data
print type(w)
w = numpy.array(w, dtype='float32').transpose()
classifier.W.set_value(w, borrow=True)
classifier.b.set_value(b, borrow=True)
def savewb(classifier, name='theanocnn.json'):
w, b = classifier.params
w = w.get_value().transpose().tolist()
b = b.get_value().tolist()
data = [w, b]
json.dump(data, open(name, 'wb'))
def evaluate_lenet5(learning_rate=0.1, n_epochs=2,
dataset='../../data/mnist.pkl',
nkerns=[20, 50], batch_size=500):
""" Demonstrates lenet on MNIST dataset
:type learning_rate: float
:param learning_rate: learning rate used (factor for the stochastic
gradient)
:type n_epochs: int
:param n_epochs: maximal number of epochs to run the optimizer
:type dataset: string
:param dataset: path to the dataset used for training /testing (MNIST here)
:type nkerns: list of ints
:param nkerns: number of kernels on each layer
"""
rng = numpy.random.RandomState(23455)
datasets = load_data(dataset)
train_set_x, train_set_y = datasets[0]
valid_set_x, valid_set_y = datasets[1]
test_set_x, test_set_y = datasets[2]
# compute number of minibatches for training, validation and testing
n_train_batches = train_set_x.get_value(borrow=True).shape[0]
n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
n_test_batches = test_set_x.get_value(borrow=True).shape[0]
n_train_batches /= batch_size
n_valid_batches /= batch_size
n_test_batches /= batch_size
# allocate symbolic variables for the data
index = T.lscalar() # index to a [mini]batch
x = T.matrix('x') # the data is presented as rasterized images
y = T.ivector('y') # the labels are presented as 1D vector of
# [int] labels
ishape = (28, 28) # this is the size of MNIST images
######################
# BUILD ACTUAL MODEL #
######################
print '... building the model'
# Reshape matrix of rasterized images of shape (batch_size,28*28)
# to a 4D tensor, compatible with our LeNetConvPoolLayer
layer0_input = x.reshape((batch_size, 1, 28, 28))
# Construct the first convolutional pooling layer:
# filtering reduces the image size to (28-5+1,28-5+1)=(24,24)
# maxpooling reduces this further to (24/2,24/2) = (12,12)
# 4D output tensor is thus of shape (batch_size,nkerns[0],12,12)
layer0 = LeNetConvPoolLayer(rng, input=layer0_input,
image_shape=(batch_size, 1, 28, 28),
filter_shape=(nkerns[0], 1, 5, 5), poolsize=(2, 2))
# Construct the second convolutional pooling layer
# filtering reduces the image size to (12-5+1,12-5+1)=(8,8)
# maxpooling reduces this further to (8/2,8/2) = (4,4)
# 4D output tensor is thus of shape (nkerns[0],nkerns[1],4,4)
layer1 = LeNetConvPoolLayer(rng, input=layer0.output,
image_shape=(batch_size, nkerns[0], 12, 12),
filter_shape=(nkerns[1], nkerns[0], 5, 5), poolsize=(2, 2))
# the TanhLayer being fully-connected, it operates on 2D matrices of
# shape (batch_size,num_pixels) (i.e matrix of rasterized images).
# This will generate a matrix of shape (20,32*4*4) = (20,512)
layer2_input = layer1.output.flatten(2)
# construct a fully-connected sigmoidal layer
layer2 = HiddenLayer(rng, input=layer2_input, n_in=nkerns[1] * 4 * 4,
n_out=500, activation=T.tanh)
# classify the values of the fully-connected sigmoidal layer
layer3 = LogisticRegression(input=layer2.output, n_in=500, n_out=10)
# the cost we minimize during training is the NLL of the model
cost = layer3.negative_log_likelihood(y)
# create a function to compute the mistakes that are made by the model
test_model = theano.function([index], layer3.errors(y),
givens={
x: test_set_x[index * batch_size: (index + 1) * batch_size],
y: test_set_y[index * batch_size: (index + 1) * batch_size]})
validate_model = theano.function([index], layer3.errors(y),
givens={
x: valid_set_x[index * batch_size: (index + 1) * batch_size],
y: valid_set_y[index * batch_size: (index + 1) * batch_size]})
# create a list of all model parameters to be fit by gradient descent
params = layer3.params + layer2.params + layer1.params + layer0.params
# create a list of gradients for all model parameters
grads = T.grad(cost, params)
layers = [layer0, layer1, layer2, layer3]
# train_model is a function that updates the model parameters by
# SGD Since this model has many parameters, it would be tedious to
# manually create an update rule for each model parameter. We thus
# create the updates list by automatically looping over all
# (params[i],grads[i]) pairs.
updates = []
for param_i, grad_i in zip(params, grads):
updates.append((param_i, param_i - learning_rate * grad_i))
train_model = theano.function([index], cost, updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]})
###############
# TRAIN MODEL #
###############
print '... training'
# early-stopping parameters
patience = 10000 # look as this many examples regardless
patience_increase = 2 # wait this much longer when a new best is
# found
improvement_threshold = 0.995 # a relative improvement of this much is
# considered significant
validation_frequency = min(n_train_batches, patience / 2)
# go through this many
# minibatche before checking the network
# on the validation set; in this case we
# check every epoch
best_params = None
best_validation_loss = numpy.inf
best_iter = 0
test_score = 0.
start_time = time.clock()
epoch = 0
done_looping = False
while (epoch < n_epochs) and (not done_looping):
epoch = epoch + 1
print '...epoch is', epoch, 'writefile'
writefile(getDataJson(layers))
for minibatch_index in xrange(n_train_batches):
iter = (epoch - 1) * n_train_batches + minibatch_index
if iter % 100 == 0:
print 'training @ iter = ', iter
cost_ij = train_model(minibatch_index)
if (iter + 1) % validation_frequency == 0:
# compute zero-one loss on validation set
validation_losses = [validate_model(i) for i
in xrange(n_valid_batches)]
this_validation_loss = numpy.mean(validation_losses)
print('epoch %i, minibatch %i/%i, validation error %f %%' % \
(epoch, minibatch_index + 1, n_train_batches, \
this_validation_loss * 100.))
# if we got the best validation score until now
if this_validation_loss < best_validation_loss:
#improve patience if loss improvement is good enough
if this_validation_loss < best_validation_loss * \
improvement_threshold:
patience = max(patience, iter * patience_increase)
# save best validation score and iteration number
best_validation_loss = this_validation_loss
best_iter = iter
# test it on the test set
test_losses = [test_model(i) for i in xrange(n_test_batches)]
test_score = numpy.mean(test_losses)
print((' epoch %i, minibatch %i/%i, test error of best '
'model %f %%') %
(epoch, minibatch_index + 1, n_train_batches,
test_score * 100.))
if patience <= iter:
done_looping = True
break
end_time = time.clock()
print('Optimization complete.')
print('Best validation score of %f %% obtained at iteration %i,'\
'with test performance %f %%' %
(best_validation_loss * 100., best_iter + 1, test_score * 100.))
print >> sys.stderr, ('The code for file ' +
os.path.split(__file__)[1] +
' ran for %.2fm' % ((end_time - start_time) / 60.))
'''''
'''
readfile(layers, nkerns)
validation_losses = [validate_model(i) for i
in xrange(n_valid_batches)]
this_validation_loss = numpy.mean(validation_losses)
print('validation error %f %%' % \
(this_validation_loss * 100.))
if __name__ == '__main__':
evaluate_lenet5()

c++重写卷积网络的前向计算过程，完美复现theano的测试结果相关推荐

c++重写卷积网络的前向计算过程，复现theano的测试结果
转自http://blog.csdn.net/qiaofangjie/article/details/18042407 本人的需求是: 通过theano的cnn训练神经网络,将最终稳定的网络权值保存下 ...
论文翻译：搜索人脸活体检测的中心差异卷积网络及实现代码
搜索人脸活体检测的中心差异卷积网络摘要 1. 绪论 2. 相关工作人脸活体检测卷积运算符神经架构搜索 3. 方法论 3.1 中心差分卷积基本卷积基本卷积结合中心差分操作中心差分卷积的实现 ...
DNN：LSTM的前向计算和参数训练
原文-LSTM的反向传播:深度学习(6)-长短期网路:此处仅摘抄一小段,建议拜访全文. LSTM的参数训练:https://www.jianshu.com/p/dcec3f07d3b5:LSTM的参数 ...
卷积神经网络中卷积运算的前向传播与反向传播推导
文章作者:Tyan 博客:noahsnail.com | CSDN | [简书](http://www.jianshu.com/users/7731e83f3a4e/latest_articl ...
Lesson 16.1016.1116.1216.13 卷积层的参数量计算，1x1卷积核分组卷积与深度可分离卷积全连接层 nn.Sequential全局平均池化，NiN网络复现
二架构对参数量/计算量的影响在自建架构的时候,除了模型效果之外,我们还需要关注模型整体的计算效率.深度学习模型天生就需要大量数据进行训练,因此每次训练中的参数量和计算量就格外关键,因此在设计卷积网 ...
吴恩达神经网络和深度学习-学习笔记-30-相关符号和计算+单层卷积网络+简单卷积网络示例
卷积相关的符号和计算单层卷积网络首先执行线性函数,然后所有元素相乘做卷积. 具体做法是运用线性函数,再加上偏差,然后应用激活函数Relu. 也即是说,通过神经网络的一层,把一个6×6×3维度的a[ ...
卷积神经网络卷积计算,卷积网络计算公式
卷积运算的过程是什么?卷积计算的矩阵是怎么来的,如下图,这个卷积运算示意图怎么理解? 首先,卷积核相同,输入相同,输出的特征是一样的.只不过将输出的矩阵形式换成了列向量的形式. 实质上一般卷积运算与矩 ...
STGCN时空图卷积网络:用于交通预测的深度学习框架
时空图卷积网络:用于交通预测的深度学习框架及时准确的交通预测对城市交通控制和引导至关重要.由于交通流的高度非线性和复杂性,传统的方法不能满足中长期预测任务的要求,往往忽略了空间和时间的相关性.本文提 ...
理解图卷积网络的节点分类
理解图卷积网络的节点分类在过去的十年中,神经网络取得了巨大的成功.但是,只能使用常规或欧几里得数据来实现神经网络的早期变体,而现实世界中的许多数据都具有非欧几里得的底层图形结构.数据结构的不规则性导 ...

c++重写卷积网络的前向计算过程，完美复现theano的测试结果

c++重写卷积网络的前向计算过程，完美复现theano的测试结果相关推荐

最新文章

热门文章