1. Dataset Preparation

For more details, see: Caffe: LMDB and Its Data Conversion

MNIST is a handwritten digit database maintained by the deep learning pioneer Yann LeCun. It was originally built for recognizing handwritten digits on checks and has since become the standard entry-level dataset for deep learning. The model designed specifically for MNIST recognition is LeNet, arguably the earliest CNN model.

MNIST contains 60,000 training samples and 10,000 test samples. Each sample is a 28*28 grayscale image of a handwritten digit from 0 to 9, so there are 10 classes.

1) The data can be downloaded from the official MNIST website.

2) Alternatively, run the following commands.

$CAFFE_ROOT denotes the root directory of the Caffe source tree:

    cd $CAFFE_ROOT
    ./data/mnist/get_mnist.sh

After the script finishes successfully, four files appear under data/mnist/:
    train-images-idx3-ubyte:  training set images (9912422 bytes)
    train-labels-idx1-ubyte:  training set labels (28881 bytes)
    t10k-images-idx3-ubyte:   test set images (1648877 bytes)
    t10k-labels-idx1-ubyte:   test set labels (4542 bytes)

These files cannot be used directly by Caffe; they first have to be converted to LMDB:
    ./examples/mnist/create_mnist.sh

After the conversion succeeds, two datasets are created under examples/mnist/: mnist_train_lmdb and mnist_test_lmdb.
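To sanity-check the conversion, you can open the LMDB from Python and decode a record. This is a minimal sketch, assuming pycaffe and the `lmdb` Python package are installed and that it is run from `$CAFFE_ROOT`:

    import lmdb
    from caffe.proto import caffe_pb2

    # Open the training LMDB read-only and decode the first record.
    env = lmdb.open('examples/mnist/mnist_train_lmdb', readonly=True)
    with env.begin() as txn:
        key, value = next(txn.cursor().iternext())
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        # Each record is a serialized Datum: 1 channel, 28x28 pixels, plus a label.
        print(datum.channels, datum.height, datum.width, datum.label)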

2. LeNet: Training and Testing the MNIST Classification Model

2.1 The LeNet Classification Model

We will train with the LeNet network, which works well for digit classification. The design of LeNet contains the essence of CNNs that is still used in larger models such as the ones in ImageNet: in general, a convolutional layer followed by a pooling layer, another convolutional layer followed by a pooling layer, and then two fully connected layers similar to a conventional multilayer perceptron. The layers are defined in `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt`.

2.2 Defining the MNIST Network

This section explains the model definition in `lenet_train_test.prototxt`, the LeNet model for MNIST handwritten digit recognition. The protobuf messages Caffe uses are defined in `$CAFFE_ROOT/src/caffe/proto/caffe.proto`. Specifically, we will write a `caffe::NetParameter` protobuf (or, in Python, a `caffe.proto.caffe_pb2.NetParameter`).

To begin, give the network a name:
    name: "LeNet"

2.2.1 Data Layer

In this demo, the MNIST data read from the LMDB we just created is described by a data layer:
    layer {
      name: "mnist"
      type: "Data"
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "mnist_train_lmdb"
        backend: LMDB
        batch_size: 64
      }
      top: "data"
      top: "label"
    }

This layer has the name `mnist` and the type `Data`, and it reads the data from LMDB. The batch size is 64, and the incoming pixels are scaled so that they fall in the range [0, 1). Why 0.00390625? It is simply 1 divided by 256. Finally, this layer produces two blobs: the `data` blob and the `label` blob.

2.2.2 Convolution Layer

The convolution layer is defined as follows:
    layer {
      name: "conv1"
      type: "Convolution"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      convolution_param {
        num_output: 20
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "data"
      top: "conv1"
    }

This layer takes the `data` blob provided by the data layer and produces the `conv1` blob. It outputs 20 channels, with a 5x5 convolution kernel applied at stride 1.
The fillers let us initialize the weights and biases. For the weight filler, we use the `xavier` algorithm, which automatically determines the scale of initialization from the number of input and output neurons. For the bias filler, we simply initialize it to a constant, with a default value of 0.
`lr_mult` is the learning-rate multiplier for the layer's learnable parameters. Here the weights are learned at the same rate as the learning rate the solver provides at runtime, while the biases are learned at twice that rate; this usually gives better convergence.


2.2.3 Pooling Layer

The pooling layer is even easier to define:

    layer {
      name: "pool1"
      type: "Pooling"
      pooling_param {
        kernel_size: 2
        stride: 2
        pool: MAX
      }
      bottom: "conv1"
      top: "pool1"
    }
This is max pooling with a kernel size of 2 and a stride of 2 (so adjacent pooling regions do not overlap).
Similarly, you can write up the second convolution and pooling layers. See `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt` for details.
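As a side note (my own back-of-the-envelope calculation, not part of the tutorial), the spatial sizes produced by the two convolution/pooling pairs follow from `out = (in - kernel) / stride + 1`; a small Python sketch traces them through the network:

    def conv_out(size, kernel, stride=1):
        # Output size of a convolution with no padding.
        return (size - kernel) // stride + 1

    def pool_out(size, kernel=2, stride=2):
        # Output size of non-overlapping 2x2 max pooling.
        return (size - kernel) // stride + 1

    s = 28                  # input image: 1 x 28 x 28
    s = conv_out(s, 5)      # conv1: 20 x 24 x 24 (matches "Top shape: 20 24 24" in the training log)
    s = pool_out(s)         # pool1: 20 x 12 x 12
    s = conv_out(s, 5)      # conv2: 50 x 8 x 8
    s = pool_out(s)         # pool2: 50 x 4 x 4, i.e. 800 inputs to ip1
    print(s)                # 4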

2.2.4 Fully Connected Layer

Writing a fully connected layer is also simple:

    layer {
      name: "ip1"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 500
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "pool2"
      top: "ip1"
    }

This defines a fully connected layer (known in Caffe as an `InnerProduct` layer) with 500 outputs. All the other lines should look familiar by now, right?

2.2.5 ReLU Layer

A ReLU layer is also simple:

    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }

Since ReLU is an element-wise operation, we can do it *in-place* to save memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!

After the ReLU layer, we write another InnerProduct layer:

    layer {
      name: "ip2"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "ip1"
      top: "ip2"
    }

2.2.6 Loss Layer

Finally, the loss layer:

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip2"
      bottom: "label"
    }

The `softmax_loss` layer implements both softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs, the first being the prediction and the second being the `label` provided by the data layer (remember it?). It does not produce any output; all it does is compute the loss value, report it when backpropagation starts, and initiate the gradient with respect to `ip2`. This is where all the magic starts.
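For intuition, the combined computation for one example is `loss = -log(softmax(ip2)[label])`. A small numpy sketch of what `SoftmaxWithLoss` computes (an illustration of the math, not Caffe code; the scores below are made up):

    import numpy as np

    def softmax_with_loss(scores, label):
        # Numerically stable softmax followed by the multinomial logistic loss.
        scores = scores - scores.max()              # shift so the largest score is 0
        probs = np.exp(scores) / np.exp(scores).sum()
        return -np.log(probs[label])

    scores = np.array([1.0, 3.2, 0.5, -1.0, 0.0, 0.3, 0.1, -0.2, 0.4, 0.2])  # stand-in for the ip2 output
    print(softmax_with_loss(scores, label=1))       # small loss, since class 1 has the highest score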

2.2.7 Additional Notes: Writing Layer Rules

A layer definition can include rules for whether and when it is included in the network definition, like the following example:

    layer {
      // ...layer definition...
      include: { phase: TRAIN }
    }

This is a rule that controls layer inclusion in the network based on the network's current state.
You can refer to `$CAFFE_ROOT/src/caffe/proto/caffe.proto` for more information about layer rules and model schema.

In the above example, this layer will be included only in the `TRAIN` phase.
If we change `TRAIN` to `TEST`, then this layer will be used only in the test phase.

By default, a layer has no rules, and it is always included in the network.
Thus, `lenet_train_test.prototxt` has two `DATA` layers defined (with different `batch_size`), one for the training phase and one for the testing phase.
Also, there is an `Accuracy` layer which is included only in the `TEST` phase, for reporting the model accuracy every `test_interval` iterations, as defined in `lenet_solver.prototxt`.
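One way to see these rules in action (a pycaffe sketch of my own, assuming it is run from `$CAFFE_ROOT` after the LMDBs have been created) is to instantiate the same prototxt under both phases and compare the layer lists; only the TEST net contains the batch-100 data layer and the `accuracy` layer:

    import caffe

    train_net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TRAIN)
    test_net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TEST)

    # The phase decides which of the rule-guarded layers are instantiated.
    print(list(train_net._layer_names))  # no "accuracy" layer here
    print(list(test_net._layer_names))   # includes "accuracy"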

The complete, annotated definition is as follows:

# Network name
name: "LeNet"
# Training data layer
# input source: mnist_train_lmdb, batch_size: 64
# outputs: data blob, label blob
# transform: scale normalization, 0.00390625 = 1/256
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
# Test data layer
# input source: mnist_test_lmdb, batch_size: 100
# outputs: data blob, label blob
# transform: scale normalization
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
# Convolution layer conv1
# input: data blob
# output: conv1 blob
# parameters: 20 feature maps with 5*5 kernels, stride 1; weights initialized with xavier,
#   biases initialized as constant (default 0)
# learning rates: weights at 1x the base learning rate base_lr, biases at 2x base_lr
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Pooling layer pool1
# input: conv1 blob
# output: pool1 blob
# pooling: max pooling with a 2*2 kernel and stride 2
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
# Fully connected layer ip1
# input: pool2 blob
# output: ip1 blob
# parameters: 500 units; weights initialized with xavier, biases as constant (default 0)
# learning rates: weights at base_lr, biases at base_lr * 2
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Nonlinear activation layer relu1
# input: ip1 blob
# output: ip1 blob (note it is still ip1: ReLU is an element-wise operation, so writing the
#   result back in place saves memory)
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
# Fully connected layer ip2
# input: ip1 blob
# output: ip2 blob (this is also the output used for the final prediction)
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Accuracy layer, used only in the TEST phase
# inputs: ip2 blob, label blob
# output: accuracy
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
# Loss layer: SoftmaxWithLoss
# inputs: ip2 blob, label blob
# output: loss blob, the final loss
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
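As a rough size check (my own calculation, not from the tutorial), the learnable parameters of this network can be counted layer by layer:

    # Learnable parameters per layer: weights + biases.
    conv1 = 20 * (1 * 5 * 5) + 20       # 520
    conv2 = 50 * (20 * 5 * 5) + 50      # 25,050
    ip1 = 500 * (50 * 4 * 4) + 500      # 400,500 (pool2 output is 50 x 4 x 4 = 800 values)
    ip2 = 10 * 500 + 10                 # 5,010
    print(conv1 + conv2 + ip1 + ip2)    # 431,080 parameters in total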

3. Defining the MNIST Solver

Check out the comments explaining each line in the prototxt `$CAFFE_ROOT/examples/mnist/lenet_solver.prototxt`:

    # The train/test net protocol buffer definition
    net: "examples/mnist/lenet_train_test.prototxt"
    # test_iter specifies how many forward passes the test should carry out.
    # In the case of MNIST, we have test batch size 100 and 100 test iterations,
    # covering the full 10,000 testing images.
    test_iter: 100
    # Carry out testing every 500 training iterations.
    test_interval: 500
    # The base learning rate, momentum and the weight decay of the network.
    base_lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    # The learning rate policy
    lr_policy: "inv"
    gamma: 0.0001
    power: 0.75
    # Display every 100 iterations
    display: 100
    # The maximum number of iterations
    max_iter: 10000
    # snapshot intermediate results
    snapshot: 5000
    snapshot_prefix: "examples/mnist/lenet"
    # solver mode: CPU or GPU
    solver_mode: GPU

An annotated version:

# network definition
net: "examples/mnist/lenet_train_test.prototxt"
# number of validation samples covered per test pass = test_iter * test batch_size (100 * 100 = 10,000)
test_iter: 100
# run one validation pass every 500 training iterations
test_interval: 500
# base (initial) learning rate
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# learning rate decay policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# display the loss and learning rate every 100 iterations
display: 100
# maximum number of iterations
max_iter: 10000
# snapshot the model every 5000 iterations, so progress survives an unexpected interruption
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
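The `inv` policy decays the learning rate as `base_lr * (1 + gamma * iter) ^ (-power)`. A quick check (my own arithmetic) reproduces the value that shows up later in the training log at iteration 100:

    base_lr, gamma, power = 0.01, 0.0001, 0.75

    def inv_lr(it):
        # Caffe's "inv" learning-rate policy.
        return base_lr * (1 + gamma * it) ** (-power)

    print(inv_lr(100))    # ~0.00992565, matching "Iteration 100, lr = 0.00992565"
    print(inv_lr(10000))  # ~0.00594604 at the final iteration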

4. Training and Testing the Model

After writing the network definition protobuf and the solver protobuf, simply run `train_lenet.sh`, or equivalently:

    cd $CAFFE_ROOT
    ./examples/mnist/train_lenet.sh

`train_lenet.sh` is a simple script, but here is a quick explanation: the main tool for training is `caffe`, invoked with the action `train` and the solver protobuf text file as its argument.

When you run the code, you will see a lot of messages flying by like this:

    I1203 net.cpp:66] Creating Layer conv1
    I1203 net.cpp:76] conv1 <- data
    I1203 net.cpp:101] conv1 -> conv1
    I1203 net.cpp:116] Top shape: 20 24 24
    I1203 net.cpp:127] conv1 needs backward computation.

These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:

    I1203 net.cpp:142] Network initialization done.
    I1203 solver.cpp:36] Solver scaffolding done.
    I1203 solver.cpp:44] Solving LeNet

Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:

    I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
    I1203 solver.cpp:66] Iteration 100, loss = 0.26044
    ...
    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9785
    I1203 solver.cpp:111] Test score #1: 0.0606671

For each training iteration, `lr` is the learning rate of that iteration, and `loss` is the training loss. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss. And after a few minutes, you are done!

    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9897
    I1203 solver.cpp:111] Test score #1: 0.0324599
    I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
    I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
    I1203 solver.cpp:78] Optimization Done.

The final model, stored as a binary protobuf file, is written to `lenet_iter_10000`, which you can deploy as a trained model in your application, if you are training on a real-world application dataset.
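If you want to use the trained weights from Python, the snapshot can be loaded with pycaffe. Below is a minimal sketch, assuming the deploy definition `examples/mnist/lenet.prototxt` shipped with Caffe (which ends in a softmax `prob` output) and a 28x28 grayscale digit already available as a numpy array with values in [0, 255]:

    import numpy as np
    import caffe

    caffe.set_mode_cpu()
    net = caffe.Net('examples/mnist/lenet.prototxt',               # deploy definition (no data/loss layers)
                    'examples/mnist/lenet_iter_10000.caffemodel',  # trained weights
                    caffe.TEST)

    img = np.zeros((28, 28), dtype=np.float32)  # placeholder; substitute a real digit image here
    net.blobs['data'].reshape(1, 1, 28, 28)     # single-image batch
    net.blobs['data'].data[0, 0, :, :] = img * 0.00390625  # same scaling as during training
    out = net.forward()
    print('predicted digit:', out['prob'].argmax())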

To switch between running on the CPU and the GPU, you only need to change `solver_mode` in `lenet_solver.prototxt` (in the underlying protobuf enum, 0 is CPU and 1 is GPU):

    # solver mode: CPU or GPU
    solver_mode: CPU


 How to reduce the learning rate at fixed steps?

Take a look at `lenet_multistep_solver.prototxt`.
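For reference, the `multistep` policy keeps the learning rate piecewise constant and multiplies it by `gamma` each time a `stepvalue` milestone is passed. A small sketch of that schedule (the milestones and `gamma` below are illustrative; use whatever `lenet_multistep_solver.prototxt` actually specifies):

    def multistep_lr(it, base_lr=0.01, gamma=0.9, stepvalues=(5000, 7000, 8000, 9000, 9500)):
        # Caffe's "multistep" policy: lr = base_lr * gamma ^ (number of milestones passed)
        passed = sum(1 for s in stepvalues if it >= s)
        return base_lr * gamma ** passed

    for it in (0, 5000, 7000, 9500, 10000):
        print(it, multistep_lr(it))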
