1. Dataset Preparation

For more details, see: Caffe: LMDB and Its Data Conversion

MNIST is a handwritten digit database maintained by the deep learning pioneer Yann LeCun. It was originally built for recognizing handwritten digits on checks and has since become the standard entry-level dataset for deep learning. The model designed specifically for MNIST recognition is LeNet, arguably the earliest CNN model.

MNIST contains 60,000 training samples and 10,000 test samples. Each sample is a 28*28 grayscale image of a handwritten digit from 0 to 9, so there are 10 classes.

1) The data can be downloaded from the official MNIST website.

2) Alternatively, run the following commands.

$CAFFE_ROOT denotes the root directory of the Caffe source tree:

    cd $CAFFE_ROOT
    ./data/mnist/get_mnist.sh

After the script finishes successfully, four files appear under data/mnist/:
    train-images-idx3-ubyte:  training set images (9912422 bytes)
    train-labels-idx1-ubyte:  training set labels (28881 bytes)
    t10k-images-idx3-ubyte:   test set images (1648877 bytes)
    t10k-labels-idx1-ubyte:   test set labels (4542 bytes)

These files cannot be used directly by Caffe; they first have to be converted to LMDB:
    ./examples/mnist/create_mnist.sh

After the conversion succeeds, two datasets are created under examples/mnist/: mnist_train_lmdb and mnist_test_lmdb.
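To sanity-check the conversion, you can open the LMDB from Python and decode a record. This is a minimal sketch, assuming pycaffe and the `lmdb` Python package are installed and that it is run from `$CAFFE_ROOT`:

    import lmdb
    from caffe.proto import caffe_pb2

    # Open the training LMDB read-only and decode the first record.
    env = lmdb.open('examples/mnist/mnist_train_lmdb', readonly=True)
    with env.begin() as txn:
        key, value = next(txn.cursor().iternext())
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        # Each record is a serialized Datum: 1 channel, 28x28 pixels, plus a label.
        print(datum.channels, datum.height, datum.width, datum.label)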

2. LeNet: Training and Testing the MNIST Classification Model

2.1 The LeNet Classification Model

We will train with the LeNet network, which works well for digit classification. The design of LeNet contains the essence of CNNs that is still used in larger models such as the ones in ImageNet: in general, a convolutional layer followed by a pooling layer, another convolutional layer followed by a pooling layer, and then two fully connected layers similar to a conventional multilayer perceptron. The layers are defined in `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt`.

2.2 Defining the MNIST Network

This section explains the model definition in `lenet_train_test.prototxt`, the LeNet model for MNIST handwritten digit recognition. The protobuf messages Caffe uses are defined in `$CAFFE_ROOT/src/caffe/proto/caffe.proto`. Specifically, we will write a `caffe::NetParameter` protobuf (or, in Python, a `caffe.proto.caffe_pb2.NetParameter`).

To begin, give the network a name:
    name: "LeNet"

2.2.1 Data Layer

In this demo, the MNIST data read from the LMDB we just created is described by a data layer:
    layer {
      name: "mnist"
      type: "Data"
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "mnist_train_lmdb"
        backend: LMDB
        batch_size: 64
      }
      top: "data"
      top: "label"
    }

This layer has the name `mnist` and the type `Data`, and it reads the data from LMDB. The batch size is 64, and the incoming pixels are scaled so that they fall in the range [0, 1). Why 0.00390625? It is simply 1 divided by 256. Finally, this layer produces two blobs: the `data` blob and the `label` blob.

2.2.2 Convolution Layer

The convolution layer is defined as follows:
    layer {
      name: "conv1"
      type: "Convolution"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      convolution_param {
        num_output: 20
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "data"
      top: "conv1"
    }

This layer takes the `data` blob provided by the data layer and produces the `conv1` blob. It outputs 20 channels, with a 5x5 convolution kernel applied at stride 1.
The fillers let us initialize the weights and biases. For the weight filler, we use the `xavier` algorithm, which automatically determines the scale of initialization from the number of input and output neurons. For the bias filler, we simply initialize it to a constant, with a default value of 0.
`lr_mult` is the learning-rate multiplier for the layer's learnable parameters. Here the weights are learned at the same rate as the learning rate the solver provides at runtime, while the biases are learned at twice that rate; this usually gives better convergence.


2.2.3 Pooling Layer

The pooling layer is even easier to define:

    layer {
      name: "pool1"
      type: "Pooling"
      pooling_param {
        kernel_size: 2
        stride: 2
        pool: MAX
      }
      bottom: "conv1"
      top: "pool1"
    }
This is max pooling with a kernel size of 2 and a stride of 2 (so adjacent pooling regions do not overlap).
Similarly, you can write up the second convolution and pooling layers. See `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt` for details.
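As a side note (my own back-of-the-envelope calculation, not part of the tutorial), the spatial sizes produced by the two convolution/pooling pairs follow from `out = (in - kernel) / stride + 1`; a small Python sketch traces them through the network:

    def conv_out(size, kernel, stride=1):
        # Output size of a convolution with no padding.
        return (size - kernel) // stride + 1

    def pool_out(size, kernel=2, stride=2):
        # Output size of non-overlapping 2x2 max pooling.
        return (size - kernel) // stride + 1

    s = 28                  # input image: 1 x 28 x 28
    s = conv_out(s, 5)      # conv1: 20 x 24 x 24 (matches "Top shape: 20 24 24" in the training log)
    s = pool_out(s)         # pool1: 20 x 12 x 12
    s = conv_out(s, 5)      # conv2: 50 x 8 x 8
    s = pool_out(s)         # pool2: 50 x 4 x 4, i.e. 800 inputs to ip1
    print(s)                # 4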

2.2.4 Fully Connected Layer

Writing a fully connected layer is also simple:

    layer {
      name: "ip1"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 500
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "pool2"
      top: "ip1"
    }

This defines a fully connected layer (known in Caffe as an `InnerProduct` layer) with 500 outputs. All the other lines should look familiar by now, right?

2.2.5 ReLU Layer

A ReLU layer is also simple:

    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }

Since ReLU is an element-wise operation, we can do it *in-place* to save memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!

After the ReLU layer, we write another InnerProduct layer:

    layer {
      name: "ip2"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "ip1"
      top: "ip2"
    }

2.2.6 Loss Layer

Finally, the loss layer:

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip2"
      bottom: "label"
    }

The `softmax_loss` layer implements both softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs, the first being the prediction and the second being the `label` provided by the data layer (remember it?). It does not produce any output; all it does is compute the loss value, report it when backpropagation starts, and initiate the gradient with respect to `ip2`. This is where all the magic starts.
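For intuition, the combined computation for one example is `loss = -log(softmax(ip2)[label])`. A small numpy sketch of what `SoftmaxWithLoss` computes (an illustration of the math, not Caffe code; the scores below are made up):

    import numpy as np

    def softmax_with_loss(scores, label):
        # Numerically stable softmax followed by the multinomial logistic loss.
        scores = scores - scores.max()              # shift so the largest score is 0
        probs = np.exp(scores) / np.exp(scores).sum()
        return -np.log(probs[label])

    scores = np.array([1.0, 3.2, 0.5, -1.0, 0.0, 0.3, 0.1, -0.2, 0.4, 0.2])  # stand-in for the ip2 output
    print(softmax_with_loss(scores, label=1))       # small loss, since class 1 has the highest score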

2.2.7 Additional Notes: Writing Layer Rules

A layer definition can include rules for whether and when it is included in the network definition, like the following example:

    layer {
      // ...layer definition...
      include: { phase: TRAIN }
    }

This is a rule that controls layer inclusion in the network based on the network's current state.
You can refer to `$CAFFE_ROOT/src/caffe/proto/caffe.proto` for more information about layer rules and model schema.

In the above example, this layer will be included only in the `TRAIN` phase.
If we change `TRAIN` to `TEST`, then this layer will be used only in the test phase.

By default, a layer has no rules, and it is always included in the network.
Thus, `lenet_train_test.prototxt` has two `DATA` layers defined (with different `batch_size`), one for the training phase and one for the testing phase.
Also, there is an `Accuracy` layer which is included only in the `TEST` phase, for reporting the model accuracy every `test_interval` iterations, as defined in `lenet_solver.prototxt`.
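One way to see these rules in action (a pycaffe sketch of my own, assuming it is run from `$CAFFE_ROOT` after the LMDBs have been created) is to instantiate the same prototxt under both phases and compare the layer lists; only the TEST net contains the batch-100 data layer and the `accuracy` layer:

    import caffe

    train_net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TRAIN)
    test_net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TEST)

    # The phase decides which of the rule-guarded layers are instantiated.
    print(list(train_net._layer_names))  # no "accuracy" layer here
    print(list(test_net._layer_names))   # includes "accuracy"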

The complete, annotated definition is as follows:

# Network name
name: "LeNet"
# Training data layer
# input source: mnist_train_lmdb, batch_size: 64
# outputs: data blob, label blob
# transform: scale normalization, 0.00390625 = 1/256
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
# Test data layer
# input source: mnist_test_lmdb, batch_size: 100
# outputs: data blob, label blob
# transform: scale normalization
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
# Convolution layer conv1
# input: data blob
# output: conv1 blob
# parameters: 20 feature maps with 5*5 kernels, stride 1; weights initialized with xavier,
#   biases initialized as constant (default 0)
# learning rates: weights at 1x the base learning rate base_lr, biases at 2x base_lr
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Pooling layer pool1
# input: conv1 blob
# output: pool1 blob
# pooling: max pooling with a 2*2 kernel and stride 2
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
# Fully connected layer ip1
# input: pool2 blob
# output: ip1 blob
# parameters: 500 units; weights initialized with xavier, biases as constant (default 0)
# learning rates: weights at base_lr, biases at base_lr * 2
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Nonlinear activation layer relu1
# input: ip1 blob
# output: ip1 blob (note it is still ip1: ReLU is an element-wise operation, so writing the
#   result back in place saves memory)
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
# Fully connected layer ip2
# input: ip1 blob
# output: ip2 blob (this is also the output used for the final prediction)
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
# Accuracy layer, used only in the TEST phase
# inputs: ip2 blob, label blob
# output: accuracy
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
# Loss layer: SoftmaxWithLoss
# inputs: ip2 blob, label blob
# output: loss blob, the final loss
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
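As a rough size check (my own calculation, not from the tutorial), the learnable parameters of this network can be counted layer by layer:

    # Learnable parameters per layer: weights + biases.
    conv1 = 20 * (1 * 5 * 5) + 20       # 520
    conv2 = 50 * (20 * 5 * 5) + 50      # 25,050
    ip1 = 500 * (50 * 4 * 4) + 500      # 400,500 (pool2 output is 50 x 4 x 4 = 800 values)
    ip2 = 10 * 500 + 10                 # 5,010
    print(conv1 + conv2 + ip1 + ip2)    # 431,080 parameters in total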

3. Defining the MNIST Solver

Check out the comments explaining each line in the prototxt `$CAFFE_ROOT/examples/mnist/lenet_solver.prototxt`:

    # The train/test net protocol buffer definition
    net: "examples/mnist/lenet_train_test.prototxt"
    # test_iter specifies how many forward passes the test should carry out.
    # In the case of MNIST, we have test batch size 100 and 100 test iterations,
    # covering the full 10,000 testing images.
    test_iter: 100
    # Carry out testing every 500 training iterations.
    test_interval: 500
    # The base learning rate, momentum and the weight decay of the network.
    base_lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    # The learning rate policy
    lr_policy: "inv"
    gamma: 0.0001
    power: 0.75
    # Display every 100 iterations
    display: 100
    # The maximum number of iterations
    max_iter: 10000
    # snapshot intermediate results
    snapshot: 5000
    snapshot_prefix: "examples/mnist/lenet"
    # solver mode: CPU or GPU
    solver_mode: GPU

An annotated version:

# network definition
net: "examples/mnist/lenet_train_test.prototxt"
# number of validation samples covered per test pass = test_iter * test batch_size (100 * 100 = 10,000)
test_iter: 100
# run one validation pass every 500 training iterations
test_interval: 500
# base (initial) learning rate
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# learning rate decay policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# display the loss and learning rate every 100 iterations
display: 100
# maximum number of iterations
max_iter: 10000
# snapshot the model every 5000 iterations, so progress survives an unexpected interruption
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
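The `inv` policy decays the learning rate as `base_lr * (1 + gamma * iter) ^ (-power)`. A quick check (my own arithmetic) reproduces the value that shows up later in the training log at iteration 100:

    base_lr, gamma, power = 0.01, 0.0001, 0.75

    def inv_lr(it):
        # Caffe's "inv" learning-rate policy.
        return base_lr * (1 + gamma * it) ** (-power)

    print(inv_lr(100))    # ~0.00992565, matching "Iteration 100, lr = 0.00992565"
    print(inv_lr(10000))  # ~0.00594604 at the final iteration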

4. Training and Testing the Model

After writing the network definition protobuf and the solver protobuf, simply run `train_lenet.sh`, or equivalently:

    cd $CAFFE_ROOT
    ./examples/mnist/train_lenet.sh

`train_lenet.sh` is a simple script, but here is a quick explanation: the main tool for training is `caffe`, invoked with the action `train` and the solver protobuf text file as its argument.

When you run the code, you will see a lot of messages flying by like this:

    I1203 net.cpp:66] Creating Layer conv1
    I1203 net.cpp:76] conv1 <- data
    I1203 net.cpp:101] conv1 -> conv1
    I1203 net.cpp:116] Top shape: 20 24 24
    I1203 net.cpp:127] conv1 needs backward computation.

These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:

    I1203 net.cpp:142] Network initialization done.
    I1203 solver.cpp:36] Solver scaffolding done.
    I1203 solver.cpp:44] Solving LeNet

Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:

    I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
    I1203 solver.cpp:66] Iteration 100, loss = 0.26044
    ...
    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9785
    I1203 solver.cpp:111] Test score #1: 0.0606671

For each training iteration, `lr` is the learning rate of that iteration, and `loss` is the training loss. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss. And after a few minutes, you are done!

    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9897
    I1203 solver.cpp:111] Test score #1: 0.0324599
    I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
    I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
    I1203 solver.cpp:78] Optimization Done.

The final model, stored as a binary protobuf file, is written to `lenet_iter_10000`, which you can deploy as a trained model in your application, if you are training on a real-world application dataset.
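If you want to use the trained weights from Python, the snapshot can be loaded with pycaffe. Below is a minimal sketch, assuming the deploy definition `examples/mnist/lenet.prototxt` shipped with Caffe (which ends in a softmax `prob` output) and a 28x28 grayscale digit already available as a numpy array with values in [0, 255]:

    import numpy as np
    import caffe

    caffe.set_mode_cpu()
    net = caffe.Net('examples/mnist/lenet.prototxt',               # deploy definition (no data/loss layers)
                    'examples/mnist/lenet_iter_10000.caffemodel',  # trained weights
                    caffe.TEST)

    img = np.zeros((28, 28), dtype=np.float32)  # placeholder; substitute a real digit image here
    net.blobs['data'].reshape(1, 1, 28, 28)     # single-image batch
    net.blobs['data'].data[0, 0, :, :] = img * 0.00390625  # same scaling as during training
    out = net.forward()
    print('predicted digit:', out['prob'].argmax())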

To switch between running on the CPU and the GPU, you only need to change `solver_mode` in `lenet_solver.prototxt` (in the underlying protobuf enum, 0 is CPU and 1 is GPU):

    # solver mode: CPU or GPU
    solver_mode: CPU


 How to reduce the learning rate at fixed steps?

Take a look at `lenet_multistep_solver.prototxt`.
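For reference, the `multistep` policy keeps the learning rate piecewise constant and multiplies it by `gamma` each time a `stepvalue` milestone is passed. A small sketch of that schedule (the milestones and `gamma` below are illustrative; use whatever `lenet_multistep_solver.prototxt` actually specifies):

    def multistep_lr(it, base_lr=0.01, gamma=0.9, stepvalues=(5000, 7000, 8000, 9000, 9500)):
        # Caffe's "multistep" policy: lr = base_lr * gamma ^ (number of milestones passed)
        passed = sum(1 for s in stepvalues if it >= s)
        return base_lr * gamma ** passed

    for it in (0, 5000, 7000, 9500, 10000):
        print(it, multistep_lr(it))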
