Solver Prototxt

https://github.com/BVLC/caffe/wiki/Solver-Prototxt
caffe.proto: BVLC/caffe/src/caffe/proto/caffe.proto

The solver.prototxt is a configuration file that tells Caffe how you want the network trained.


Configuration file.


Berkeley Vision and Learning Center,BVLC
Berkeley Artificial Intelligence Research,BAIR
Convolutional Architecture for Fast Feature Embedding,Caffe

1. caffe/examples/mnist/lenet_solver.prototxt

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU

2. Parameters

base_lr

Base learning rate.

This parameter indicates the base (beginning) learning rate of the network. The value is a real number (floating point).

The base learning rate.


The base learning rate. During gradient-descent optimization, the learning rate is adjusted as training proceeds; the adjustment strategy is set by the lr_policy parameter.


lr_policy

Learning rate policy (how the learning rate changes over time).

This parameter indicates how the learning rate should change over time. This value is a quoted string.

Options include:

  1. “step” - drop the learning rate by a factor of gamma every stepsize iterations.
  2. “multistep” - similar to step, but drop the learning rate at each iteration listed in stepvalue (the steps need not be uniform).
  3. “fixed” - the learning rate does not change (base_lr is kept constant).
  4. “exp” - base_lr * gamma^iter (iter is the current iteration).
  5. “inv” - base_lr * (1 + gamma * iter)^(-power).
  6. “poly” - the effective learning rate follows a polynomial decay, reaching zero at max_iter:
    base_lr * (1 - iter/max_iter)^power
  7. “sigmoid” - the effective learning rate follows a sigmoid decay:
    base_lr * (1/(1 + exp(-gamma * (iter - stepsize))))

where base_lr, max_iter, gamma, step, stepvalue and power are defined in the solver parameter protocol buffer, and iter is the current iteration.

  // The learning rate decay policy. The currently implemented learning rate
  // policies are as follows:
  //    - fixed: always return base_lr.
  //    - step: return base_lr * gamma ^ (floor(iter / step))
  //    - exp: return base_lr * gamma ^ iter
  //    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
  //    - multistep: similar to step but it allows non uniform steps defined by
  //      stepvalue
  //    - poly: the effective learning rate follows a polynomial decay, to be
  //      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
  //    - sigmoid: the effective learning rate follows a sigmod decay
  //      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
  //
  // where base_lr, max_iter, gamma, step, stepvalue and power are defined
  // in the solver parameter protocol buffer, and iter is the current iteration.

The learning rate policy is usually set so that the learning rate gradually decreases as the iteration count grows.
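As a minimal illustration, the policies can be written out in plain Python following the formulas quoted from caffe.proto. This is a sketch, not Caffe's actual C++ implementation; the function name and argument defaults here are mine:

```python
import math

# Sketch of Caffe's learning-rate policies, following the formulas
# documented in caffe.proto. Illustrative only.
def learning_rate(policy, iter, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, stepvalues=(), max_iter=1):
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (iter // stepsize)
    if policy == "exp":
        return base_lr * gamma ** iter
    if policy == "inv":
        return base_lr * (1 + gamma * iter) ** (-power)
    if policy == "multistep":
        # current step = number of stepvalues already passed (sorted ascending)
        step = sum(1 for sv in stepvalues if iter >= sv)
        return base_lr * gamma ** step
    if policy == "poly":
        return base_lr * (1 - iter / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1.0 + math.exp(-gamma * (iter - stepsize))))
    raise ValueError("unknown lr_policy: " + policy)

# With the lenet_solver.prototxt settings above (inv, gamma 0.0001, power 0.75),
# the rate starts at base_lr and decays smoothly with the iteration count.
lr0 = learning_rate("inv", 0, base_lr=0.01, gamma=0.0001, power=0.75)  # = 0.01
```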



gamma

Learning rate decay factor.

This parameter indicates how much the learning rate should change every time we reach the next “step.” The value is a real number; the current learning rate is multiplied by this factor to obtain the new learning rate.

stepsize

Learning rate step interval.

This parameter indicates how often (at some iteration count) that we should move onto the next “step” of training. This value is a positive integer.


If stepsize is too small, the learning rate shrinks too quickly and the network may stop improving before it has fully converged.


stepvalue

This parameter indicates one of potentially many iteration counts at which we should move onto the next “step” of training. This value is a positive integer. There is often more than one of these parameters present, each one indicating the next step iteration.


max_iter

Maximum number of iterations.

This parameter indicates when the network should stop training. The value is an integer indicating which iteration should be the last.

The maximum number of iterations.


One iteration (iter) corresponds to one weight update: one forward pass and one backward pass, followed by an update of the weights.


momentum

Momentum.

This parameter indicates how much of the previous weight update will be retained in the new calculation. This value is a real fraction.


The fraction of the previous weight update that is retained.

Momentum adds inertia to the search: when the error surface contains flat regions, SGD with momentum can learn faster.

Newton's first law of motion (the law of inertia): a body stays at rest or in uniform straight-line motion until an external force compels it to change that state.
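The update rule can be sketched as follows: the new update is the gradient step plus momentum times the previous update. A minimal Python illustration (the function and variable names are mine, not Caffe's):

```python
# SGD with momentum: v_new = momentum * v - lr * grad; w_new = w + v_new.
# Illustrative sketch, operating on plain Python lists.
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    v_new = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new

w = [1.0, -2.0]
v = [0.0, 0.0]          # no previous update yet
grad = [0.5, -0.5]
w, v = sgd_momentum_step(w, grad, v)  # first step is a pure gradient step
```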


weight_decay

Weight decay.

This parameter indicates the factor of (regularization) penalization of large weights. This value is often a real fraction.


Weight decay regularizes the weights and helps prevent overfitting.
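For L2 regularization, the effect is that weight_decay * w is added to the data gradient before the update, pulling large weights toward zero. A minimal sketch (Caffe also supports L1 via its regularization_type field; the function name here is mine):

```python
# Add the L2 regularization gradient weight_decay * w to the data gradient.
def apply_weight_decay(grad, w, weight_decay=0.0005):
    return [g + weight_decay * wi for g, wi in zip(grad, w)]

grad = apply_weight_decay([0.1, -0.2], [2.0, -4.0], weight_decay=0.0005)
```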


random_seed

A random seed used by the solver and the network (for example, in a dropout layer).

solver_mode

CPU / GPU mode.

This parameter indicates which mode will be used in solving the network.

Options include:

  1. CPU
  2. GPU

snapshot

Model-saving interval.

This parameter indicates how often caffe should output a model and solverstate. This value is a positive integer.

snapshot_prefix

Model snapshot prefix / path.

This parameter indicates the prefix for the file names of a snapshot's model and solverstate output. This value is a double quoted string.


net

Path to the training or testing network definition (train_net or test_net).

This parameter indicates the location of the network to be trained (path to prototxt). This value is a double quoted string.

iter_size

Accumulate gradients across batches through the iter_size solver field. With this setting, batch_size: 16 with iter_size: 1 is equivalent to batch_size: 4 with iter_size: 4.

Accumulate gradients over iter_size x batch_size instances.

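Gradient accumulation can be sketched as: sum the gradients from iter_size mini-batches, normalize by iter_size, then apply a single weight update. This is why batch_size: 4 with iter_size: 4 matches batch_size: 16. A minimal illustration (function name is mine):

```python
# Sum per-batch gradients and normalize by iter_size before one update.
def accumulated_gradient(batch_grads, iter_size):
    total = [0.0] * len(batch_grads[0])
    for g in batch_grads:
        total = [t + gi for t, gi in zip(total, g)]
    return [t / iter_size for t in total]

# Four small batches stand in for one large batch.
g = accumulated_gradient([[1.0], [2.0], [3.0], [4.0]], iter_size=4)  # [2.5]
```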

test_iter

Number of test iterations.

This parameter indicates how many test iterations should occur per test_interval. This value is a positive integer.

The number of iterations for each test net.


test_iter: 100
test_iter specifies how many forward passes the test should carry out.
In the case of MNIST, we have test batch size 100 and 100 test iterations, covering the full 10,000 testing images.
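The relation above as a quick arithmetic check (values from lenet_solver.prototxt): test_iter should cover the whole test set, i.e. test_iter = number of test images / test batch size.

```python
# MNIST test set: 10,000 images, test batch size 100.
num_test_images = 10000
test_batch_size = 100
test_iter = num_test_images // test_batch_size  # = 100, as in lenet_solver.prototxt
```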


test_interval

Test interval.

This parameter indicates how often the test phase of the network will be executed.

The number of iterations between two testing phases.


test_interval: 500
Carry out testing every 500 training iterations.

test_interval means the network runs a test pass once every test_interval training iterations. A common choice is to test once per training epoch; if one epoch takes 5000 iterations, set test_interval: 5000.


display

On-screen display interval.

This parameter indicates how often caffe should output results to the screen. This value is a positive integer and specifies an iteration count.

The number of iterations between displaying info. If display = 0, no info will be displayed.

type

This parameter indicates the optimization algorithm used to train the network. This value is a quoted string.

Options include:

  1. Stochastic Gradient Descent “SGD”
  2. AdaDelta “AdaDelta”
  3. Adaptive Gradient “AdaGrad”
  4. Adam “Adam”
  5. Nesterov’s Accelerated Gradient “Nesterov”
  6. RMSprop “RMSProp”

The loss function may be non-convex with no analytical solution, so it has to be minimized by an optimization method. Caffe provides six optimization algorithms; choose one by setting type in the solver configuration file.
The solver's main job is to alternate forward passes and backward passes to update the parameters and minimize the loss; it is an iterative optimization procedure.
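For example, to select Adam instead of the default SGD, set type in solver.prototxt. A sketch with illustrative values (momentum, momentum2, and delta are the Adam-specific fields in caffe.proto, corresponding to Adam's beta1, beta2, and epsilon):

```
type: "Adam"
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
delta: 1e-8
```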


batch_size is the number of images the network trains on per iteration. With batch_size = 256, each iteration trains on 256 images; if there are 2,560,000 images in total, one pass over all of them takes 2560000 / 256 = 10000 iterations.

An epoch means passing all training images through the network once. For example, if 5000 iterations make up one epoch and you want to train for 100 epochs, the total iteration count is max_iter = 5000 * 100 = 500000.
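The two calculations above as code:

```python
# Iterations per epoch for the 2,560,000-image example.
num_train_images = 2560000
batch_size = 256
iters_per_epoch = num_train_images // batch_size  # = 10000

# max_iter for 100 epochs when one epoch takes 5000 iterations.
epochs = 100
max_iter = 5000 * epochs  # = 500000
```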
