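【AI】Caffe usage steps (3): writing the solver file solver.prototxt

【1】Reference blogs

Caffe solver configuration explained: http://www.mamicode.com/info-detail-2620709.html
Caffe learning series (7): the solver and its configuration: https://www.cnblogs.com/denny402/p/5074049.html

【2】The solver file in detail

1. An example solver file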

net: "examples/resnet/train_test.prototxt"
test_iter: 100
test_interval: 2000
base_lr: 0.1
lr_policy: "multistep"
gamma: 0.1
stepvalue: 16000
stepvalue: 24000
stepvalue: 28000
max_iter: 28000
display: 100
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "examples/resnet/train_test"
solver_mode: CPU
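
To inspect a flat solver file like the one above without a Caffe installation, the following minimal sketch reads `key: value` lines into a dictionary. The helper names `parse_flat_solver` and `_convert` are hypothetical, and this is not the protobuf text-format parser: nested messages such as net_param are not handled, and a `#` inside a quoted string would be mistaken for a comment.

```python
# Fields declared `repeated` in SolverParameter may appear several times.
REPEATED_FIELDS = {"test_net", "test_iter", "test_state", "stepvalue", "weights"}


def _convert(value):
    """Turn a raw text value into str, int, or float."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # tokens such as CPU, GPU, true, false stay strings


def parse_flat_solver(text):
    """Parse flat `key: value` lines (hypothetical helper, see caveats above)."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), _convert(value.strip())
        if key in REPEATED_FIELDS:
            params.setdefault(key, []).append(value)
        else:
            params[key] = value
    return params


SOLVER_TEXT = """
net: "examples/resnet/train_test.prototxt"
test_iter: 100
test_interval: 2000
base_lr: 0.1
lr_policy: "multistep"
gamma: 0.1
stepvalue: 16000
stepvalue: 24000
stepvalue: 28000
max_iter: 28000
display: 100
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "examples/resnet/train_test"
solver_mode: CPU
"""

params = parse_flat_solver(SOLVER_TEXT)
print(params["lr_policy"], params["stepvalue"])
```

Repeated fields are collected into lists so that the three stepvalue entries are all kept instead of overwriting one another.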

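2. Solver configuration parameters in detail (42 in total)

1> net

The prototxt file that defines the training network. The file may also define one or more test networks. It is usually not set together with train_net or test_net.

2> test_iter

The number of forward-pass iterations run in each test phase. Each test iteration processes one batch of the size defined in the test network, so test_iter multiplied by batch_size should cover the whole test set.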

3> test_interval

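The number of training iterations between two test phases. The default is 0. In BVLC Caffe's solver code a value of 0 means that no test pass is run during training, rather than one after every iteration, so set it to a positive value whenever a test network is defined. Testing too frequently slows training down considerably.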

4> test_compute_loss

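Whether to compute the loss during testing. Defaults to false; mainly useful for debugging.

5> test_initialization

Whether to run one test pass before the first training iteration, which confirms that memory is sufficient and prints the initial loss. Defaults to true.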

6> base_lr

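The initial (base) learning rate.

7> display

The number of training iterations between printing status information to the terminal. If set to 0, no information is printed.

8> average_loss

The number of most recent iterations over which the displayed loss is averaged. Defaults to 1 and is usually left unset.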

9> max_iter

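The maximum number of training iterations. It is also used to compute the learning rate when lr_policy is poly.

10> iter_size

The number of batches over which gradients are accumulated before the weights are updated. It is typically used to emulate a larger batch when GPU memory is limited; the effective batch size is iter_size * batch_size.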
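
The effect of iter_size can be illustrated with plain numbers. This is a minimal sketch that assumes the loss is a mean over each batch and that the accumulated gradient is divided by iter_size before the update; the function names and the per-sample gradient values are made up for illustration.

```python
def batch_gradient(per_sample_grads):
    """Gradient of a mean loss over one batch: the average per-sample gradient."""
    return sum(per_sample_grads) / len(per_sample_grads)


def accumulated_gradient(batches):
    """Accumulate batch gradients over len(batches) passes, then normalize."""
    iter_size = len(batches)
    total = 0.0
    for batch in batches:
        total += batch_gradient(batch)
    return total / iter_size


# Two small batches (iter_size = 2, batch_size = 2) ...
small_batches = [[1.0, 2.0], [3.0, 4.0]]
# ... give the same gradient as one batch of size iter_size * batch_size = 4.
print(accumulated_gradient(small_batches), batch_gradient([1.0, 2.0, 3.0, 4.0]))
```

The equality holds only when all accumulated batches have the same size, which is the usual case with a fixed batch_size.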

11> lr_policy

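The learning rate adjustment policy. The possible values are:

fixed: keeps the learning rate at the initial value base_lr.
step: also requires stepsize; returns
```
base_lr * gamma ^ (floor(iter / stepsize))
```
where iter is the current iteration and floor rounds down.
exp: returns
```
base_lr * gamma ^ iter
```
where iter is the current iteration.
inv: also requires power; returns
```
base_lr * (1 + gamma * iter) ^ (- power)
```
multistep: also requires one or more stepvalue entries. It is similar to step, but while step changes the rate at uniform intervals, multistep changes it at the iterations listed in stepvalue.
poly: polynomial decay of the learning rate, reaching zero at max_iter; returns
```
base_lr * (1 - iter/max_iter) ^ (power)
```

sigmoid: sigmoid decay of the learning rate; returns

 base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))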
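
The policies listed above can be sketched as one Python function. This is an illustration written from the formulas in this section, not Caffe's own code; the function name `learning_rate` and its keyword defaults are assumptions, and multistep is modeled as base_lr * gamma ^ (number of stepvalue entries already reached).

```python
import math


def learning_rate(policy, it, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, stepvalues=(), max_iter=1):
    """Learning rate at iteration `it` for each lr_policy described above."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "multistep":
        # Count how many of the configured stepvalue entries have been reached.
        steps_passed = sum(1 for value in stepvalues if it >= value)
        return base_lr * gamma ** steps_passed
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1 / (1 + math.exp(-gamma * (it - stepsize))))
    raise ValueError("unknown lr_policy: " + policy)


# Example: "step" with base_lr 0.1, gamma 0.1, stepsize 1000 at iteration 2500.
print(learning_rate("step", 2500, 0.1, gamma=0.1, stepsize=1000))
```

Note that for sigmoid the rate only decays over time when gamma is negative; with a positive gamma the same formula increases toward base_lr.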

12> gamma

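A parameter for computing the learning rate; used when lr_policy is step, exp, inv, multistep, or sigmoid.

13> power

A parameter for computing the learning rate; used when lr_policy is inv or poly.

14> momentum

The weight given to the previous update, which controls how much earlier gradient directions influence the current descent direction. Typical values lie between 0.5 and 0.99, and 0.9 is the usual choice. Momentum generally makes SGD-based training more stable and faster.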
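
To make the role of momentum concrete, here is a minimal sketch of the classical SGD-with-momentum update, V = momentum * V + lr * grad followed by W = W - V. The function name `sgd_momentum_step` and the toy objective f(w) = w^2 are assumptions chosen for illustration.

```python
def sgd_momentum_step(w, v, grad, lr, momentum):
    """One SGD update with momentum; returns the new weight and velocity."""
    v = momentum * v + lr * grad  # blend the previous update with the new gradient
    return w - v, v


# Toy objective f(w) = w^2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_momentum_step(w, v, grad=2 * w, lr=0.1, momentum=0.9)
    print(w, v)
```

With momentum set to 0 the update reduces to plain SGD, W = W - lr * grad.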

15> weight_decay

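The weight decay coefficient, used to reduce overfitting.

16> regularization_type

The regularization type. The supported values are L1 and L2, with L2 as the default; the strength is controlled by weight_decay.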
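
The difference between the two regularization types shows up in the term added to each weight's gradient. The following is a sketch under the usual definitions (L2 adds weight_decay * w, L1 adds weight_decay * sign(w)); the function name `regularized_gradient` is hypothetical and per-layer multipliers are ignored.

```python
def regularized_gradient(w, grad, weight_decay, regularization_type="L2"):
    """Add the regularization term for a single scalar weight to its gradient."""
    if regularization_type == "L2":
        return grad + weight_decay * w
    if regularization_type == "L1":
        sign = (w > 0) - (w < 0)  # -1, 0, or 1
        return grad + weight_decay * sign
    raise ValueError("unknown regularization_type: " + regularization_type)


print(regularized_gradient(2.0, 0.5, 0.0005, "L2"))
print(regularized_gradient(-2.0, 0.5, 0.0005, "L1"))
```

L2 shrinks large weights more strongly, while L1 applies a constant pull toward zero regardless of magnitude.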

17> stepsize

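When lr_policy is "step", the number of training iterations between two learning rate adjustments.

18> stepvalue

When lr_policy is "multistep", the iteration at which the learning rate is adjusted. The parameter can be given several times to schedule several adjustments.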
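
As a worked example, the sketch below evaluates the multistep schedule for the values used in the example solver file (base_lr 0.1, gamma 0.1, stepvalue 16000, 24000, 28000). It assumes the rate is multiplied by gamma once an iteration reaches each stepvalue; `multistep_lr` is a hypothetical helper, not a Caffe function.

```python
from bisect import bisect_right


def multistep_lr(it, base_lr, gamma, stepvalues):
    """Learning rate at iteration `it`; `stepvalues` must be sorted ascending."""
    steps_passed = bisect_right(stepvalues, it)  # entries with value <= it
    return base_lr * gamma ** steps_passed


STEPVALUES = [16000, 24000, 28000]
for it in (0, 15999, 16000, 24000, 27999):
    print(it, multistep_lr(it, 0.1, 0.1, STEPVALUES))
```

Under that assumption, a stepvalue equal to max_iter (28000 here) is only reached when training stops, so it does not affect any training iteration.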

19> clip_gradients

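Caps the L2 norm of the parameter gradients to prevent exploding gradients: whenever the actual L2 norm is larger than this value, the gradients are scaled down to it. Defaults to -1, which disables clipping.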
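
Clipping by L2 norm can be sketched as follows. The example treats all gradients as one flat list and rescales them together when their combined norm exceeds the threshold; the function name `clip_by_l2_norm` is an assumption made for this illustration.

```python
import math


def clip_by_l2_norm(grads, clip_gradients):
    """Scale `grads` so their L2 norm is at most `clip_gradients`."""
    if clip_gradients < 0:  # negative value: clipping disabled
        return list(grads)
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > clip_gradients:
        scale = clip_gradients / norm
        return [g * scale for g in grads]
    return list(grads)


print(clip_by_l2_norm([3.0, 4.0], 1.0))   # norm 5 is scaled down to norm 1
print(clip_by_l2_norm([3.0, 4.0], -1.0))  # unchanged
```

Because every gradient is multiplied by the same factor, the direction of the update is preserved and only its length changes.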

20> snapshot

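The snapshot interval, that is, the number of training iterations between two saved model snapshots. Defaults to 0, which means no snapshots are saved.

21> snapshot_prefix

The path and filename prefix for saved models, without a file extension. If unset, the prototxt file path without its extension is used as the prefix. If it is set to a directory, the prototxt filename without its extension is appended to it.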
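
For orientation, the sketch below builds the snapshot filenames one would expect for a given prefix. It assumes the naming pattern prefix + "_iter_" + iteration + extension, with .caffemodel and .solverstate extensions and an extra .h5 suffix for the HDF5 format; this pattern and the helper name `snapshot_filenames` are assumptions to check against the Caffe version in use.

```python
def snapshot_filenames(snapshot_prefix, iteration, snapshot_format="BINARYPROTO"):
    """Return the assumed (model, solver state) filenames for one snapshot."""
    suffix = ".h5" if snapshot_format == "HDF5" else ""
    base = "{}_iter_{}".format(snapshot_prefix, iteration)
    return base + ".caffemodel" + suffix, base + ".solverstate" + suffix


model_file, state_file = snapshot_filenames("examples/resnet/train_test", 1000)
print(model_file)
print(state_file)
```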

22> snapshot_diff

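Whether to include the gradients (diffs) in the snapshot. Defaults to false. Saving them can help with debugging but makes the saved file much larger.

23> snapshot_format

The format of the saved model, either "HDF5" or "BINARYPROTO". The default is BINARYPROTO.

24> snapshot_after_train

Defaults to true, meaning a snapshot is saved when training finishes. If false, no snapshot is saved after training ends.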

25> solver_mode

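Whether to train on the CPU or the GPU. Defaults to GPU.

26> device_id

The device ID to use in GPU mode. Defaults to 0.

27> random_seed

The random seed. If non-negative, it seeds Caffe's random number generator, which is useful for reproducible results. The default is -1, in which case a seed derived from the system clock is used.

28> type

The optimization algorithm. Defaults to SGD. Six are available: SGD, AdaDelta, AdaGrad (Adaptive Gradient), Adam, Nesterov, and RMSProp.

29> delta

A small constant for numerical stability in RMSProp, AdaGrad, AdaDelta, and Adam, for example to avoid division by zero. Defaults to 1e-8.

30> momentum2

A parameter of the "Adam" solver. Defaults to 0.999.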
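
To show where momentum2 enters, here is a minimal sketch of the standard Adam update for a single scalar weight. It assumes momentum plays the role of the first-moment decay rate, momentum2 the second-moment decay rate, and delta the stability constant; the function name `adam_step` is hypothetical and the exact formulation in a given Caffe version may differ in detail.

```python
import math


def adam_step(w, m, v, grad, t, lr, momentum=0.9, momentum2=0.999, delta=1e-8):
    """One Adam update; `t` is the 1-based step count."""
    m = momentum * m + (1 - momentum) * grad            # first-moment estimate
    v = momentum2 * v + (1 - momentum2) * grad * grad   # second-moment estimate
    correction = math.sqrt(1 - momentum2 ** t) / (1 - momentum ** t)
    w = w - lr * correction * m / (math.sqrt(v) + delta)
    return w, m, v


w, m, v = 0.0, 0.0, 0.0
w, m, v = adam_step(w, m, v, grad=1.0, t=1, lr=0.001)
print(w, m, v)
```

Thanks to the bias correction, the first step has a size close to lr regardless of the gradient's scale.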

31> rms_decay

The decay value of the "RMSProp" solver, which is used as follows:

MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t)
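
The MeanSquare formula above can be turned into a small update sketch. It assumes the weight update divides the gradient by the square root of the running mean square plus delta; the function name `rmsprop_step` and the sample values are illustrative.

```python
import math


def rmsprop_step(w, mean_square, grad, lr, rms_decay=0.99, delta=1e-8):
    """One RMSProp update for a scalar weight; returns (w, mean_square)."""
    # MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t)
    mean_square = rms_decay * mean_square + (1 - rms_decay) * grad * grad
    w = w - lr * grad / (math.sqrt(mean_square) + delta)
    return w, mean_square


w, mean_square = rmsprop_step(w=1.0, mean_square=0.0, grad=2.0, lr=0.01)
print(w, mean_square)
```

A larger rms_decay makes the running average react more slowly to changes in gradient magnitude.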

32> debug_info

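Defaults to false. If set to true, information about the state of the network during learning is printed, which can help with analysis and debugging.

33> solver_type

Same as "type"; deprecated.

34> layer_wise_reduce

Overlaps computation and communication during data-parallel training. Defaults to true.

35> weights

Path to one or more pretrained caffemodel files used to initialize fine-tuning; equivalent to the --weights command-line option of caffe train. If --weights is also given on the command line, it takes priority and overrides this setting. If --snapshot is given on the command line, this setting is ignored. Several model files can be listed in one weights entry separated by commas, or in separate repeated weights entries.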

36> net_param

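An inline definition of the training network, which may also define one or more test networks. Usually left unset.

37> train_net

The prototxt file that defines the training network. It is usually not set together with net.

38> test_net

The prototxt file(s) that define the test networks. It is usually not set together with net.

39> train_net_param

An inline definition of the training network. Usually left unset.

40> test_net_param

An inline definition of the test networks. Usually left unset.

41> train_state

The state of the training network. By default its phase is TRAIN; other defaults follow the NetState defaults.

42> test_state

The state of each test network. By default the phase is TEST; other defaults follow the NetState defaults.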

【3】Official description of the solver parameters

See message SolverParameter in caffe/src/caffe/proto/caffe.proto:

// NOTE
// Update the next available ID when you add a new SolverParameter field.
//
// SolverParameter next available ID: 43 (last added: weights)
message SolverParameter {
  //
  // Specifying the train and test networks
  //
  // Exactly one train net must be specified using one of the following fields:
  //     train_net_param, train_net, net_param, net
  // One or more test nets may be specified using any of the following fields:
  //     test_net_param, test_net, net_param, net
  // If more than one test net field is specified (e.g., both net and
  // test_net are specified), they will be evaluated in the field order given
  // above: (1) test_net_param, (2) test_net, (3) net_param/net.
  // A test_iter must be specified for each test_net.
  // A test_level and/or a test_stage may also be specified for each test_net.
  //

  // Proto filename for the train net, possibly combined with one or more
  // test nets.
  optional string net = 24;
  // Inline train net param, possibly combined with one or more test nets.
  optional NetParameter net_param = 25;

  optional string train_net = 1; // Proto filename for the train net.
  repeated string test_net = 2; // Proto filenames for the test nets.
  optional NetParameter train_net_param = 21; // Inline train net params.
  repeated NetParameter test_net_param = 22; // Inline test net params.

  // The states for the train/test nets. Must be unspecified or
  // specified once per net.
  // By default, train_state will have phase = TRAIN,
  // and all test_state's will have phase = TEST.
  // Other defaults are set according to the NetState defaults.
  optional NetState train_state = 26;
  repeated NetState test_state = 27;

  // The number of iterations for each test net.
  repeated int32 test_iter = 3;

  // The number of iterations between two testing phases.
  optional int32 test_interval = 4 [default = 0];
  optional bool test_compute_loss = 19 [default = false];
  // If true, run an initial test pass before the first iteration,
  // ensuring memory availability and printing the starting value of the loss.
  optional bool test_initialization = 32 [default = true];
  optional float base_lr = 5; // The base learning rate
  // the number of iterations between displaying info. If display = 0, no info
  // will be displayed.
  optional int32 display = 6;
  // Display the loss averaged over the last average_loss iterations
  optional int32 average_loss = 33 [default = 1];
  optional int32 max_iter = 7; // the maximum number of iterations
  // accumulate gradients over `iter_size` x `batch_size` instances
  optional int32 iter_size = 36 [default = 1];

  // The learning rate decay policy. The currently implemented learning rate
  // policies are as follows:
  //    - fixed: always return base_lr.
  //    - step: return base_lr * gamma ^ (floor(iter / step))
  //    - exp: return base_lr * gamma ^ iter
  //    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
  //    - multistep: similar to step but it allows non uniform steps defined by
  //      stepvalue
  //    - poly: the effective learning rate follows a polynomial decay, to be
  //      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
  //    - sigmoid: the effective learning rate follows a sigmod decay
  //      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
  //
  // where base_lr, max_iter, gamma, step, stepvalue and power are defined
  // in the solver parameter protocol buffer, and iter is the current iteration.
  optional string lr_policy = 8;
  optional float gamma = 9; // The parameter to compute the learning rate.
  optional float power = 10; // The parameter to compute the learning rate.
  optional float momentum = 11; // The momentum value.
  optional float weight_decay = 12; // The weight decay.
  // regularization types supported: L1 and L2
  // controlled by weight_decay
  optional string regularization_type = 29 [default = "L2"];
  // the stepsize for learning rate policy "step"
  optional int32 stepsize = 13;
  // the stepsize for learning rate policy "multistep"
  repeated int32 stepvalue = 34;

  // Set clip_gradients to >= 0 to clip parameter gradients to that L2 norm,
  // whenever their actual L2 norm is larger.
  optional float clip_gradients = 35 [default = -1];

  optional int32 snapshot = 14 [default = 0]; // The snapshot interval
  // The prefix for the snapshot.
  // If not set then is replaced by prototxt file path without extension.
  // If is set to directory then is augmented by prototxt file name
  // without extention.
  optional string snapshot_prefix = 15;
  // whether to snapshot diff in the results or not. Snapshotting diff will help
  // debugging but the final protocol buffer size will be much larger.
  optional bool snapshot_diff = 16 [default = false];
  enum SnapshotFormat {
    HDF5 = 0;
    BINARYPROTO = 1;
  }
  optional SnapshotFormat snapshot_format = 37 [default = BINARYPROTO];
  // the mode solver will use: 0 for CPU and 1 for GPU. Use GPU in default.
  enum SolverMode {
    CPU = 0;
    GPU = 1;
  }
  optional SolverMode solver_mode = 17 [default = GPU];
  // the device_id will that be used in GPU mode. Use device_id = 0 in default.
  optional int32 device_id = 18 [default = 0];
  // If non-negative, the seed with which the Solver will initialize the Caffe
  // random number generator -- useful for reproducible results. Otherwise,
  // (and by default) initialize using a seed derived from the system clock.
  optional int64 random_seed = 20 [default = -1];

  // type of the solver
  optional string type = 40 [default = "SGD"];

  // numerical stability for RMSProp, AdaGrad and AdaDelta and Adam
  optional float delta = 31 [default = 1e-8];
  // parameters for the Adam solver
  optional float momentum2 = 39 [default = 0.999];

  // RMSProp decay value
  // MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t)
  optional float rms_decay = 38 [default = 0.99];

  // If true, print information about the state of the net that may help with
  // debugging learning problems.
  optional bool debug_info = 23 [default = false];

  // If false, don't save a snapshot after training finishes.
  optional bool snapshot_after_train = 28 [default = true];

  // DEPRECATED: old solver enum types, use string instead
  enum SolverType {
    SGD = 0;
    NESTEROV = 1;
    ADAGRAD = 2;
    RMSPROP = 3;
    ADADELTA = 4;
    ADAM = 5;
  }
  // DEPRECATED: use type instead of solver_type
  optional SolverType solver_type = 30 [default = SGD];

  // Overlap compute and communication for data parallel training
  optional bool layer_wise_reduce = 41 [default = true];

  // Path to caffemodel file(s) with pretrained weights to initialize finetuning.
  // Tha same as command line --weights parameter for caffe train command.
  // If command line --weights parameter is specified, it has higher priority
  // and overwrites this one(s).
  // If --snapshot command line parameter is specified, this one(s) are ignored.
  // If several model files are expected, they can be listed in a one
  // weights parameter separated by ',' (like in a command string) or
  // in repeated weights parameters separately.
  repeated string weights = 42;
}