Intel Caffe int8 inference calibration tool

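The pupil has surpassed the master: readers may want to move on to BUG1989/caffe-int8-convert-tools. For the underlying theory, I also recommend the article "TensorRT(5)-INT8校准原理" (TensorRT (5): INT8 calibration principles).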

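Intel's int8 calibration code is derived from Ristretto. It is less sophisticated than the KL-divergence (Kullback-Leibler divergence) calibration used in 8-bit Inference with TensorRT and [MXNET-133] Model Quantization with Calibration, but it is still good news for Caffe users. Once int8 support in Tencent/ncnn matures, edge inference will get another speed boost.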

QuantizationParameter

// Message for layers with reduced word with arithmetic
message QuantizationParameter {
  enum Precision {
    DYNAMIC_FIXED_POINT = 0;
  }
  optional Precision precision = 1 [default = DYNAMIC_FIXED_POINT];
  enum Rounding {
    NEAREST = 0;
  }
  optional Rounding rounding_scheme = 2 [default = NEAREST];
  // Dynamic fixed point word width
  optional uint32 bw_layer_in = 3 [default = 32];
  optional uint32 bw_layer_out = 4 [default = 32];
  optional uint32 bw_params = 5 [default = 32];
  repeated int32 fl_layer_in = 6;
  repeated int32 fl_layer_out = 7;
  repeated int32 fl_params = 8;
  repeated float scale_in = 20;
  repeated float scale_out = 21;
  repeated float scale_params = 22;
}
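The fields follow Ristretto's dynamic fixed point: bw_* are bit widths and fl_* are fractional lengths, while scale_* hold the scale factors that the code stores when power-of-two mode is off. Assuming the usual Ristretto interpretation, in which a bw-bit signed integer n with fractional length fl represents n · 2^-fl, a value could be quantized as in this Python sketch (the helper names are hypothetical, not Caffe APIs):

```python
import math

def to_dynamic_fixed_point(x, bw, fl):
    """Quantize x to a bw-bit signed integer with fl fractional bits (hypothetical helper)."""
    step = 2.0 ** -fl                    # value of one least significant bit
    q_min = -(2 ** (bw - 1))             # -128 for bw = 8
    q_max = 2 ** (bw - 1) - 1            #  127 for bw = 8
    n = int(math.floor(x / step + 0.5))  # round to nearest, ties toward +inf
    return max(q_min, min(q_max, n))     # saturate to the representable range

def from_dynamic_fixed_point(n, fl):
    """Real value represented by the integer n."""
    return n * 2.0 ** -fl

n = to_dynamic_fixed_point(1.3, bw=8, fl=5)
print(n, from_dynamic_fixed_point(n, fl=5))  # 42 1.3125
```

With fl = 5 the resolution is 1/32, so 1.3 becomes 42/32 = 1.3125, and anything beyond about ±4 saturates.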

calibrator.py

(Flowchart: calibrator (model, weights) → check_existence → setup_env → accuracy_blob_name_parser → generate_sample → get_the_accuracy → tuning_quantized_topology → quantized_prototxt → End)

Parse the command-line arguments.

if __name__ == '__main__':
    usage_string = 'Usage: 1.Build the caffe\n ' \
                   '2.cd /path/to/caffe/scripts\n ' \
                   '3.python calibrator.py ' \
                   ' -r /path/to/caffe/build ' \
                   ' -w pre-trained-fp32 weights ' \
                   ' -m typology ' \
                   ' -i iterations ' \
                   ' -l acceptable accuracy loss value, the default value is 0.01 stands for one percent' \
                   ' -d 1(0 means classification while 1 means detection, the default value is 0' \
                   ' -n blob name which means accuracy' \
                   ' -c scaling mode, the default value is single' \
                   ' -s sampling iterations'

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-h', '--help', action='help', help=usage_string)
    parser.add_argument('-i', '--iterations', action='store', dest='iterations', default=10,
                        help='equal to the number to complete one epoch.')
    parser.add_argument('-w', '--weights', action='store', dest='weights', default='',
                        help='pre-trained-fp32-weights.')
    parser.add_argument('-m', '--model', action='store', dest='model', default='',
                        help='topology definition prototxt.')
    parser.add_argument('-l', '--accuracy_loss', action='store', dest='loss', default=0.01,
                        help='the acceptable accuracy loss that raised by 8-Bit quantization, '
                             'default value is 0.01(one percent).')
    parser.add_argument('-d', '--detection', action='store', dest='is_detection', default=0,
                        help='0 for classification while 1 for detection, default value is 0.')
    parser.add_argument('-r', '--root', action='store', dest='root', default='',
                        help='caffe build path')
    parser.add_argument('-n', '--blob_name', action='store', dest='blob_name', default='',
                        help='top blob name which stands for accuracy')
    parser.add_argument('-c', '--weights_channel', action='store', dest='scaling_mode', default='single',
                        help='the scaling mode for weights')
    parser.add_argument('-s', '--sampling_iterations', action='store', dest='sampling_iterations', default=10,
                        help='iteration number of sampling, the default value is 10.')
    params = parser.parse_args()

Exit if the Caffe build path (-r) is not given or does not exist.

    if not check_existence(params.root):
        print 'Please check the {} existence.'.format(params.root)
        sys.exit(-1)

Add the pycaffe directory to sys.path and import caffe.

    pycaffe_path = os.path.abspath(os.path.dirname(os.path.abspath(params.root))) + os.path.sep + 'python'
    if not check_existence(pycaffe_path):
        print "Please check the pycaffe existence.Suggest to rebuild pycaffe via 'make pycaffe'"
    sys.path.insert(0, pycaffe_path)
    import caffe
    from caffe.proto import caffe_pb2

Set the number of test iterations; fall back to 1000 if the value cannot be parsed.

    try:
        user_input_iterations = int(params.iterations)
    except:
        print 'Set the iterations to the default value 1000'
        user_input_iterations = 1000
    else:
        if user_input_iterations < 1:
            print 'Invalid iterations!The value should be larger than zero.'
            sys.exit(-1)

Set the number of sampling iterations; fall back to 10 if the value cannot be parsed.

    try:
        user_sampling_iteration = int(params.sampling_iterations)
    except:
        print 'Set the sampling iteration to the default value 10'
        user_sampling_iteration = 10
    else:
        if user_sampling_iteration < 1:
            print 'Invalid sampling iteration!The value should be larger than zero.'
            sys.exit(-1)

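Set the scaling mode for the weights: "single" uses one scale per layer, the other mode one scale per output channel; unrecognized values fall back to "single".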

    if params.scaling_mode != 'multiple' and params.scaling_mode != 'single':
        user_scaling_mode = 'single'
    else:
        user_scaling_mode = params.scaling_mode

Set the accuracy-loss tolerance; values outside [0, 1) fall back to 0.01.

    try:
        toleration = float(params.loss)
        if toleration >= 1 or toleration < 0:
            toleration = 0.01
    except:
        print 'Set the toleration to 1%.'
        toleration = 0.01

Set the detection flag.

    try:
        detection_flag = 1 if int(params.is_detection) == 1 else 0
    except:
        print 'Set the test type to classification.'
        detection_flag = 0

Check that the model prototxt exists.

    model = os.path.abspath(params.model)
    if not check_existence(model):
        print 'Please check model: {} existence.'.format(model)
        sys.exit(-1)

Check that the weights file exists.

    user_input_weights = os.path.abspath(params.weights)
    if not check_existence(user_input_weights):
        print 'Please check weights: {} existence.'.format(user_input_weights)
        sys.exit(-1)

Check that the sample program exists under tools/.

    sample = os.path.abspath(params.root + os.path.sep + 'tools/sample')
    if not check_existence(sample):
        print 'Please check sample: {} existence.'.format(sample)
        sys.exit(-1)

Check that the caffe executable exists.

    caffe_bin_path = os.path.abspath(params.root + os.path.sep + 'tools/caffe')
    if not check_existence(caffe_bin_path):
        print 'Please check model/weights/sample existence.'
        sys.exit(-1)

setup_env() changes the current working directory to the parent of the Caffe build directory and sets Caffe to CPU mode.

    setup_env()

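If target_blob_name is not given and the task is not detection, accuracy_blob_name_parser is called.
For detection tasks, the name of the blob holding the expected accuracy or detection output must be specified explicitly.
accuracy_blob_name_parser reads the model prototxt and tries to find the top blob of an Accuracy layer.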

    target_blob_name = params.blob_name
    if not target_blob_name and not detection_flag:
        target_blob_name = accuracy_blob_name_parser(model)

Exit if target_blob_name is still empty or does not exist in the model prototxt.

    if not target_blob_name or not check_blob_name_existence(model, target_blob_name):
        print 'Please specify valid blob name and rerun the script.'
        sys.exit(-1)

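str.rsplit(sep) splits a string at every occurrence of sep, working from the right, and returns the list of pieces.
Here it is used to derive the file names of the new network description and weights.
generate_sample calls the sample program through a shell command.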

    quantized_prototxt = model.rsplit('.')[0] + '_quantized.prototxt'
    quantized_weights = user_input_weights.rsplit('.')[0] + '_quantized.caffemodel'
    enable_power_of_2 = 0
    print 'Sampling...'
    generate_sample(sample, model, user_input_weights, quantized_prototxt, detection_flag, user_scaling_mode,
                    user_sampling_iteration, 100 * toleration, enable_power_of_2)
    print 'Sampling done'
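Because rsplit('.') without maxsplit splits on every dot, taking [0] keeps only the text before the first dot anywhere in the path. The sketch below uses a made-up path with a dot in a directory name to compare this with rsplit('.', 1) and the standard-library os.path.splitext:

```python
import os.path

# Made-up path with a dot in a directory name.
model = '/data/models/resnet.v1/deploy.prototxt'

naive = model.rsplit('.')[0] + '_quantized.prototxt'      # what calibrator.py does
once = model.rsplit('.', 1)[0] + '_quantized.prototxt'    # split only at the last dot
std = os.path.splitext(model)[0] + '_quantized.prototxt'  # standard-library equivalent

print(naive)  # /data/models/resnet_quantized.prototxt
print(once)   # /data/models/resnet.v1/deploy_quantized.prototxt
print(std)    # /data/models/resnet.v1/deploy_quantized.prototxt
```

For ordinary paths all three agree; they differ only when a directory name contains a dot.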

get_the_accuracy runs the caffe test command through a shell to obtain the model accuracy; here it measures the FP32 baseline.

    print 'Generating the FP32 accuracy...'
    top_1 = get_the_accuracy(caffe_bin_path, model, user_input_weights, user_input_iterations, detection_flag,
                             target_blob_name)
    print 'FP32 accuracy is: {}'.format(top_1)

tuning_quantized_topology removes the quantization of convolution layers one at a time, from front to back, until the accuracy requirement is met.

    tuning_quantized_topology(top_1, quantized_prototxt, caffe_bin_path, user_input_weights, user_input_iterations,
                              enable_power_of_2, toleration, detection_flag, target_blob_name)
    print 'Updated prototxt {} is generated.'.format(quantized_prototxt)

setup_env()

os.chdir() changes the current working directory to the given path.
os.path.dirname(path) returns the directory name of path: the first element of the pair returned by os.path.split(path).

def setup_env():
    os.chdir(os.path.dirname(os.path.abspath(params.root)))
    caffe.set_mode_cpu()

accuracy_blob_name_parser

read_prototxt parses the model prototxt with google.protobuf.text_format to obtain the network definition.

    net = read_prototxt(prototxt)
    if not net:
        print 'Please check the model prototxt integrity.'
        sys.exit(-1)

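Look for the network's Accuracy layers. If a layer has accuracy_param, its top blob is stored in res under the key top_k; otherwise it is stored under the key 1.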

    res = {}
    for i in net.layer:
        if i.type == 'Accuracy':
            if i.HasField('accuracy_param'):
                res[i.accuracy_param.top_k] = i.top[0]
            else:
                res[1] = i.top[0]

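Sort the keys of res and return the value of the smallest one, i.e. the top blob of the Accuracy layer with the smallest top_k (normally top-1).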

    return res[sorted(res.keys())[0]] if res else ''
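The selection rule can be tried without Caffe. This Python sketch replaces caffe_pb2 layers with a hypothetical namedtuple in which top_k=None stands for "no accuracy_param"; Caffe's AccuracyParameter defaults top_k to 1, which matches the res[1] fallback:

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed layer; top_k=None means "no accuracy_param".
Layer = namedtuple('Layer', ['type', 'top', 'top_k'])

def pick_accuracy_blob(layers):
    res = {}
    for l in layers:
        if l.type == 'Accuracy':
            # Without accuracy_param the layer computes top-1.
            res[l.top_k if l.top_k is not None else 1] = l.top[0]
    # min(res) is the same key as sorted(res.keys())[0]
    return res[min(res)] if res else ''

net = [Layer('Convolution', ['conv1'], None),
       Layer('Accuracy', ['acc_top5'], 5),
       Layer('Accuracy', ['acc_top1'], None)]
print(pick_accuracy_blob(net))  # acc_top1
```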

generate_sample

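os.system executes a command (a string) in a subshell. It is implemented by calling the standard C function system() and has the same limitations. Changes to sys.stdin and similar objects are not reflected in the environment of the executed command.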

def generate_sample(sample_path, input_model, weights, quantized_model, detection, scaling_mode, iterations=1,
                    error_margin=1, power=0):
    cmd = '{0} quantize -model {1} -weights {2} -model_quantized {3} -iterations {4} -error_margin {5} -power {6}' \
          ' -scaling {7} -trimming_mode dynamic_fixed_point'.format(sample_path, input_model, weights, quantized_model,
                                                                    iterations, error_margin, power, scaling_mode)
    if detection:
        cmd += ' --detection=1'

    os.system(cmd)
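Since os.system hands a single string to the shell, a model path containing spaces would be split into several arguments. One alternative, sketched below under the assumption that the sample program takes exactly the flags generate_sample passes, is to build an argument list and run it with subprocess.call; build_sample_cmd and run_sample are hypothetical names:

```python
import subprocess

def build_sample_cmd(sample_path, input_model, weights, quantized_model, detection,
                     scaling_mode, iterations=1, error_margin=1, power=0):
    """Same flags as generate_sample, returned as an argv list instead of a shell string."""
    cmd = [sample_path, 'quantize',
           '-model', input_model,
           '-weights', weights,
           '-model_quantized', quantized_model,
           '-iterations', str(iterations),
           '-error_margin', str(error_margin),
           '-power', str(power),
           '-scaling', scaling_mode,
           '-trimming_mode', 'dynamic_fixed_point']
    if detection:
        cmd.append('--detection=1')
    return cmd

def run_sample(*args, **kwargs):
    # No shell is involved, so each list element reaches the program as one argument.
    return subprocess.call(build_sample_cmd(*args, **kwargs))
```

The list form also makes the command easy to inspect or log before running it.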

get_the_accuracy

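get_the_accuracy runs caffe test in a subshell; tee writes the output to a log file (calibrator_log.txt) while still printing it to the console.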

    output_log_name = 'calibrator_log.txt'
    cmd = '{} test -model {} -weights {} -iterations {}'.format(caffe_bin, model_def, model_weights, iterations)
    if detection:
        cmd += ' -detection'
    cmd += ' 2>&1|tee {}'.format(output_log_name)
    os.system(cmd)

tuning_quantized_topology

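If fixed point is used (is_floating_point == 0), transform_convolutions is called.
get_compiled_net runs the test program through a shell to obtain the compiled network description file.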

    if is_floating_point == 0:
        print 'Updating quantization parameter...'
        transform_convolutions(prototxt, get_compiled_net(caffe_bin, prototxt, model_weights, detection))

Get the current accuracy.

    current_top1_accuracy = get_the_accuracy(caffe_bin, prototxt, model_weights, iterations, detection, blob_name)

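Call remove_top_quantized_parameter repeatedly until the accuracy loss is within tolerance.
Each call clears the quantization parameters of one quantized convolution layer, going from front to back.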

    while abs(current_top1_accuracy - base_top1_accuracy) >= accuracy_loss:
        print 'Tuning... '
        print abs(current_top1_accuracy - base_top1_accuracy)
        remove_top_quantized_parameter(prototxt)
        current_top1_accuracy = get_the_accuracy(caffe_bin, prototxt, model_weights, iterations, detection, blob_name)
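The tuning loop is a greedy search. The Python sketch below reproduces it with the prototxt replaced by a plain list of quantized layer names and get_the_accuracy replaced by an evaluate callback; both are hypothetical stand-ins:

```python
def tune(base_accuracy, quantized_layers, evaluate, accuracy_loss):
    """Un-quantize layers front to back until the accuracy drop is below accuracy_loss.

    quantized_layers: names of the layers that are currently quantized (hypothetical).
    evaluate: callable(list_of_quantized_layers) -> accuracy, standing in for get_the_accuracy.
    """
    layers = list(quantized_layers)
    current = evaluate(layers)
    # "and layers" is an addition: stop once nothing is left to un-quantize.
    while abs(current - base_accuracy) >= accuracy_loss and layers:
        layers.pop(0)  # like remove_top_quantized_parameter: drop the first quantized layer
        current = evaluate(layers)
    return layers, current

# Toy model: every quantized layer costs one point of accuracy.
layers, acc = tune(0.76, ['conv2', 'conv3', 'conv4'],
                   lambda ls: 0.76 - 0.01 * len(ls), accuracy_loss=0.015)
print(layers, acc)
```

Each step costs one full evaluation over the test set, which is why the number of test iterations dominates the tuning time.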

transform_convolutions

Read the prototxt file and merge its contents into net.

    net = caffe_pb2.NetParameter()
    with open(model_path) as f:
        s = f.read()
        txtf.Merge(s, net)

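get_fusion_conv_names returns the names of convolution layers with more than one input, together with their second input blob. Fusion is an optimization performed by intel/ideep.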

    fusion_layer = get_fusion_conv_names(compiled_model_path)
    new_net = copy.deepcopy(net)

Collect the convolution layers together with their indices.

    convolution_layers = [(value, index) for index, value in enumerate(net.layer) if value.type == 'Convolution']

    interesting_layers = ['ReLU']
    skip_layers = ['Convolution', 'Eltwise', 'Concat']

filter_fusion_layers filters the convolution layers against fusion_layer; the resulting layer names (u8_layers) are later given u8 outputs.

    u8_max = 255
    s8_max = 127
    u8_layers = filter_fusion_layers(net, fusion_layer, convolution_layers)

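get_all_bottom_layers returns the ReLU layers downstream of the convolution's output, i.e. those reached through its consumers without crossing a Convolution, Eltwise or Concat layer.
get_all_top_layers does the same upstream: it returns the ReLU layers that feed the convolution's input.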

    for (l, index) in convolution_layers:
        outputwith_relu = get_all_bottom_layers(net.layer[index].top[0], net, index + 1, skip_layers,
                                                interesting_layers)
        conv_relu_flag = check_relu_existence(net, index,
                                              convolution_layers[convolution_layers.index((l, index)) + 1][1]
                                              if (l, index) != convolution_layers[-1]
                                              else len(net.layer), [i[0] for i in fusion_layer])
        inputwith_relu = get_all_top_layers(l, net, index, skip_layers, interesting_layers)

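For the layer output, u8 is used when a ReLU follows the convolution (or the layer is in u8_layers, or conv_relu_flag is set), since ReLU outputs are non-negative; otherwise s8 is used.
Quantization::QuantizeNet() recorded the value range (maximum absolute value) of the inputs, outputs and parameters; here each range is converted into a scale factor, 255 / max for u8 or 127 / max for s8, rounded to two decimal places.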

        for si in range(0, len(new_net.layer[index].quantization_param.scale_out)):
            if len(outputwith_relu) > 0 or l.name in u8_layers or conv_relu_flag:  # u8
                new_net.layer[index].quantization_param.scale_out[si] = round(
                    u8_max / new_net.layer[index].quantization_param.scale_out[si], 2)
            else:  # s8
                new_net.layer[index].quantization_param.scale_out[si] = round(
                    s8_max / new_net.layer[index].quantization_param.scale_out[si], 2)

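For the layer input, u8 is used if a ReLU feeds it or the layer is a convolution; otherwise its quantization parameters are cleared. Since convolution_layers contains only Convolution layers, the second condition always holds here, so every input ends up as u8.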

        for si in range(0, len(new_net.layer[index].quantization_param.scale_in)):
            if len(inputwith_relu) > 0 or l.type == 'Convolution':  # u8
                new_net.layer[index].quantization_param.scale_in[si] = round(
                    u8_max / new_net.layer[index].quantization_param.scale_in[si], 2)
            else:
                new_net.layer[index].ClearField('quantization_param')
                continue

Layer parameters (weights) always use s8, with scale 127 / max.

        for si in range(0, len(new_net.layer[index].quantization_param.scale_params)):
            new_net.layer[index].quantization_param.scale_params[si] = round(
                s8_max / new_net.layer[index].quantization_param.scale_params[si], 2)

Write the network description back to the file.

    with open(model_path, 'w') as f:
        f.write(str(new_net))
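The scale arithmetic in transform_convolutions can be checked by hand. The Python sketch below computes output and weight scales the same way (q_max / max, rounded to two decimals) and adds a hypothetical quantize helper showing how such a scale maps real values to the integer grid. Two simplifications are assumptions: the u8 decision is reduced to a single boolean (the real code also checks u8_layers and conv_relu_flag), and the s8 saturation range is taken to be [-128, 127]; the actual int8 kernels are not shown in this post.

```python
U8_MAX = 255
S8_MAX = 127

def output_scale(max_abs, followed_by_relu):
    """255/max for u8 outputs (ReLU follows), 127/max for s8, rounded to 2 decimals."""
    q_max = U8_MAX if followed_by_relu else S8_MAX
    return round(q_max / float(max_abs), 2)

def weight_scale(max_abs):
    """Weights are always s8."""
    return round(S8_MAX / float(max_abs), 2)

def quantize(x, scale, unsigned):
    """Hypothetical: multiply by scale, round, then saturate to the u8 or s8 range."""
    q = int(round(x * scale))
    lo, hi = (0, U8_MAX) if unsigned else (-S8_MAX - 1, S8_MAX)
    return max(lo, min(hi, q))

s_out = output_scale(6.0, followed_by_relu=True)  # 255 / 6 = 42.5
print(s_out, quantize(2.0, s_out, unsigned=True), quantize(10.0, s_out, unsigned=True))
```

Values beyond the calibrated maximum (10.0 against a maximum of 6.0 here) saturate, which is why the sampling pass needs representative inputs.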

filter_fusion_layers

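filter_fusion_layers takes convolution information from both the original network (conv_layer) and the compiled quantized network (fusion_layer).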

If fusion_layer or conv_layer is empty, return an empty list.

    if not fusion_layer or not conv_layer:
        return []

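get_all_bottom_layers collects the ReLU layers reachable from a given blob.
Despite its name, output_with_relu_layer holds (name, top blob) for the convolutions whose output reaches no ReLU layer.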

    interesting_layers = ['ReLU']
    skip_layers = ['Convolution', 'Eltwise', 'Concat']

    output_with_relu_layer = [(l.name, net.layer[index].top[0]) for l, index in conv_layer
                              if len(get_all_bottom_layers(net.layer[index].top[0], net, index + 1,
                                                           skip_layers, interesting_layers)) == 0]

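Drop the convolutions whose top blob is the second input of a fused convolution in fusion_layer, and return the remaining layer names.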

    output_without_dict = {v: k for (k, v) in output_with_relu_layer}
    for layer_name, top_name in fusion_layer:
        if top_name in output_without_dict.keys():
            del output_without_dict[top_name]

    return output_without_dict.values()

get_fusion_conv_names

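Read the compiled network description that test writes for the quantized network (quantized_prototxt), and return the name and second bottom blob of every convolution layer with more than one input.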

    compiled_net = caffe_pb2.NetParameter()
    with open(compiled_model) as f:
        s = f.read()
        txtf.Merge(s, compiled_net)
    return [(layer.name, layer.bottom[1]) for _, layer in enumerate(compiled_net.layer)
            if layer.type == 'Convolution' and len(layer.bottom) > 1]

get_all_bottom_layers

get_bottom_layers returns the layers, from index start onward, that take top_name as input.
The loop stops when there are none left.

    all_bottom_layers = []
    bottom_layers = get_bottom_layers(top_name, net, start)
    while True:
        if len(bottom_layers) == 0:
            break

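Each of these layers is handled as follows: layers whose type is in skip_layers are dropped; layers whose type is in interesting_layers are moved to all_bottom_layers (without duplicates); any other layer is replaced by the layers that consume its output. Note that ln, the layer name, is passed as the blob name, which assumes the layer's top blob has the same name as the layer.
The continue-based flow would read more clearly as an if/elif/else chain. Also, processed_layers is the same list object as bottom_layers, so removing and appending while iterating changes the loop itself: some entries are skipped in the current pass and only handled on the next pass of the while loop.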

        processed_layers = bottom_layers  # sync bottom_layers change
        for (li, ln, lt) in processed_layers:
            if lt in skip_layers:
                bottom_layers.remove((li, ln, lt))
                continue
            if lt in interesting_layers:
                lp = (li, ln, lt)
                bottom_layers.remove(lp)
                if lp not in all_bottom_layers:
                    all_bottom_layers.append(lp)
                continue

            new_bottom_layers = get_bottom_layers(ln, net, li + 1)
            bottom_layers.remove((li, ln, lt))
            bottom_layers.extend(new_bottom_layers)

    return all_bottom_layers
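The same downstream search can be written as an explicit worklist that never mutates the list it iterates over. The Python sketch below assumes layers are plain dicts with name/type/bottom/top keys (a simplification, not caffe_pb2 messages); unlike the original it follows each layer's actual top blobs rather than its name, and it reports absolute layer indices. find_downstream is a hypothetical name.

```python
SKIP_TYPES = ('Convolution', 'Eltwise', 'Concat')
INTERESTING_TYPES = ('ReLU',)

def find_downstream(layers, blob, start):
    """(index, name, type) of every ReLU reachable from blob without crossing a SKIP_TYPES layer."""
    found = []
    pending = [(blob, start)]  # (blob name, first layer index to scan)
    while pending:
        blob_name, begin = pending.pop()
        for i in range(begin, len(layers)):
            layer = layers[i]
            if blob_name not in layer['bottom']:
                continue
            entry = (i, layer['name'], layer['type'])  # absolute layer index
            if layer['type'] in SKIP_TYPES:
                continue  # this branch ends here
            if layer['type'] in INTERESTING_TYPES:
                if entry not in found:
                    found.append(entry)
                continue
            for top in layer['top']:  # follow the layer's real output blobs
                pending.append((top, i + 1))
    return found

def layer(name, type_, bottom, top):
    return {'name': name, 'type': type_, 'bottom': bottom, 'top': top}

net = [
    layer('conv1', 'Convolution', ['data'], ['conv1']),
    layer('bn1', 'BatchNorm', ['conv1'], ['conv1']),  # in-place
    layer('relu1', 'ReLU', ['conv1'], ['conv1']),     # in-place
    layer('conv2', 'Convolution', ['conv1'], ['conv2']),
    layer('pool2', 'Pooling', ['conv2'], ['pool2']),
    layer('relu2', 'ReLU', ['pool2'], ['pool2']),
]
print(find_downstream(net, 'conv1', 1))  # [(2, 'relu1', 'ReLU')]
print(find_downstream(net, 'conv2', 4))  # [(5, 'relu2', 'ReLU')]
```

Every pushed entry starts scanning after the layer that produced it, so the search always terminates.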

get_bottom_layers

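get_bottom_layers returns the layers, from index start onward, that take top_name as input. Note that index comes from enumerate(net.layer[start:]), so it is relative to start, while callers such as get_all_bottom_layers use it as an absolute layer index (li + 1).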

    bottom_layers = []
    for index, value in enumerate(net.layer[start:]):
        for sub_index, sub_value in enumerate(value.bottom):
            if sub_value == top_name:
                bottom_layers.append((index, value.name, value.type))
    return bottom_layers

get_all_top_layers

get_top_layers returns the layers whose outputs are inputs of layer l.

    all_top_layers = []
    top_layers = get_top_layers(l, net, end)
    while True:
        if len(top_layers) == 0:
            break

The processing follows the same pattern as get_all_bottom_layers.

        processed_layers = top_layers  # sync topLayers change
        for (li, ln, lt) in processed_layers:
            if lt in skip_layers:
                top_layers.remove((li, ln, lt))
                continue
            if lt in interesting_layers:
                lp = (li, ln, lt)
                top_layers.remove(lp)
                if lp not in all_top_layers:
                    all_top_layers.append(lp)
                continue

            new_top_layers = get_top_layers(net.layer[li], net, li)
            top_layers.remove((li, ln, lt))
            top_layers.extend(new_top_layers)

    return all_top_layers

get_top_layers

get_top_layers scans the layers backwards, from layer end - 1 down to layer 0.
Every layer with a top blob among the inputs of layer l is appended to top_layers.

    top_layers = []
    for layerIndex in range(0, end):
        reverse_layer_index = end - layerIndex - 1
        for blobIndex in range(0, len(net.layer[reverse_layer_index].top)):
            if net.layer[reverse_layer_index].top[blobIndex] in l.bottom:
                top_layers.append((reverse_layer_index, net.layer[reverse_layer_index].name,
                                   net.layer[reverse_layer_index].type))
    return top_layers

Quantization::QuantizeNet()

(Flowchart: QuantizeNet (model_, weights_) → CheckWritePermissions → CopyTrainedLayersFrom → RunForwardBatches → Quantize2DynamicFixedPoint → model_quantized_ → End)

  CheckWritePermissions(model_quantized_);

  float accuracy;
  Net<float>* net_test = new Net<float>(model_, caffe::TEST);
  net_test->CopyTrainedLayersFrom(weights_);
  RunForwardBatches(this->iterations_, net_test, &accuracy, true, this->score_number); // RangeInLayer during sampling
  delete net_test;

  // Do network quantization and scoring.
  if (trimming_mode_ == "dynamic_fixed_point") {
    Quantize2DynamicFixedPoint();
  } else {
    LOG(FATAL) << "Unknown trimming mode: " << trimming_mode_;
  }

Quantization::Quantize2DynamicFixedPoint()

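Find the integer length of the dynamic fixed-point numbers, choosing it so that no saturation occurs. This approximation assumes an infinitely long fractional part. For layer activations, the integer length is one bit shorter (compare ceil(log2(max)) for activations with ceil(log2(max) + 1) for parameters below).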

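For each layer: in power-of-two mode, compute the integer lengths of the input and output; otherwise store the maximum magnitudes of the input and output directly (they become scales later).
Net::RangeInLayers is what fills layer_names_ and the max_* vectors.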

  vector<int> lens;
  vector<float> scales;
  for (int i = 0; i < layer_names_.size(); ++i) {
    if (this->power) {
      il_in_.push_back((int)ceil(log2(max_in_[i])));
      il_out_.push_back((int)ceil(log2(max_out_[i])));
    } else {
      scale_in_.push_back(max_in_[i]);
      scale_out_.push_back(max_out_[i]);
    }

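If the scaling mode is "single", compute the length from the overall maximum parameter value; otherwise compute one length per convolution kernel from that kernel's maximum.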

    if (this->scaling == "single") {
      if (this->power)
        lens.push_back((int)ceil(log2(max_params_[i][0])+1));
      else
        scales.push_back(max_params_[i][0]);
    } else {
      for (int j = 0; j < max_params_[i].size(); j++) {
        if (this->power)
          lens.push_back((int)ceil(log2(max_params_[i][j])+1));
        else
          scales.push_back(max_params_[i][j]+0.0);
      }
    }

Store the parameter integer lengths (or scales) for this layer.

    if (this->power) {
      il_params_.push_back(lens);
      lens.clear();
    } else {
      scale_params_.push_back(scales);
      scales.clear();
    }
  }

Print log messages.

  // Debug
  for (int k = 0; k < layer_names_.size(); ++k) {
    if (this->scaling != "single") {
      if (this->power)
        LOG(INFO) << "Layer " << layer_names_[k] << ", parameters channel=" << il_params_[k].size();
      else
        LOG(INFO) << "Layer " << layer_names_[k] << ", parameters channel=" << scale_params_[k].size();
    }
    if (this->power) {
      LOG(INFO) << "Integer length input=" << il_in_[k];
      LOG(INFO) << "Integer length output=" << il_out_[k];
    } else {
      LOG(INFO) << "Scale input=" << scale_in_[k];
      LOG(INFO) << "Scale output=" << scale_out_[k];
    }
    if (this->scaling == "single") {
      if (this->power)
        LOG(INFO) << "Integer length param=" << il_params_[k][0];
      else
        LOG(INFO) << "Scale param=" << scale_params_[k][0];
    } else {
      if (this->power) {
        for (int j = 0; j < il_params_[k].size(); j++) {
          LOG(INFO) << "Integer length params[" << j << "]=" << il_params_[k][j];
        }
      } else {
        for (int j = 0; j < scale_params_[k].size(); j++) {
          LOG(INFO) << "Scale params[" << j << "]=" << scale_params_[k][j];
        }
      }
    }
  }

Set the bit widths: 8 bits for convolution parameters, inputs and outputs.

  // Choose bit-width for different network parts
  bw_conv_params_ = 8;
  bw_out_ = 8;
  bw_in_ = bw_out_;

Read the network description file, call EditNetDescriptionDynamicFixedPoint to add the quantization parameters, and write the result to model_quantized_.

  NetParameter param;
  // Score dynamic fixed point network.
  // This network combines dynamic fixed point parameters in convolutional and
  // inner product layers, as well as dynamic fixed point activations.
  caffe::ReadNetParamsFromTextFileOrDie(model_, &param);
  EditNetDescriptionDynamicFixedPoint(&param, "Convolution", "Parameters_and_Activations", bw_conv_params_, bw_in_,
                                      bw_out_);
  WriteProtoToTextFile(param, model_quantized_);
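In power-of-two mode the C++ code turns each maximum into an integer length il and EditNetDescriptionDynamicFixedPoint later stores fl = bw - il. Note that calibrator.py passes enable_power_of_2 = 0, so by default the scale branch is used instead. A Python sketch of the power-of-two arithmetic, with hypothetical function names:

```python
import math

def integer_length_activation(max_abs):
    # il = ceil(log2(max)), as for il_in_ / il_out_
    return int(math.ceil(math.log2(max_abs)))

def integer_length_param(max_abs):
    # il = ceil(log2(max) + 1), as for il_params_ (one bit more than activations)
    return int(math.ceil(math.log2(max_abs) + 1))

def fractional_length(bw, il):
    # fl = bw - il, as stored in fl_layer_in / fl_layer_out / fl_params
    return bw - il

bw = 8
il_out = integer_length_activation(6.0)  # log2(6) is about 2.58, so il = 3
il_w = integer_length_param(0.05)        # log2(0.05) + 1 is about -3.32, so il = -3
print(il_out, fractional_length(bw, il_out), il_w, fractional_length(bw, il_w))  # 3 5 -3 11
```

Small weights yield a negative integer length and therefore a fractional length larger than the bit width, which simply means every stored bit lies to the right of the binary point.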

EditNetDescriptionDynamicFixedPoint

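For each layer, proceed only if layer_quantize contains "Convolution" and the layer's type contains "Convolution".
std::string::npos is the maximum value of size_t, which find returns when the substring is not found.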

  int index = 0;
  bool first_convolution = false;
  for (int i = 0; i < param->layer_size(); ++i) {
    // TODO: move first convolution check to transform script
    if (layer_quantize.find("Convolution") != string::npos &&
        param->layer(i).type().find("Convolution") != string::npos) {

The first convolution layer is skipped, so it is not quantized.

      if (!first_convolution) {
        first_convolution = true;
        continue;
      }

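If net_part contains "Parameters", set the layer type to Convolution and the parameter bit width.
In power-of-two mode, store the fractional lengths (bw - il); otherwise store the scales.
GetIntegerLengthParams returns the layer's integer lengths from il_params_.
GetScaleParams returns the layer's scales from scale_params_.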

      // quantize parameters
      if (net_part.find("Parameters") != string::npos) {
        LayerParameter* param_layer = param->mutable_layer(i);
        param_layer->set_type("Convolution");
        if (trimming_mode_ == "dynamic_fixed_point") {
          param_layer->mutable_quantization_param()->set_bw_params(bw_conv);
          if (this->power) {
            vector<int> vals = GetIntegerLengthParams(param->layer(i).name());
            for (int j = 0; j < vals.size(); j++) {
              vals[j] = bw_conv - vals[j];
              param_layer->mutable_quantization_param()->add_fl_params(vals[j]);
            }
          } else {
            vector<float> vals = GetScaleParams(param->layer(i).name());
            for (int j = 0; j < vals.size(); j++) {
              param_layer->mutable_quantization_param()->add_scale_params(vals[j]);
            }
          }
        }
      }

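If net_part contains "Activations", set the layer type to Convolution and the input/output bit widths; in power-of-two mode store the fractional lengths bw - il, otherwise the input and output scales.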

      // quantize activations
      if (net_part.find("Activations") != string::npos) {
        LayerParameter* param_layer = param->mutable_layer(i);
        param_layer->set_type("Convolution");
        if (trimming_mode_ == "dynamic_fixed_point") {
          param_layer->mutable_quantization_param()->set_bw_layer_in(bw_in);
          param_layer->mutable_quantization_param()->set_bw_layer_out(bw_out);
          if (this->power) {
            int val = GetIntegerLengthIn(param->layer(i).name());
            param_layer->mutable_quantization_param()->add_fl_layer_in(bw_in - val);
            val = GetIntegerLengthOut(param->layer(i).name());
            param_layer->mutable_quantization_param()->add_fl_layer_out(bw_out - val);
          } else {
            float val = GetScaleIn(param->layer(i).name());
            param_layer->mutable_quantization_param()->add_scale_in(val);
            val = GetScaleOut(param->layer(i).name());
            param_layer->mutable_quantization_param()->add_scale_out(val);
          }
        }
      }

Set the precision type in the layer's quantization parameters.

      LayerParameter* param_layer = param->mutable_layer(i);
      if (trimming_mode_ == "dynamic_fixed_point") {
        param_layer->mutable_quantization_param()->set_precision(caffe::QuantizationParameter_Precision(0));
      } else {
        LOG(FATAL) << "Unknown trimming mode: " << trimming_mode_;
      }
      index++;
    }
  }

Net::RangeInLayers

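If layer_name is empty, iterate over layers_ and, for every convolution layer, initialize layer_name, max_in, max_out and max_param (one entry in "single" mode, otherwise one per output channel).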

  // Initialize vector elements, if needed.
  if(layer_name->size()==0) {
    for (int layer_id = 0; layer_id < layers_.size(); ++layer_id) {
      if (strcmp(layers_[layer_id]->type(), "Convolution") == 0) {
        layer_name->push_back(this->layer_names()[layer_id]);
        max_in->push_back(0);
        max_out->push_back(0);
        if (scaling == "single") {
          max_param->push_back(vector<Dtype>(1, 0));
        }
        else {
          int param_shape = (&(*layers_[layer_id]->blobs()[0]))->shape(0);
          max_param->push_back(vector<Dtype>(param_shape, 0));
        }
      }
    }
  }

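For each convolution layer, FindMax gives the maximum absolute value of the first bottom blob and of the first top blob; max_in and max_out keep the running maxima across iterations.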

  // Find maximal values.
  int index = 0;
  vector<Dtype> max_vals;
  for (int layer_id = 0; layer_id < layers_.size(); ++layer_id) {
    if (strcmp(layers_[layer_id]->type(), "Convolution") == 0) {
      max_vals = FindMax(bottom_vecs_[layer_id][0]);
      max_in->at(index) = std::max(max_in->at(index), max_vals.at(0));

      max_vals = FindMax(top_vecs_[layer_id][0]);
      max_out->at(index) = std::max(max_out->at(index), max_vals.at(0));

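If the scaling mode is "single", record the maximum absolute value of the weights; otherwise record one maximum per convolution kernel (output channel). Biases are ignored.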

      // Consider the weights only, ignore the bias
      if (scaling == "single") {
        max_vals = FindMax(&(*layers_[layer_id]->blobs()[0]));
        max_param->at(index).at(0) = std::max(max_param->at(index).at(0), max_vals.at(0));
      } else {
        max_vals = FindMax(&(*layers_[layer_id]->blobs()[0]), false);
        for(int i = 0; i < max_vals.size(); ++i)
          max_param->at(index).at(i) = std::max(max_param->at(index).at(i), max_vals.at(i));
      }
      index++;
    }
  }

Net::FindMax

Get the blob data.

  const Dtype* data = blob->cpu_data();
  int cnt = blob->count();
  vector<Dtype> max_vals;
  Dtype max_val = (Dtype)(-10);

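For a 4-D blob, loop over the data to find the maximum absolute value.
The name channel is ambiguous: it is shape(0), the number of output channels (kernels), and step = shape(1) × height × width is the size of one kernel. Note that max_val is never reset between kernels, so each per-channel entry also includes values from the kernels before it rather than being that kernel's own maximum.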

  int index = 0;
  if(blob->shape().size() == 4) {
    if(is_single) {
      max_vals = vector<Dtype>(1, Dtype(-10));
      for (int i = 0; i < cnt; ++i) {
        max_val = std::max(max_val, (Dtype)fabs(data[i]));
      }
      max_vals.at(0) = max_val;
    } else { // output_channel * input_channel * kernel_height * kernel_width
      int height = blob->shape(2);
      int width = blob->shape(3);
      int channel = blob->shape(0);
      max_vals = vector<Dtype>(channel, Dtype(-10));
      int step = blob->shape(1) * height * width;
      for (int i = 0; i < cnt; ++i) {
        if((i + 1) % step == 0) {
          max_vals.at(index) = std::max(max_val, (Dtype)fabs(data[i]));
          ++index;
        } else {
          max_val = std::max(max_val, (Dtype)fabs(data[i]));
        }
      }
    }

Other shapes (e.g. a 2-D output_channel × input_channel blob) are handled the same way, with step = shape(1).

  } else {
    if(is_single) {
      max_vals = vector<Dtype>(1, Dtype(-10));
      for (int i = 0; i < cnt; ++i) {
        max_val = std::max(max_val, (Dtype)fabs(data[i]));
      }
      max_vals.at(0) = max_val;
    } else { // output_channel * input_channel
      int channel = blob->shape(0);
      max_vals = vector<Dtype>(channel, Dtype(-10));
      int step = blob->shape(1);
      for (int i = 0; i < cnt; ++i) {
        if((i + 1) % step == 0) {
          max_vals.at(index) = std::max(max_val, (Dtype)fabs(data[i]));
          ++index;
        } else {
          max_val = std::max(max_val, (Dtype)fabs(data[i]));
        }
      }
    }
  }
  return max_vals;
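For comparison, a per-channel maximum that computes each kernel's maximum independently might look like the Python sketch below. It assumes the blob is flattened in output-channel-major order, as Caffe stores N×C×H×W blobs row-major; per_channel_max_abs is a hypothetical name, not part of Intel Caffe.

```python
def per_channel_max_abs(data, num_channels):
    """Max |x| of each output channel of a flattened, output-channel-major weight blob."""
    if num_channels <= 0 or len(data) % num_channels:
        raise ValueError('blob size must be a positive multiple of num_channels')
    step = len(data) // num_channels  # elements per kernel, e.g. in_channels * kernel_h * kernel_w
    return [max(abs(v) for v in data[c * step:(c + 1) * step])
            for c in range(num_channels)]

weights = [0.1, -0.5,    # kernel 0
           0.2, 0.3,     # kernel 1
           -0.05, 0.04]  # kernel 2
print(per_channel_max_abs(weights, 3))  # [0.5, 0.3, 0.05]
```

Slicing one kernel at a time makes the independence explicit: no state carries over from one channel to the next.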

Quantization::RunForwardBatches

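Initialize variables. bottom_vec is left empty, so the inputs defined in the network description (its data layer) are used.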

  LOG(INFO) << "Running for " << iterations << " iterations.";
  vector<Blob<float>* > bottom_vec;
  vector<int> test_score_output_id;
  vector<float> test_score;
  float loss = 0;

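In each iteration, run a forward pass, update the per-convolution value ranges via RangeInLayers when do_stats is set, and accumulate the loss.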

  for (int i = 0; i < iterations; ++i) {
    float iter_loss;
    // Do forward propagation.
    const vector<Blob<float>*>& result =
        caffe_net->Forward(bottom_vec, &iter_loss);
    // Find maximal values in network.
    if(do_stats) {
      caffe_net->RangeInLayers(&layer_names_, &max_in_, &max_out_, &max_params_, this->scaling);
    }
    // Keep track of network score over multiple batches.
    loss += iter_loss;

For detection, skip the rest of the iteration.

    if (this->detection) continue;

Accumulate the test scores of each iteration.

    int idx = 0;
    for (int j = 0; j < result.size(); ++j) {
      const float* result_vec = result[j]->cpu_data();
      for (int k = 0; k < result[j]->count(); ++k, ++idx) {
        const float score = result_vec[k];
        if (i == 0) {
          test_score.push_back(score);
          test_score_output_id.push_back(j);
        } else {
          test_score[idx] += score;
        }
      }
    }
    LOG(INFO) << "Iteration: " << i;
  }

Compute the average loss; for detection, return here.

  loss /= iterations;
  LOG(INFO) << "Loss: " << loss;
  if (this->detection) return;

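Compute the loss weight of each output (the message stream is built but never printed in this excerpt), then set *accuracy to the mean of the score at index score_number.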

  for (int i = 0; i < test_score.size(); ++i) {
    const float loss_weight = caffe_net->blob_loss_weights()[
        caffe_net->output_blob_indices()[test_score_output_id[i]]];
    std::ostringstream loss_msg_stream;
    const float mean_score = test_score[i] / iterations;
    if (loss_weight) {
      loss_msg_stream << " (* " << loss_weight
                      << " = " << loss_weight * mean_score << " loss)";
    }
  }
  *accuracy = test_score[score_number] / iterations;
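The score bookkeeping amounts to an element-wise mean over iterations. A Python sketch, assuming each iteration's output values are already flattened into one list in the same order that idx walks them above; mean_scores is a hypothetical name:

```python
def mean_scores(per_iteration_outputs):
    """Element-wise mean over iterations, like test_score[idx] / iterations."""
    iterations = len(per_iteration_outputs)
    totals = [0.0] * len(per_iteration_outputs[0])
    for outputs in per_iteration_outputs:
        for idx, score in enumerate(outputs):
            totals[idx] += score
    return [t / iterations for t in totals]

# Two iterations, two output values each (e.g. top-1 and top-5 accuracy).
scores = mean_scores([[0.7, 0.9], [0.8, 0.95]])
score_number = 0
accuracy = scores[score_number]
print(scores, accuracy)
```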
