YOLOv4 Performance Analysis (Part 1)

I. Contents

Experimental tests

1) Test overview

2) Test

3) Train

II. Analysis

1. Experimental tests

1) Test methodology

YOLOv4 training experiment (Darknet should be compiled with OpenCV):

duration_run_detector:

./darknet detector train cfg/coco.data cfg/yolov4.cfg data/yolov4.conv.137

YOLOv4 test experiment (Yolo v4 - save result videofile res.avi):

Yolo v4 - save result videofile res.avi: darknet.exe detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -out_filename res.avi

Durations measured in the YOLOv4 main() function:

duration_run_detector: 0

duration_main_test_resize: 0

duration_main_visualize: 0

duration_main_partial: 0

duration_main_oneoff: 0

duration_main_operations: 0

duration_main_rescale_net: 0

duration_main_normalize_net: 0

duration_main_statistics_net: 0

duration_main_reset_normalize_net: 0

duration_main_run_rgbr_net: 0

duration_main_run_nightmare: 0

duration_main_run_captcha: 0

duration_main_speed: 0

duration_main_test_resize: 0

duration_main_composite_3d: 0

duration_main_run_writing: 0

duration_main_run_dice: 0

duration_main_run_compare: 0

duration_main_run_tag: 0

duration_main_run_art: 0

duration_main_run_classifier: 0

duration_main_predict_classifier: 0

duration_main_run_coco: 0

duration_main_run_vid_rnn: 0

duration_main_run_char_rnn: 0

duration_main_run_go: 0

duration_main_run_cifar: 0

duration_main_test_detector: 0

// The entry below is the common entry point for Train, Test and Validate

duration_main_run_detector: 27023955

duration_main_run_super: 0

duration_main_run_voxel: 0

duration_main_run_yolo: 0

duration_main_average: 0

duration_main_denormalize_net: 0
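The article does not show how these duration_* counters were collected. Below is a minimal sketch of one way to do it in C. It assumes microsecond units. The duration_counter struct and the TIMED macro are hypothetical names, not part of Darknet. Time is read with the C11 timespec_get(), which returns wall-clock time.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical per-call-site counter (not a Darknet type). */
typedef struct {
    const char *name;   /* e.g. "duration_main_run_detector" */
    uint64_t total_us;  /* accumulated time, assumed to be microseconds */
    uint64_t calls;     /* number of timed calls */
} duration_counter;

/* Current wall-clock time in microseconds (C11 timespec_get). */
static uint64_t now_us(void) {
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC) return 0;
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Add the time elapsed since start_us to the counter. */
static void duration_add(duration_counter *c, uint64_t start_us) {
    uint64_t end_us = now_us();
    c->total_us += (end_us > start_us) ? end_us - start_us : 0; /* wall clock may step back */
    c->calls += 1;
}

/* Time one statement and charge it to the given counter. */
#define TIMED(counter, stmt)           \
    do {                               \
        uint64_t t0_ = now_us();       \
        stmt;                          \
        duration_add(&(counter), t0_); \
    } while (0)
```

Wrapping a call such as `TIMED(c, run_detector(argc, argv));` would produce one line of the list above, assuming the call is made once per run.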

if (0 == strcmp(argv[2], "test")) test_detector(datacfg, cfg, weights, filename, thresh, hier_thresh, dont_show, ext_output, save_labels, outfile, letter_box, benchmark_layers); // Entry point of test_detector().

else if (0 == strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show, calc_map, mjpeg_port, show_imgs, benchmark_layers, chart_path); // Entry point of train_detector().

else if (0 == strcmp(argv[2], "valid")) validate_detector(datacfg, cfg, weights, outfile); // Entry point of validate_detector().

I. Test

duration_run_detector_find_arg: 3

duration_run_detector_test_detector: 0

duration_run_detector_demo_detector: 27023955

duration_run_detector_train_detector: 0

duration_run_detector_calc_anchors: 0

duration_run_detector_draw_object: 0

duration_run_detector_validate_detector: 0

duration_run_detector_validate_detector_recall: 0

duration_run_detector_validate_map: 0

if (0 == strcmp(argv[2], "demo")) {

    list *options = read_data_cfg(datacfg);

    int classes = option_find_int(options, "classes", 20);

    char *name_list = option_find_str(options, "names", "data/names.list");

    char **names = get_labels(name_list);

    if (filename)
        if (strlen(filename) > 0)
            if (filename[strlen(filename) - 1] == 0x0d) filename[strlen(filename) - 1] = 0;

    demo(cfg, weights, thresh, hier_thresh, cam_index, filename, names, classes, avgframes, frame_skip, prefix, out_filename,
         mjpeg_port, dontdraw_bbox, json_port, dont_show, ext_output, letter_box, time_limit_sec, http_post_host, benchmark, benchmark_layers);

    free_list_contents_kvp(options);

    free_list(options);

}
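The three nested checks above remove a trailing carriage return (0x0d) from the filename. That byte is presumably left over from a CRLF line ending. Here is the same logic as a standalone helper. strip_trailing_cr() is a hypothetical name, not a Darknet function.

```c
#include <string.h>

/* Remove one trailing carriage return (0x0d) from s, if present.
 * Returns the resulting string length (0 for NULL). */
static size_t strip_trailing_cr(char *s) {
    size_t len;
    if (s == NULL) return 0;
    len = strlen(s);
    if (len > 0 && s[len - 1] == 0x0d) {
        s[len - 1] = '\0';
        --len;
    }
    return len;
}
```

Calling it once on the filename before passing it to demo() has the same effect as the nested if statements.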

Demo Detector

duration_parse_network_cfg_custom: 442932/27023955 = 1.64%

duration_demo_load_weights: 497513/27023955 = 1.84%

duration_fuse_conv_batchnorm: 393218/27023955 = 1.46%

duration_calculate_binary_weights: 591245/27023955 = 2.19%

duration_get_capture_video_stream: 610033/27023955 = 2.26%

duration_get_capture_webcam: (no value recorded)

duration_custom_create_thread: 220031/27023955 = 0.81%

duration_thread_sync: 315469/27023955 = 1.17%

duration_create_window_cv: 1663027/27023955 = 6.15%

duration_get_stream_fps_cpp_cv: 1335095/27023955 = 4.94%

duration_create_video_writer: 2016790/27023955 = 7.46%

duration_get_time_point: 1803257/27023955 = 6.67%

duration_this_thread_yield: 2208903/27023955 = 8.17%

duration_custom_atomic_store_int: 478896/27023955 = 1.77%

duration_diounms_sort: 448094/27023955 = 1.66%

duration_set_track_id: 610708/27023955 = 2.26%

duration_send_json: 2365887/27023955 = 8.75%

duration_send_http_post_request: 1082366/27023955 = 4.01%

duration_draw_detections_cv_v3: 3092754/27023955 = 11.44%

duration_save_cv_jpg: 2890907/27023955 = 10.70%

duration_send_mjpg: 2988041/27023955 = 11.06%

duration_write_frame_cv: 2605713/27023955 = 9.64%

duration_release_image_mat: 523714/27023955 = 1.94%

duration_delay_time: 505567/27023955 = 1.87%

duration_free_all_thread: 587132/27023955 = 2.17%
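Each percentage above is that section's duration divided by the 27023955 total of duration_run_detector_demo_detector. The unit is not stated; microseconds is a plausible guess. Below is a small sketch for recomputing the shares; the helper names are hypothetical.

The 24 listed values add up to 30277292, about 112% of the total. So the sections cannot be disjoint slices of a single timeline. One plausible explanation, not confirmed by the measurements, is that some timed intervals overlap across the fetch, detect and main threads.

```c
#include <stddef.h>

/* Share of one timed section in the total run, in percent. */
static double share_percent(unsigned long part, unsigned long total) {
    return total ? 100.0 * (double)part / (double)total : 0.0;
}

/* Sum of the shares of n sections; this exceeds 100 when intervals overlap. */
static double total_share(const unsigned long *parts, size_t n, unsigned long total) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += share_percent(parts[i], total);
    return sum;
}
```

For example, share_percent(3092754, 27023955) gives about 11.44 for draw_detections_cv_v3.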

Demo:

net = parse_network_cfg_custom(cfgfile, 1, 1); // set batch=1

load_weights(&net, weightfile);

fuse_conv_batchnorm(net);

calculate_binary_weights(net);

if (filename) {
    printf("video file: %s\n", filename);
    cap = get_capture_video_stream(filename);
}
else {
    printf("Webcam index: %d\n", cam_index);
    cap = get_capture_webcam(cam_index);
}

custom_create_thread(&fetch_thread, 0, fetch_in_thread, 0);

fetch_in_thread_sync(0); //fetch_in_thread(0);

detect_in_thread_sync(0); //detect_in_thread(0);

create_window_cv("Demo", full_screen, 1352, 1013);

if (out_filename && !flag_exit)
{
    int src_fps = 25;
    src_fps = get_stream_fps_cpp_cv(cap);
    output_video_writer =
        create_video_writer(out_filename, 'D', 'I', 'V', 'X', src_fps, get_width_mat(det_img), get_height_mat(det_img), 1);

    //'H', '2', '6', '4'
    //'D', 'I', 'V', 'X'
    //'M', 'J', 'P', 'G'
    //'M', 'P', '4', 'V'
    //'M', 'P', '4', '2'
    //'X', 'V', 'I', 'D'
    //'W', 'M', 'V', '2'

}

this_thread_yield();

if (!benchmark) custom_atomic_store_int(&run_fetch_in_thread, 1);

custom_atomic_store_int(&run_detect_in_thread, 1);

if (nms) {
    if (l.nms_kind == DEFAULT_NMS) do_nms_sort(local_dets, local_nboxes, l.classes, nms);
    else diounms_sort(local_dets, local_nboxes, l.classes, nms, l.nms_kind, l.beta_nms);
}

if (l.embedding_size) set_track_id(local_dets, local_nboxes, demo_thresh, l.sim_thresh, l.track_ciou_norm, l.track_history_size, l.dets_for_track, l.dets_for_show);

if (demo_json_port > 0) {
    int timeout = 400000;
    send_json(local_dets, local_nboxes, l.classes, demo_names, frame_id, demo_json_port, timeout);
}

show_image_mat(show_img, "Demo");

wait_key_cv(1);

send_http_post_request(http_post_host, http_post_port, filename, local_dets, nboxes, classes, names, frame_id, ext_output, timeout);

draw_detections_cv_v3(show_img, local_dets, local_nboxes, demo_thresh, demo_names, demo_alphabet, demo_classes, demo_ext_output);

free_detections(local_dets, local_nboxes);

if (show_img) save_cv_jpg(show_img, buff);

// if you run it with param -mjpeg_port 8090 then open URL in your web-browser: http://localhost:8090

if (mjpeg_port > 0 && show_img) {
    int port = mjpeg_port;
    int timeout = 400000;
    int jpeg_quality = 40; // 1 - 100

    send_mjpeg(show_img, port, timeout, jpeg_quality);
}

// save video file

if (output_video_writer && show_img) {
    write_frame_cv(output_video_writer, show_img);
    printf("\n cvWriteFrame \n");
}

while (custom_atomic_load_int(&run_detect_in_thread)) {
    if (avg_fps > 180) this_thread_yield();
    else this_thread_sleep_for(thread_wait_ms); // custom_join(detect_thread, 0);

}

if (!benchmark) {
    while (custom_atomic_load_int(&run_fetch_in_thread)) {
        if (avg_fps > 180) this_thread_yield();
        else this_thread_sleep_for(thread_wait_ms); // custom_join(fetch_thread, 0);

    }
    free_image(det_s);
}

if (time_limit_sec > 0 && (get_time_point() - start_time_lim)/1000000 > time_limit_sec) {
    printf(" start_time_lim = %f, get_time_point() = %f, time spent = %f \n", start_time_lim, get_time_point(), get_time_point() - start_time_lim);
    break;
}
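create_video_writer() above receives the codec as four characters ('D', 'I', 'V', 'X'); the commented-out lines list alternative codecs. Video containers identify codecs by a 32-bit FOURCC code.

The sketch below packs four characters the way OpenCV's VideoWriter::fourcc() does, with the first character in the lowest byte. Two things are assumptions here: that Darknet forwards these characters to OpenCV, and the name make_fourcc() itself, which is hypothetical.

```c
#include <stdint.h>

/* Pack four characters into a FOURCC code, first character in the lowest byte. */
static uint32_t make_fourcc(char c1, char c2, char c3, char c4) {
    return (uint32_t)(unsigned char)c1
         | ((uint32_t)(unsigned char)c2 << 8)
         | ((uint32_t)(unsigned char)c3 << 16)
         | ((uint32_t)(unsigned char)c4 << 24);
}
```

With this packing, 'D', 'I', 'V', 'X' becomes 0x58564944 and 'M', 'J', 'P', 'G' becomes 0x47504A4D.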

II. Train

1) if (0 == strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show, calc_map, mjpeg_port, show_imgs, benchmark_layers, chart_path);

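2) train_detector(): the entry point for data loading.

pthread_t load_thread = load_data(args); // Creates and starts the loading thread for the first time; args holds the model's training parameters.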

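1) load_data(): threads are allocated through load_threads().

pthread_t load_data(load_args args)

/* Calls load_threads(). */

if (pthread_create(&thread, 0, load_threads, ptr)) error("Thread creation failed"); // Arg 1: pointer to the thread identifier; arg 2: thread attributes; arg 3: address of the function the thread runs; arg 4: argument passed to that function.

2) Multiple threads run run_thread_loop().

if (pthread_create(&threads[i], 0, run_thread_loop, ptr)) error("Thread creation failed"); // One run_thread_loop() is started per thread, according to the thread count.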
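load_threads() spreads the n images of a batch over the loader threads (args.threads). Below is a sketch of an even split that uses integer boundaries, so the per-thread counts always add up to the total. The formula is an assumption about how data.c divides the work; check the source. images_for_thread() is a hypothetical helper.

```c
/* Number of images that thread i (of nthreads) loads out of total.
 * Thread i covers [i*total/nthreads, (i+1)*total/nthreads), so any
 * remainder is spread over the threads and the counts sum to total. */
static int images_for_thread(int i, int nthreads, int total) {
    return (i + 1) * total / nthreads - i * total / nthreads;
}
```

With 64 images and 16 threads each thread loads 4 images; with 10 images and 3 threads the counts are 3, 3 and 4.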

3) run_thread_loop(): each worker thread copies its arguments under a mutex and then calls load_thread().

void *run_thread_loop(void *ptr)

pthread_mutex_lock(&mtx_load_data);

load_args *args_local = (load_args *)xcalloc(1, sizeof(load_args));

*args_local = args_swap[i]; // i is the thread ID passed in; load_threads() sets args_swap[i] = args.

pthread_mutex_unlock(&mtx_load_data);

load_thread(args_local); // Calls load_thread().

custom_atomic_store_int(&run_load_data[i], 0);

4) load_thread(): based on the type field, it runs the lowest-level loading task, load_data_detection().

if (a.type == DETECTION_DATA) { // Data for detection; train_detector() sets args.type = DETECTION_DATA.

    *a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.c, a.num_boxes, a.classes, a.flip, a.gaussian_noise, a.blur, a.mixup, a.jitter, a.resize, a.hue, a.saturation, a.exposure, a.mini_batch, a.track, a.augment_speed, a.letter_box, a.show_imgs);

}

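5) load_data_detection() in "darknet/src/data.c" has two versions, depending on whether Darknet is built with OpenCV. In the OpenCV version:

Basic data processing:

This includes crop, flip, HSV augmentation, blur and gaussian_noise. (Note: when a.type == DETECTION_DATA, no angle parameter is passed in, so there is no rotation augmentation.)

if (track) random_paths = get_sequential_paths(paths, n, m, mini_batch, augment_speed); // Object tracking.

else random_paths = get_random_paths(paths, n, m); // Randomly select the paths of n images.

src = load_image_mat_cv(filename, flag); // Entry to load_image_mat_cv() in image_opencv.cpp, which reads the image with OpenCV.

/* Rescale the original image by a certain ratio. */

float img_ar = (float)ow / (float)oh; // Aspect ratio (width/height) of the original image.

float net_ar = (float)w / (float)h; // Aspect ratio required for the network input.

float result_ar = img_ar / net_ar; // The ratio of the two determines how letter_box scaling is applied.

// swidth - should be increased

/* Apply the letter_box transform. */

/* After the call, truth holds the label information of the image. Because the original image is augmented, and translation jitter in particular changes every object's bounding-box label, the boxes must be corrected according to the specific augmentation; the trailing arguments are used to correct the boxes after augmentation. */

// Entry to image_data_augmentation() in image_opencv.cpp: data augmentation.

image ai = image_data_augmentation(src, w, h, pleft, ptop, swidth, sheight, flip, dhue, dsat, dexp, gaussian_noise, blur, boxes, truth);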
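img_ar, net_ar and result_ar decide whether the image is limited by its width or by its height when it is letterboxed into the w x h network input.

The sketch below computes just that geometry. It uses integer arithmetic to avoid rounding surprises; comparing ow*h with oh*w is equivalent to testing result_ar > 1. Darknet itself expresses letter_box through the crop offsets (pleft, ptop, swidth, sheight), so this illustrates the idea rather than reproducing its code. letterbox_fit() and letterbox_t are hypothetical.

```c
/* Size and placement of an image letterboxed into a w x h input. */
typedef struct { int new_w, new_h, pad_left, pad_top; } letterbox_t;

static letterbox_t letterbox_fit(int ow, int oh, int w, int h) {
    letterbox_t r;
    /* ow/oh > w/h (result_ar > 1): the image is relatively wider, fit the width */
    if ((long)ow * h > (long)oh * w) {
        r.new_w = w;
        r.new_h = (int)((long)w * oh / ow);
    } else {  /* relatively taller (or equal): fit the height */
        r.new_h = h;
        r.new_w = (int)((long)h * ow / oh);
    }
    r.pad_left = (w - r.new_w) / 2;  /* center horizontally */
    r.pad_top = (h - r.new_h) / 2;   /* center vertically */
    return r;
}
```

For a 1280x720 frame and a 608x608 input, the image becomes 608x342 with 133 rows of padding above and below.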

6) image_data_augmentation()

cv::Mat img = *(cv::Mat *)mat; // Read the image data.

// crop

// flip: although the config file has no flip parameter, the code uses it.

// HSV augmentation

// gaussian_noise

// Mat -> image
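The HSV step draws random factors from the hue, saturation and exposure values in the cfg's [net] section. Below is a sketch of one common way to draw them. It assumes that saturation and exposure act as a multiplicative factor in [1/s, s] and that hue is an additive shift in [-hue, hue]. The function names are hypothetical and only loosely modeled on Darknet's own random helpers.

```c
#include <stdlib.h>

/* Uniform float in [min, max] using rand(). */
static float rand_uniform_f(float min, float max) {
    if (max < min) { float t = min; min = max; max = t; }
    return min + ((float)rand() / (float)RAND_MAX) * (max - min);
}

/* Random multiplicative factor in [1/s, s]; s = 1.5 gives 0.667 to 1.5. */
static float rand_scale_f(float s) {
    float scale = rand_uniform_f(1.0f, s);
    return (rand() % 2) ? scale : 1.0f / scale;
}

/* Random additive hue shift in [-hue, hue], e.g. hue = .1. */
static float rand_hue_shift(float hue) {
    return rand_uniform_f(-hue, hue);
}
```

Drawing the factor as either s' or 1/s', rather than uniformly in [1/s, s], makes brightening and darkening equally likely.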

7) Advanced data processing:

Mainly mosaic data augmentation.

…

if (use_mixup == 0) { // No mixup.

    d.X.vals[i] = ai.data;

    memcpy(d.y.vals[i], truth, 5 * boxes * sizeof(float)); // C library function: copies 5 * boxes * sizeof(float) bytes from truth to d.y.vals[i].

}
else if (use_mixup == 1) { // Use mixup.

    if (i_mixup == 0) { // First sequence.

        d.X.vals[i] = ai.data;

        memcpy(d.y.vals[i], truth, 5 * boxes * sizeof(float)); // Labels of the n images go into d.y.vals; when i_mixup = 1 they serve as the labels of the previous sequence.

    }
    else if (i_mixup == 1) { // Second sequence: d.X.vals already holds the n augmented images of the previous sequence.

        image old_img = make_empty_image(w, h, c);

        old_img.data = d.X.vals[i]; // Keep the previous sequence's image as old_img.

        blend_images_cv(ai, 0.5, old_img, 0.5); // Entry to blend_images_cv() in image_opencv.cpp: linearly blends the corresponding images of the old and new sequences; ai is the single image of the innermost i_mixup/i loop.

        blend_truth(d.y.vals[i], boxes, truth); // Merge the previous sequence's d.y.vals[i] with this sequence's truth.

        free_image(old_img); // Free the image data.

        d.X.vals[i] = ai.data; // Store this sequence's image.

    }
}
else if (use_mixup == 3) { // Mosaic data augmentation.

    if (i_mixup == 0) { // First sequence: initialization.

        image tmp_img = make_image(w, h, c);

        d.X.vals[i] = tmp_img.data;

    }

    if (flip) { // Flip: swap the left and right crop offsets.

        int tmp = pleft;

        pleft = pright;

        pright = tmp;

    }

    const int left_shift = min_val_cmp(cut_x[i], max_val_cmp(0, (-pleft*w / ow))); // min_val_cmp()/max_val_cmp() in utils.h return the smaller/larger value.

    const int top_shift = min_val_cmp(cut_y[i], max_val_cmp(0, (-ptop*h / oh))); // When ptop < 0, take the smaller of cut_y[i] and -ptop*h / oh; otherwise 0.

    const int right_shift = min_val_cmp((w - cut_x[i]), max_val_cmp(0, (-pright*w / ow)));

    const int bot_shift = min_val_cmp(h - cut_y[i], max_val_cmp(0, (-pbot*h / oh)));

    int k, x, y;

    for (k = 0; k < c; ++k) { // Channels.

        for (y = 0; y < h; ++y) { // Rows.

            int j = y*w + k*w*h; // Row-major index j of row y, channel k, in image i.

            if (i_mixup == 0 && y < cut_y[i]) { // Source region: bottom-right, pasted at the top-left of the output. For i_mixup = 0..3, d.X.vals[i] is not cleared, so the four regions accumulate.

                int j_src = (w - cut_x[i] - right_shift) + (y + h - cut_y[i] - bot_shift)*w + k*w*h;

                memcpy(&d.X.vals[i][j + 0], &ai.data[j_src], cut_x[i] * sizeof(float)); // Copy cut_x[i]*sizeof(float) bytes from ai.data[j_src] to &d.X.vals[i][j + 0].

            }

            if (i_mixup == 1 && y < cut_y[i]) { // Source region: bottom-left, pasted at the top-right.

                int j_src = left_shift + (y + h - cut_y[i] - bot_shift)*w + k*w*h;

                memcpy(&d.X.vals[i][j + cut_x[i]], &ai.data[j_src], (w - cut_x[i]) * sizeof(float));

            }

            if (i_mixup == 2 && y >= cut_y[i]) { // Source region: top-right, pasted at the bottom-left.

                int j_src = (w - cut_x[i] - right_shift) + (top_shift + y - cut_y[i])*w + k*w*h;

                memcpy(&d.X.vals[i][j + 0], &ai.data[j_src], cut_x[i] * sizeof(float));

            }

            if (i_mixup == 3 && y >= cut_y[i]) { // Source region: top-left, pasted at the bottom-right.

                int j_src = left_shift + (top_shift + y - cut_y[i])*w + k*w*h;

                memcpy(&d.X.vals[i][j + cut_x[i]], &ai.data[j_src], (w - cut_x[i]) * sizeof(float));

            }
        }
    }

    blend_truth_mosaic(d.y.vals[i], boxes, truth, w, h, cut_x[i], cut_y[i], i_mixup, left_shift, right_shift, top_shift, bot_shift); // Shift the labels accordingly.

    free_image(ai);

    ai.data = d.X.vals[i];

}

…
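The four memcpy branches above paste one region from each of four augmented images into a single output, split at (cut_x[i], cut_y[i]). Below is a simplified single-channel sketch of that layout. It assumes the four sources are already w x h and ignores the left/right/top/bottom shifts, so each output pixel copies the same (x, y) position from its source. mosaic_compose() is hypothetical.

```c
#include <string.h>

/* Compose a w x h single-channel mosaic (row-major) from four sources:
 * src[0] -> top-left, src[1] -> top-right, src[2] -> bottom-left,
 * src[3] -> bottom-right, split at column cut_x and row cut_y. */
static void mosaic_compose(float *out, const float *const src[4],
                           int w, int h, int cut_x, int cut_y) {
    for (int y = 0; y < h; ++y) {
        const float *left = src[y < cut_y ? 0 : 2];
        const float *right = src[y < cut_y ? 1 : 3];
        float *row = out + y * w;
        /* columns [0, cut_x) come from the left source, as with j + 0 above */
        memcpy(row, left + y * w, (size_t)cut_x * sizeof(float));
        /* columns [cut_x, w) come from the right source, as with j + cut_x[i] */
        memcpy(row + cut_x, right + y * w + cut_x, (size_t)(w - cut_x) * sizeof(float));
    }
}
```

Copying a whole row segment per memcpy mirrors the original loop, which also works one row (y) and one channel (k) at a time.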

8) Overall architecture

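The overall architecture is the same as YOLO-V3 (thanks to @江大白 on Zhihu). The innovations are:

Input --> Mosaic data augmentation, cmBN, SAT (self-adversarial training);

BackBone --> CSPDarknet53, Mish activation, DropBlock;

Neck --> SPP, FPN+PAN structure;

Prediction --> CIOU_Loss, DIOU_nms.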
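DIOU_nms (diounms_sort() in the demo code) uses DIoU instead of plain IoU when deciding whether to suppress a box. DIoU is IoU minus d^2/c^2, where d is the distance between the box centers and c is the diagonal of the smallest box enclosing both.

Below is a minimal sketch for center-format boxes; box_t and the function names are hypothetical. Darknet's diounms_sort() also takes a beta_nms parameter, which this sketch does not model.

```c
/* Box in center format: center (x, y), width w, height h. */
typedef struct { float x, y, w, h; } box_t;

static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Overlap of two 1-D segments given as (center, size); negative if disjoint. */
static float overlap_1d(float c1, float s1, float c2, float s2) {
    return min_f(c1 + s1 / 2, c2 + s2 / 2) - max_f(c1 - s1 / 2, c2 - s2 / 2);
}

static float box_iou(box_t a, box_t b) {
    float ow = overlap_1d(a.x, a.w, b.x, b.w);
    float oh = overlap_1d(a.y, a.h, b.y, b.h);
    float inter = (ow > 0 && oh > 0) ? ow * oh : 0.0f;
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0 ? inter / uni : 0.0f;
}

/* DIoU = IoU - d^2 / c^2. */
static float box_diou(box_t a, box_t b) {
    float cw = max_f(a.x + a.w / 2, b.x + b.w / 2) - min_f(a.x - a.w / 2, b.x - b.w / 2);
    float ch = max_f(a.y + a.h / 2, b.y + b.h / 2) - min_f(a.y - a.h / 2, b.y - b.h / 2);
    float c2 = cw * cw + ch * ch;
    float dx = a.x - b.x, dy = a.y - b.y;
    float iou = box_iou(a, b);
    return c2 > 0 ? iou - (dx * dx + dy * dy) / c2 : iou;
}

/* DIoU-NMS rule: a lower-scoring candidate is suppressed by a kept box
 * when their DIoU exceeds the NMS threshold. */
static int diou_suppresses(box_t kept, box_t candidate, float nms_thresh) {
    return box_diou(kept, candidate) > nms_thresh;
}
```

Because the center-distance term lowers the score of boxes whose centers are far apart, two overlapping detections of different, adjacent objects are less likely to suppress each other than with plain IoU.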

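The network configuration file (.cfg) defines the model architecture and must be specified on the command line for training. The file starts with a [net] section, which defines the parameters directly related to training: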

[net]

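# Testing (for testing, set batch and subdivisions to 1, otherwise errors may occur)

#batch=1 # A larger batch reduces training oscillation and the NaNs that can appear during training.

#subdivisions=1 # Should be a multiple of 8; if GPU memory is tight, set it to 32 or 64.

# Training

batch=64 # During training, 64 images are loaded into memory at once; after the forward pass their losses are summed and averaged, and a single backward pass updates the weights.

subdivisions=16 # A batch is processed in 16 forward passes, i.e. 4 images at a time.

width=608 # Network input width.

height=608 # Network input height.

channels=3 # Number of network input channels.

momentum=0.949 # Momentum of momentum gradient descent: each update partly keeps the previous update direction.

decay=0.0005 # Weight-decay regularization term, used to prevent overfitting.

angle=0 # Data augmentation: generate more training samples by rotating images.

saturation = 1.5 # Data augmentation: generate more training samples by adjusting saturation.

exposure = 1.5 # Data augmentation: generate more training samples by adjusting exposure.

hue=.1 # Data augmentation: generate more training samples by adjusting hue.

learning_rate=0.001 # Learning rate.

burn_in=1000 # Below burn_in iterations the learning rate follows a warm-up rule; above burn_in, the policy schedule is used.

max_batches = 500500 # Number of training iterations (one batch per iteration); usually number of classes * 2000, increase it for small training sets or when training from scratch.

policy=steps # Learning-rate schedule policy.

steps=400000,450000 # Iterations at which the learning rate changes; typically 0.8 to 0.9 of max_batches.

scales=.1,.1 # At steps(1) the learning rate is divided by ten, and at steps(2) it is divided by ten again.

#cutmix=1 # CutMix augmentation: a region is cut out and, instead of being filled with zeros, filled with the pixels of a region from another training image; the label is split in proportion.

mosaic=1 # Mosaic augmentation: four images are randomly scaled, randomly cropped and randomly arranged, then stitched together (see the code analysis above).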
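burn_in, policy=steps, steps and scales together define the learning-rate schedule. Below is a sketch of that schedule. It assumes the warm-up follows learning_rate * (iteration / burn_in)^power with power = 4, which is Darknet's default for its power option as far as I know; the exponent is kept as an integer for simplicity. It also assumes each step multiplies the rate by its scale. The function name is hypothetical.

Separately, batch=64 with subdivisions=16 means each forward pass sees 64 / 16 = 4 images.

```c
/* Learning rate for policy=steps with burn_in warm-up (hypothetical helper).
 * steps[] must be sorted ascending and have nsteps entries, like scales[]. */
static float steps_learning_rate(float base_lr, int iteration, int burn_in, int power,
                                 const int *steps, const float *scales, int nsteps) {
    if (iteration < burn_in) {
        float ratio = (float)iteration / (float)burn_in, warm = 1.0f;
        for (int p = 0; p < power; ++p) warm *= ratio;  /* ratio^power */
        return base_lr * warm;
    }
    float rate = base_lr;
    for (int i = 0; i < nsteps; ++i) {
        if (steps[i] > iteration) break;  /* this step not reached yet */
        rate *= scales[i];
    }
    return rate;
}
```

The values above give the following rates:

- iteration 500: 0.001 * 0.5^4 = 0.0000625
- iterations 1000 to 399999: 0.001
- from iteration 400000: 0.0001
- from iteration 450000: 0.00001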

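The remaining sections, [convolutional], [route], [shortcut], [maxpool], [upsample] and [yolo], hold the configuration parameters of the different layer types. In YOLO-V4, several CBM and CSP layers are stacked after the [net] section, starting with 2 CBM layers. A CBM layer looks like this: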

[convolutional]

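batch_normalize=1 # Whether to apply batch normalization (BN).

filters=32 # Number of convolution kernels, i.e. the number of output channels of this layer.

size=3 # Kernel size.

stride=1 # Convolution stride.

pad=1 # Pad the borders (with pad=1, Darknet uses padding = size/2).

activation=mish # Activation function; YOLO-v4 uses Mish only in the Backbone, and the later layers still use Leaky ReLU.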

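The innovation here is the Mish activation function; its curve is compared with Leaky_ReLU in the figure:

Mish does not cut off negative values completely; it lets small negative values and gradients through, which keeps information flowing. In addition, a smooth activation function lets information propagate deeper into the network and makes gradient descent work better, which improves accuracy and generalization.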

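After the two CBM layers comes CSP1, with the following structure:

CSP1 = CBM + 1 residual unit + CBM -> Concat (with CBM); see the overall diagram.

[convolutional] # CBM layer, connected directly to the [route] layer 7 layers later; it forms the lower branch of CSPX in the overall diagram.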

batch_normalize=1

filters=64

size=1

stride=1

pad=1

activation=mish

[route] # Take the output of the layer 2 back, i.e. where the CSP starts, to build the first CSP branch shown in the figure.

layers = -2

[convolutional] # CBM layer.

batch_normalize=1

filters=64

size=1

stride=1

pad=1

activation=mish

Residual Block

[convolutional] # CBM layer.

batch_normalize=1

filters=32

size=1

stride=1

pad=1

activation=mish

[convolutional] # CBM layer.

batch_normalize=1

filters=64

size=3

stride=1

pad=1

activation=mish

[shortcut] # Add the output of the layer 3 back; this ends the Residual Block.

from=-3

activation=linear

[convolutional] # CBM layer.

batch_normalize=1

filters=64

size=1

stride=1

pad=1

activation=mish

[route] # Concat the output of the previous CBM layer with that of the CBM layer 7 back.

layers = -1,-7

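The following CBM and CSPX blocks have the same structure as the block above, except that CSPX contains X residual units, as shown in the figure:

The CSP module splits the feature map of the base layer into two parts and then merges them through a skip connection, which reduces computation while preserving accuracy.

Note that the backbone branches off twice to connect to the Neck; this is explained later.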

IV. Neck & Prediction

The second half of the .cfg file holds the Neck and YOLO Prediction settings; I have annotated the key parts:

CBL*3

[convolutional]

batch_normalize=1

filters=512

size=1

stride=1

pad=1

activation=leaky # Mish is no longer used.

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=1024

activation=leaky

[convolutional]

batch_normalize=1

filters=512

size=1

stride=1

pad=1

activation=leaky

SPP: multi-scale fusion by max pooling

[maxpool] # 5*5.

stride=1

size=5

[route]

layers=-2

[maxpool] # 9*9.

stride=1

size=9

[route]

layers=-4

[maxpool] # 13*13.

stride=1

size=13

[route] # Concat.

layers=-1,-3,-5,-6

End SPP

CBL*3

[convolutional]

batch_normalize=1

filters=512

size=1

stride=1

pad=1

activation=leaky # Mish is no longer used.

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=1024

activation=leaky

[convolutional]

batch_normalize=1

filters=512

size=1

stride=1

pad=1

activation=leaky

CBL

[convolutional]

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

Upsample

[upsample]

stride=2

[route]

layers = 85 # Take the output of the CBM+CSP8+CBM block in the Backbone; 85 counts the layers after [net], indexed from 0.

[convolutional] # Add a CBL branch.

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

[route] # Concat.

layers = -1, -3

CBL*5

[convolutional]

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=512

activation=leaky

[convolutional]

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=512

activation=leaky

[convolutional]

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

CBL

[convolutional]

batch_normalize=1

filters=128

size=1

stride=1

pad=1

activation=leaky

Upsample

[upsample]

stride=2

[route]

layers = 54 # Take the output of the CBM2+CSP1+CBM2+CSP2+CBM*2+CSP8+CBM blocks in the Backbone; 54 counts the layers after [net], indexed from 0.

CBL

[convolutional]

batch_normalize=1

filters=128

size=1

stride=1

pad=1

activation=leaky

[route] # Concat.

layers = -1, -3

CBL*5

[convolutional]

batch_normalize=1

filters=128

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=256

activation=leaky

[convolutional]

batch_normalize=1

filters=128

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=256

activation=leaky

[convolutional]

batch_normalize=1

filters=128

size=1

stride=1

pad=1

activation=leaky

Prediction

CBL

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=256

activation=leaky

conv

[convolutional]

size=1

stride=1

pad=1

filters=255

activation=linear

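[yolo] # 76*76*255, corresponds to the smallest anchor boxes.

mask = 0,1,2 # Indices of the anchors used by this layer. These are the COCO defaults. Anchors for your own data can be computed with k-means via detector calc_anchors. In that case, change the mask indices according to each anchor's size: whether it exceeds 60*60 or 30*30, with the first yolo layer handling small sizes, the second medium and the third large. Also change the filters of the preceding conv layer.

anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401

classes=80 # Number of object classes the network must recognize.

num=9 # Number of anchor boxes, i.e. the total number of anchors.

jitter=.3 # Adds noise through jitter to suppress overfitting.

ignore_thresh = .7

truth_thresh = 1

scale_x_y = 1.2

iou_thresh=0.213

cls_normalizer=1.0

iou_normalizer=0.07

iou_loss=ciou # CIoU loss: box regression takes into account the overlap area, the center-point distance and the aspect ratio.

nms_kind=greedynms

beta_nms=0.6

max_delta=5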

[route]

layers = -4 # Take the output of the first Neck layer.

Build the second branch

CBL

[convolutional]

batch_normalize=1

size=3

stride=2

pad=1

filters=256

activation=leaky

[route] # Concat.

layers = -1, -16

CBL*5

[convolutional]

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=512

activation=leaky

[convolutional]

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=512

activation=leaky

[convolutional]

batch_normalize=1

filters=256

size=1

stride=1

pad=1

activation=leaky

CBL

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=512

activation=leaky

conv

[convolutional]

size=1

stride=1

pad=1

filters=255

activation=linear

[yolo] # 38*38*255, corresponds to the medium-sized anchor boxes.

mask = 3,4,5

anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401

classes=80

num=9

jitter=.3

ignore_thresh = .7

truth_thresh = 1

scale_x_y = 1.1

iou_thresh=0.213

cls_normalizer=1.0

iou_normalizer=0.07

iou_loss=ciou

nms_kind=greedynms

beta_nms=0.6

max_delta=5

[route] # Take the output of the second Neck layer.

layers = -4

Build the third branch

CBL

[convolutional]

batch_normalize=1

size=3

stride=2

pad=1

filters=512

activation=leaky

[route] # Concat.

layers = -1, -37

CBL*5

[convolutional]

batch_normalize=1

filters=512

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=1024

activation=leaky

[convolutional]

batch_normalize=1

filters=512

size=1

stride=1

pad=1

activation=leaky

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=1024

activation=leaky

[convolutional]

batch_normalize=1

filters=512

size=1

stride=1

pad=1

activation=leaky

CBL

[convolutional]

batch_normalize=1

size=3

stride=1

pad=1

filters=1024

activation=leaky

conv

[convolutional]

size=1

stride=1

pad=1

filters=255

activation=linear

[yolo] # 19*19*255, corresponds to the largest anchor boxes.

mask = 6,7,8

anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401

classes=80

num=9

jitter=.3

ignore_thresh = .7

truth_thresh = 1

random=1

scale_x_y = 1.05

iou_thresh=0.213

cls_normalizer=1.0

iou_normalizer=0.07

iou_loss=ciou

nms_kind=greedynms

beta_nms=0.6

max_delta=5

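The first innovation here is the Spatial Pyramid Pooling (SPP) module:

In the cfg, [maxpool] and [route] layers are combined. Three max-pooling operations at different scales process the feature maps output by the previous convolutional layer. The pooled results are then concatenated with that un-pooled feature map, giving 4 scales in total. Compared with a single max-pooling, this covers a larger range of features and effectively separates features of different scales;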
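In the SPP cfg above, every [maxpool] uses stride=1, so all three pooled maps keep the 19*19 size of their input (for a 608*608 image) and can be concatenated with it. Below is a sketch of the shape arithmetic. It assumes Darknet's max-pool output formula (in + padding - size) / stride + 1 with the default padding = size - 1. The helper names are hypothetical.

```c
/* Spatial output size of a Darknet-style max-pool layer. */
static int maxpool_out(int in, int size, int stride, int padding) {
    return (in + padding - size) / stride + 1;
}

/* Channels after a [route] that concatenates n inputs. */
static int route_channels(const int *channels, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += channels[i];
    return total;
}
```

The final [route] (layers=-1,-3,-5,-6) stacks four 512-channel maps, giving 19*19*2048.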

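The second innovation is the PAN structure introduced on top of FPN:

In the original PANet, the PAN operation is an element-wise addition; YOLO-V4 instead uses Concat, which expands the channel dimension, as shown in the figure below:

The feature maps from different downsampling stages of the Backbone are concatenated with the outputs of the same scale in the subsequent upsampling stages, forming the FPN structure, followed by two bottom-up PAN paths.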

Downsampling 1: in the first 10 blocks only 3 CBMs have stride 2, so the input size becomes 608/2/2/2 = 76. The last CBM has 256 filters, so the 10th block outputs a 76*76*256 feature map;

Downsampling 2: continuing along the Backbone in the same way, the 13th block (CBM) outputs a 38*38*512 feature map;

Downsampling 3: the 23rd block (CBL) outputs 19*19*512;

In the following, (n, s) denotes a convolutional layer with n filters and stride s.

Upsampling 1: Downsampling 3 + CBL + upsample = 38*38*256;

Concat1: [Upsampling 1] Concat [Downsampling 2 + CBL] = [38*38*256] Concat [38*38*512 + (256, 1)] = 38*38*512;

Upsampling 2: Concat1 + CBL*5 + CBL + upsample = 76*76*128;

Concat2: [Upsampling 2] Concat [Downsampling 1 + CBL] = [76*76*128] Concat [76*76*256 + (128, 1)] = 76*76*256;

Concat3 (PAN1): [Concat2 + CBL*5 + CBL] Concat [Concat1 + CBL*5] = [76*76*256 + (128, 1) + (256, 2)] Concat [38*38*512 + (256, 1)] = [38*38*256] Concat [38*38*256] = 38*38*512;

Concat4 (PAN2): [Concat3 + CBL*5 + CBL] Concat [Downsampling 3] = [38*38*512 + (256, 1) + (512, 2)] Concat [19*19*512] = 19*19*1024;

Prediction①: Concat2 + CBL*5 + CBL + conv = 76*76*256 + (128, 1) + (256, 1) + (filters, 1) = 76*76*filters, where filters = (class_num + 5)*3. The figure assumes the COCO dataset, whose 80 classes give 255;

Prediction②: PAN1 + CBL*5 + CBL + conv = 38*38*512 + (256, 1) + (512, 1) + (filters, 1) = 38*38*filters, where filters = (class_num + 5)*3, i.e. 255 for COCO's 80 classes;

Prediction③: PAN2 + CBL*5 + CBL + conv = 19*19*1024 + (512, 1) + (1024, 1) + (filters, 1) = 19*19*filters, where filters = (class_num + 5)*3, i.e. 255 for COCO's 80 classes.
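The sizes in the list follow from two formulas:

- A stride-2 3*3 convolution with pad=1 halves the spatial size.
- The conv layer before each [yolo] needs (classes + 5) * 3 filters.

Below is a sketch of both. It assumes Darknet's convolution output formula (in + 2*padding - size) / stride + 1, with padding = size/2 when pad=1. The function names are hypothetical.

```c
/* Spatial output size of a convolution; pad_flag mirrors pad=1 in the cfg. */
static int conv_out(int in, int size, int stride, int pad_flag) {
    int padding = pad_flag ? size / 2 : 0;
    return (in + 2 * padding - size) / stride + 1;
}

/* Filters of the conv before a [yolo] layer: each anchor predicts
 * 4 box coordinates, 1 objectness score and one score per class. */
static int yolo_filters(int classes, int anchors_per_scale) {
    return (classes + 5) * anchors_per_scale;
}
```

Three stride-2 convolutions take 608 to 304, 152 and 76, two more give 38 and 19, and yolo_filters(80, 3) is 255.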

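V. Network construction

In the source code, the network architecture described above, from backbone to prediction, is stored in a network struct. The flow is as follows:

In train_detector() in "darknet/src/detector.c":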


…

network net_map;

if (calc_map) { // Compute mAP.

    ......

    net_map = parse_network_cfg_custom(cfgfile, 1, 1); // Entry to parse_network_cfg_custom() in parser.c: loads the cfg and parameters and builds the network with batch = 1.

    net_map.benchmark_layers = benchmark_layers;

    const int net_classes = net_map.layers[net_map.n - 1].classes;

    int k; // free memory unnecessary arrays

    for (k = 0; k < net_map.n - 1; ++k) free_layer_custom(net_map.layers[k], 1);

    ......

}

srand(time(0));

char *base = basecfg(cfgfile); // basecfg() in utils.c returns the base name of the cfg file (e.g. "yolo-obj" for cfg/yolo-obj.cfg), which is then printed.

printf("%s\n", base);

float avg_loss = -1;

network* nets = (network*)xcalloc(ngpus, sizeof(network)); // Allocate one network struct per GPU to hold the network parameters.

srand(time(0));

int seed = rand();

int k;

for (k = 0; k < ngpus; ++k) {

    srand(seed);

#ifdef GPU

    cuda_set_device(gpus[k]);

#endif

    nets[k] = parse_network_cfg(cfgfile); // parse_network_cfg() calls parse_network_cfg_custom(cfgfile, 0, 0); each GPU's network loads the config file.

    nets[k].benchmark_layers = benchmark_layers;

    if (weightfile) {
        load_weights(&nets[k], weightfile); // load_weights() in parser.c reads the weights file.

    }

    if (clear) { // Whether to reset the training counters.

        *nets[k].seen = 0;
        *nets[k].cur_iteration = 0;
    }

    nets[k].learning_rate *= ngpus;
}

srand(time(0));

network net = nets[0]; // Pass the parameters to net.

......

/* Prepare the loading arguments. */

load_args args = { 0 };

args.w = net.w;

args.h = net.h;

args.c = net.c;

args.paths = paths;

args.n = imgs;

args.m = plist->size;

args.classes = classes;

args.flip = net.flip;

args.jitter = l.jitter;

args.resize = l.resize;

args.num_boxes = l.max_boxes;

net.num_boxes = args.num_boxes;

net.train_images_num = train_images_num;

args.d = &buffer;

args.type = DETECTION_DATA;

args.threads = 64; // 16 or 64

…

In parse_network_cfg_custom() in "darknet/src/parser.c":

network parse_network_cfg_custom(char *filename, int batch, int time_steps)

{

    list *sections = read_cfg(filename); // Read the config file into a linked list.

    node *n = sections->front; // n is the first node of sections.

    if (!n) error("Config file has no sections");

    network net = make_network(sections->size - 1); // Entry to make_network() in network.c: allocates memory for the network's pointer members, starting from the layer after [net]. Because the first section, [net], holds configuration parameters rather than a layer, the number of layers is sections->size - 1.

    net.gpu_index = gpu_index;

    size_params params;

    if (batch > 0) params.train = 0; // allocates memory for Detection only

    else params.train = 1; // allocates memory for Detection & Training

    section *s = (section *)n->val; // Pass the val of the first node n to section s.

    list *options = s->options;

    if (!is_network(s)) error("First section must be [net] or [network]");

    parse_net_options(options, &net); // Initialize the global network parameters, including but not limited to those in [net].

#ifdef GPU

    printf("net.optimized_memory = %d \n", net.optimized_memory);

    if (net.optimized_memory >= 2 && params.train) {
        pre_allocate_pinned_memory((size_t)1024 * 1024 * 1024 * 8); // pre-allocate 8 GB CPU-RAM for pinned memory

    }

#endif // GPU

    ......

    while (n) { // Initialize the parameters of each layer.

        params.index = count;

        fprintf(stderr, "%4d ", count);

        s = (section *)n->val;

        options = s->options;

        layer l = { (LAYER_TYPE)0 };

        LAYER_TYPE lt = string_to_layer_type(s->type);

        if (lt == CONVOLUTIONAL) { // Convolutional layer: parse_convolutional() calls make_convolutional_layer() to create it.

            l = parse_convolutional(options, params);

        }else if (lt == LOCAL) {
            l = parse_local(options, params);

        }else if (lt == ACTIVE) {
            l = parse_activation(options, params);

        }else if (lt == RNN) {
            l = parse_rnn(options, params);

        }else if (lt == GRU) {
            l = parse_gru(options, params);

        }else if (lt == LSTM) {
            l = parse_lstm(options, params);

        }else if (lt == CONV_LSTM) {
            l = parse_conv_lstm(options, params);

        }else if (lt == CRNN) {
            l = parse_crnn(options, params);

        }else if (lt == CONNECTED) {
            l = parse_connected(options, params);

        }else if (lt == CROP) {
            l = parse_crop(options, params);

        }else if (lt == COST) {
            l = parse_cost(options, params);
            l.keep_delta_gpu = 1;

        }else if (lt == REGION) {
            l = parse_region(options, params);
            l.keep_delta_gpu = 1;

        }else if (lt == YOLO) { // The yolo_layer introduced in YOLOv3/v4: parse_yolo() calls make_yolo_layer() to create it.

            l = parse_yolo(options, params);
            l.keep_delta_gpu = 1;

        }else if (lt == GAUSSIAN_YOLO) {
            l = parse_gaussian_yolo(options, params);
            l.keep_delta_gpu = 1;

        }else if (lt == DETECTION) {
            l = parse_detection(options, params);

        }else if (lt == SOFTMAX) {
            l = parse_softmax(options, params);
            net.hierarchy = l.softmax_tree;
            l.keep_delta_gpu = 1;

        }else if (lt == NORMALIZATION) {
            l = parse_normalization(options, params);

        }else if (lt == BATCHNORM) {
            l = parse_batchnorm(options, params);

        }else if (lt == MAXPOOL) {
            l = parse_maxpool(options, params);

        }else if (lt == LOCAL_AVGPOOL) {
            l = parse_local_avgpool(options, params);

        }else if (lt == REORG) {
            l = parse_reorg(options, params);

        }else if (lt == REORG_OLD) {
            l = parse_reorg_old(options, params);

        }else if (lt == AVGPOOL) {
            l = parse_avgpool(options, params);

        }else if (lt == ROUTE) {
            l = parse_route(options, params);
            int k;
            for (k = 0; k < l.n; ++k) {
                net.layers[l.input_layers[k]].use_bin_output = 0;
                net.layers[l.input_layers[k]].keep_delta_gpu = 1;
            }

        }else if (lt == UPSAMPLE) {
            l = parse_upsample(options, params, net);

        }else if (lt == SHORTCUT) {
            l = parse_shortcut(options, params, net);
            net.layers[count - 1].use_bin_output = 0;
            net.layers[l.index].use_bin_output = 0;
            net.layers[l.index].keep_delta_gpu = 1;

        }else if (lt == SCALE_CHANNELS) {
            l = parse_scale_channels(options, params, net);
            net.layers[count - 1].use_bin_output = 0;
            net.layers[l.index].use_bin_output = 0;
            net.layers[l.index].keep_delta_gpu = 1;

        }else if (lt == SAM) {
            l = parse_sam(options, params, net);
            net.layers[count - 1].use_bin_output = 0;
            net.layers[l.index].use_bin_output = 0;
            net.layers[l.index].keep_delta_gpu = 1;

        }else if (lt == DROPOUT) {
            l = parse_dropout(options, params);
            l.output = net.layers[count-1].output;
            l.delta = net.layers[count-1].delta;

#ifdef GPU

            l.output_gpu = net.layers[count-1].output_gpu;
            l.delta_gpu = net.layers[count-1].delta_gpu;
            l.keep_delta_gpu = 1;

#endif

        }else if (lt == EMPTY) {
            layer empty_layer = {(LAYER_TYPE)0};
            empty_layer.out_w = params.w;
            empty_layer.out_h = params.h;
            empty_layer.out_c = params.c;
            l = empty_layer;
            l.output = net.layers[count - 1].output;
            l.delta = net.layers[count - 1].delta;

#ifdef GPU

            l.output_gpu = net.layers[count - 1].output_gpu;
            l.delta_gpu = net.layers[count - 1].delta_gpu;

#endif

        }else {
            fprintf(stderr, "Type not recognized: %s\n", s->type);
        }

        ......

        net.layers[count] = l; // Each parse function returns a filled-in layer l; all of them are stored in the layers array of the network struct.

        if (l.workspace_size > workspace_size) workspace_size = l.workspace_size; // workspace_size is the network's workspace: the largest computation buffer needed by any single layer, because at any moment only one layer runs its forward or backward pass on the GPU or CPU.

        if (l.inputs > max_inputs) max_inputs = l.inputs;

        if (l.outputs > max_outputs) max_outputs = l.outputs;

        free_section(s);

        n = n->next; // Advance to the next node; the while loop ends when it is empty.

        ++count;

        if (n) { // Make the input shape of the next layer match the output shape of this one.

            if (l.antialiasing) {
                params.h = l.input_layer->out_h;
                params.w = l.input_layer->out_w;
                params.c = l.input_layer->out_c;
                params.inputs = l.input_layer->outputs;
            }
            else {
                params.h = l.out_h;
                params.w = l.out_w;
                params.c = l.out_c;
                params.inputs = l.outputs;
            }
        }

        if (l.bflops > 0) bflops += l.bflops;

        if (l.w > 1 && l.h > 1) {
            avg_outputs += l.outputs;
            avg_counter++;
        }
    }

    free_list(sections);

    ......

    return net; // Return the parsed network struct, which is used throughout training.

}
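string_to_layer_type() turns a section header such as [convolutional] into the LAYER_TYPE that selects the parse_* function in the chain above. Below is a reduced sketch of that lookup. Its hypothetical enum covers only the layer types that appear in yolov4.cfg. Darknet's real enum and lookup are larger, and the aliases shown ([conv], [max]) are assumptions.

```c
#include <string.h>

/* Hypothetical subset of layer kinds (Darknet's LAYER_TYPE has many more). */
typedef enum { L_BLANK, L_CONVOLUTIONAL, L_ROUTE, L_SHORTCUT,
               L_MAXPOOL, L_UPSAMPLE, L_YOLO } layer_kind;

/* Map a cfg section header to a layer kind; unknown headers give L_BLANK. */
static layer_kind section_to_kind(const char *type) {
    static const struct { const char *name; layer_kind kind; } table[] = {
        { "[convolutional]", L_CONVOLUTIONAL }, { "[conv]", L_CONVOLUTIONAL },
        { "[route]", L_ROUTE }, { "[shortcut]", L_SHORTCUT },
        { "[maxpool]", L_MAXPOOL }, { "[max]", L_MAXPOOL },
        { "[upsample]", L_UPSAMPLE }, { "[yolo]", L_YOLO },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        if (strcmp(type, table[i].name) == 0) return table[i].kind;
    return L_BLANK;
}
```

A table lookup keeps the header spellings in one place; the parser then only needs one branch per kind, as in the if/else chain above.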

Taking the convolutional layer and the yolo layer as examples, the following shows how network layers are created. make_convolutional_layer() in convolutional_layer.c:

convolutional_layer make_convolutional_layer(int batch,
int steps, int h, int w, int c, int n, int groups, int size, int stride_x, int
stride_y, int dilation, int padding, ACTIVATION activation, int
batch_normalize, int binary, int xnor, int adam, int use_bin_output, int index,
int antialiasing, convolutional_layer *share_layer, int assisted_excitation,
int deform, int train)

{

int total_batch

= batch*steps;

int i;convolutional_layer

l = { (LAYER_TYPE)0 }; // convolutional_layer其实就是layer。

l.type =

CONVOLUTIONAL; // layer的类型，此处为卷积层。

l.train =

train;

/* 改变输入和输出的维度。 */if (xnor)

groups = 1; // disable groups for
XNOR-net

if (groups <

groups = 1; // group将对应的输入输出通道对应分组，默认为1(输出输入的所有通道各为一组)，把卷积group等于输入通道，输出通道等于输入通道就实现了depthwize separable convolution结构。

const int
blur_stride_x = stride_x;

const int
blur_stride_y = stride_y;

l.antialiasing
= antialiasing;

if
(antialiasing) {
```
 stride_x =
```

stride_y = l.stride = l.stride_x = l.stride_y = 1; // use stride=1 in
host-layer

}l.deform =

deform;

l.assisted_excitation = assisted_excitation;

l.share_layer =

share_layer;

l.index =

index;

    l.h = h; // Input height.
    l.w = w; // Input width.
    l.c = c; // Input channels.
    l.groups = groups;
    l.n = n; // Number of convolution kernels (filters).
    l.binary = binary;
    l.xnor = xnor;
    l.use_bin_output = use_bin_output;
    l.batch = batch; // Batch size used for training.
    l.steps = steps;
    l.stride = stride_x; // Stride.
    l.stride_x = stride_x;
    l.stride_y = stride_y;
    l.dilation = dilation;
    l.size = size; // Kernel size.
    l.pad = padding; // Border padding width.
    l.batch_normalize = batch_normalize; // Whether to apply batch normalization (BN).
    l.learning_rate_scale = 1;
    /* Size of the weight array: c/groups*n*size*size. */
    l.nweights = (c / groups) * n * size * size; // groups defaults to 1; each of the n filters spans c/groups input channels with a size*size kernel.

    if (l.share_layer) {
        if (l.size != l.share_layer->size || l.nweights != l.share_layer->nweights || l.c != l.share_layer->c || l.n != l.share_layer->n) {
            printf(" Layer size, nweights, channels or filters don't match for the share_layer");
            getchar();
        }
        l.weights = l.share_layer->weights;
        l.weight_updates = l.share_layer->weight_updates;
        l.biases = l.share_layer->biases;
        l.bias_updates = l.share_layer->bias_updates;
    }
    else {
        l.weights = (float*)xcalloc(l.nweights, sizeof(float));
        l.biases = (float*)xcalloc(n, sizeof(float));
        if (train) {
            l.weight_updates = (float*)xcalloc(l.nweights, sizeof(float));
            l.bias_updates = (float*)xcalloc(n, sizeof(float));
        }
    }
    // float scale = 1./sqrt(size*size*c);
    float scale = sqrt(2./(size*size*c/groups)); // Initial weight scale, derived from the fan-in size*size*c/groups.
    if (l.activation == NORM_CHAN || l.activation == NORM_CHAN_SOFTMAX || l.activation == NORM_CHAN_SOFTMAX_MAXVAL) {
        for (i = 0; i < l.nweights; ++i) l.weights[i] = 1; // rand_normal();
    }
    else {
        for (i = 0; i < l.nweights; ++i) l.weights[i] = scale*rand_uniform(-1, 1); // rand_normal();

    }
    /* Compute the output dimensions from the convolution formula. */
    int out_h = convolutional_out_height(l);
    int out_w = convolutional_out_width(l);
    l.out_h = out_h; // Output height.
    l.out_w = out_w; // Output width.
    l.out_c = n; // Output channels, equal to the number of kernels.
    l.outputs = l.out_h * l.out_w * l.out_c; // Number of output elements for one image of the batch.
    l.inputs = l.w * l.h * l.c; // Number of input elements for one image of the batch.
    l.activation = activation;
    l.output = (float*)xcalloc(total_batch*l.outputs, sizeof(float)); // Output array for the whole batch (total_batch = batch*steps).
#ifndef GPU
    if (train) l.delta = (float*)xcalloc(total_batch*l.outputs, sizeof(float)); // Error terms (deltas) of the outputs, used during backpropagation.
#endif // not GPU
    /* The three key functions: forward pass, backward pass, and parameter update. */
    l.forward = forward_convolutional_layer;
    l.backward = backward_convolutional_layer;
    l.update = update_convolutional_layer; // Defines the update strategy.

    if (binary) {
        l.binary_weights = (float*)xcalloc(l.nweights, sizeof(float));
        l.cweights = (char*)xcalloc(l.nweights, sizeof(char));
        l.scales = (float*)xcalloc(n, sizeof(float));
    }
    if (xnor) {
        l.binary_weights = (float*)xcalloc(l.nweights, sizeof(float));
        l.binary_input = (float*)xcalloc(l.inputs * l.batch, sizeof(float));
        int align = 32;// 8;
        int src_align = l.out_h*l.out_w;
        l.bit_align = src_align + (align - src_align % align);
        l.mean_arr = (float*)xcalloc(l.n, sizeof(float));
        const size_t new_c = l.c / 32;
        size_t in_re_packed_input_size = new_c * l.w * l.h + 1;
        l.bin_re_packed_input = (uint32_t*)xcalloc(in_re_packed_input_size, sizeof(uint32_t));
        l.lda_align = 256; // AVX2
        int k = l.size*l.size*l.c;
        size_t k_aligned = k + (l.lda_align - k%l.lda_align);
        size_t t_bit_input_size = k_aligned * l.bit_align / 8;
        l.t_bit_input = (char*)xcalloc(t_bit_input_size, sizeof(char));
    }
    /* Batch normalization variables. */

    if (batch_normalize) {
        if (l.share_layer) {
            l.scales = l.share_layer->scales;
            l.scale_updates = l.share_layer->scale_updates;
            l.mean = l.share_layer->mean;
            l.variance = l.share_layer->variance;
            l.mean_delta = l.share_layer->mean_delta;
            l.variance_delta = l.share_layer->variance_delta;
            l.rolling_mean = l.share_layer->rolling_mean;
            l.rolling_variance = l.share_layer->rolling_variance;
        }
        else {
            l.scales = (float*)xcalloc(n, sizeof(float));
            for (i = 0; i < n; ++i) {
                l.scales[i] = 1;
            }
            if (train) {
                l.scale_updates = (float*)xcalloc(n, sizeof(float));
                l.mean = (float*)xcalloc(n, sizeof(float));
                l.variance = (float*)xcalloc(n, sizeof(float));
                l.mean_delta = (float*)xcalloc(n, sizeof(float));
                l.variance_delta = (float*)xcalloc(n, sizeof(float));
                l.rolling_mean = (float*)xcalloc(n, sizeof(float));
                l.rolling_variance = (float*)xcalloc(n, sizeof(float));
            }
    ......
    return l;

}

Next, make_yolo_layer() in yolo_layer.c:

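layer make_yolo_layer(int batch, int w, int h, int n, int total, int *mask, int classes, int max_boxes)
{
    int i;
    layer l = { (LAYER_TYPE)0 };
    l.type = YOLO; // Layer type.
    l.n = n; // Number of bounding boxes each grid cell predicts (the anchors selected by mask).
    l.total = total; // Total number of anchors: 9.
    l.batch = batch; // Number of images in a batch.
    l.h = h; // Input height.
    l.w = w; // Input width.
    l.c = n*(classes + 4 + 1); // Input channels: n boxes x (classes + 4 box coordinates + 1 objectness).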

    l.out_w = l.w; // Output width.
    l.out_h = l.h; // Output height.
    l.out_c = l.c; // Output channels, equal to the input channels (the filters of the preceding convolutional layer).
    l.classes = classes; // Number of object classes.
    l.cost = (float*)xcalloc(1, sizeof(float)); // Total loss of the yolo layer.
    l.biases = (float*)xcalloc(total * 2, sizeof(float)); // Stores the [w, h] of every anchor box.
    if (mask) l.mask = mask; // Use the mask that was passed in.
    else {
        l.mask = (int*)xcalloc(n, sizeof(int));
        for (i = 0; i < n; ++i) {
            l.mask[i] = i;
        }
    }
    l.bias_updates = (float*)xcalloc(n * 2, sizeof(float)); // Updates for the anchor [w, h] values.
    l.outputs = h*w*n*(classes + 4 + 1); // Output elements per training image: number of grid cells x boxes per cell x parameters per box.
    l.inputs = l.outputs; // Input elements per training image (for yolo_layer, input and output have the same number of elements).
    l.max_boxes = max_boxes; // Maximum number of ground-truth boxes per image; this number is fixed when the layer is created.
    l.truths = l.max_boxes*(4 + 1); // 4 box coordinates + 1 class id per box; this reserves more slots than the actual number of GT boxes.
    l.delta = (float*)xcalloc(batch * l.outputs, sizeof(float)); // Error terms of the yolo layer for the whole batch.
    l.output = (float*)xcalloc(batch * l.outputs, sizeof(float)); // All outputs of the yolo layer for the whole batch.

    /* Initialize the stored anchor [w, h] values; parse_yolo() in parser.c later loads the anchor sizes from the cfg. */
    for (i = 0; i < total*2; ++i) {
        l.biases[i] = .5;
    }
    /* Forward and backward functions. */
    l.forward = forward_yolo_layer;
    l.backward = backward_yolo_layer;

#ifdef GPU
    l.forward_gpu = forward_yolo_layer_gpu;
    l.backward_gpu = backward_yolo_layer_gpu;
    l.output_gpu = cuda_make_array(l.output, batch*l.outputs);
    l.output_avg_gpu = cuda_make_array(l.output, batch*l.outputs);
    l.delta_gpu = cuda_make_array(l.delta, batch*l.outputs);

    free(l.output);
    if (cudaSuccess == cudaHostAlloc(&l.output, batch*l.outputs*sizeof(float), cudaHostRegisterMapped)) l.output_pinned = 1;
    else {
        cudaGetLastError(); // reset CUDA-error
        l.output = (float*)xcalloc(batch * l.outputs, sizeof(float));
    }

    free(l.delta);
    if (cudaSuccess == cudaHostAlloc(&l.delta, batch*l.outputs*sizeof(float), cudaHostRegisterMapped)) l.delta_pinned = 1;
    else {
        cudaGetLastError(); // reset CUDA-error
        l.delta = (float*)xcalloc(batch * l.outputs, sizeof(float));
    }
#endif

    fprintf(stderr, "yolo\n");
    srand(time(0));

    return l;

}
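The yolo layer's sizes follow directly from n, classes and the grid size. The sketch below uses hypothetical helper names and evaluates the same expressions as l.c, l.outputs and l.truths.

It assumes:

- COCO, so classes = 80;
- 3 anchors per [yolo] layer, selected by mask;
- a 608x608 input, which gives grids of 76, 38 and 19 for strides 8, 16 and 32.

The max_boxes = 200 used in the test is only an example value.

```c
/* Channels the [yolo] layer expects from the preceding convolution:
 * n boxes x (classes + 4 coordinates + 1 objectness), as in l.c. */
int yolo_channels(int n, int classes)
{
    return n * (classes + 4 + 1);
}

/* Elements per image, same expression as l.outputs: h * w * n * (classes + 5). */
int yolo_outputs(int w, int h, int n, int classes)
{
    return h * w * n * (classes + 4 + 1);
}

/* Ground-truth slots per image, same expression as l.truths. */
int yolo_truths(int max_boxes)
{
    return max_boxes * (4 + 1);
}
```

This is why, for COCO, the convolutional layer in front of each [yolo] layer needs filters = 3 * (80 + 5) = 255. For the 19x19 grid, that is 92055 output elements per image.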

The list data structure defined in "darknet/src/list.h" deserves special attention:

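typedef struct node{
    void *val;
    struct node *next;
    struct node *prev;
} node;

typedef struct list{
    int size; // Total number of nodes in the list.
    node *front; // First node of the list.
    node *back; // Last node of the list.
} list; // A list variable holds all network parameters: it contains one node per section, and each section in turn holds a smaller list with that layer's parameters.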
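sections is a doubly linked list whose nodes are appended at the back, so the sections keep the order in which they appear in the .cfg file. The following is a simplified, self-contained re-implementation of make_list(), list_insert() and free_list() in the spirit of darknet's list.c; it is not darknet's verbatim source.

Two things differ by design:

- This sketch's list_insert() returns 0 on allocation failure, an addition made for illustration.
- free_list() frees the nodes and the list but not the stored values.

```c
#include <stdlib.h>

typedef struct node {
    void *val;
    struct node *next;
    struct node *prev;
} node;

typedef struct list {
    int size;      /* number of nodes */
    node *front;   /* first node */
    node *back;    /* last node */
} list;

/* Create an empty list (NULL if allocation fails). */
list *make_list(void)
{
    list *l = malloc(sizeof(list));
    if (!l) return NULL;
    l->size = 0;
    l->front = NULL;
    l->back = NULL;
    return l;
}

/* Append val at the back; returns 1 on success, 0 if allocation fails. */
int list_insert(list *l, void *val)
{
    node *n = malloc(sizeof(node));
    if (!n) return 0;
    n->val = val;
    n->next = NULL;
    n->prev = l->back;
    if (l->back) l->back->next = n;
    else l->front = n;   /* first node is both front and back */
    l->back = n;
    ++l->size;
    return 1;
}

/* Free every node and the list itself; the stored values are not freed. */
void free_list(list *l)
{
    node *n = l->front;
    while (n) {
        node *next = n->next;
        free(n);
        n = next;
    }
    free(l);
}
```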

It also helps to look at the section data structure defined in "darknet/src/parser.c":

typedef struct{
    char *type; // Section type. Each section stores the type and parameters of one network layer; in the .cfg file, a line starting with '[' begins a section.
    list *options; // The section's parameters.
}section;

The read_cfg() function in "darknet/src/parser.c" reads the .cfg file and returns its contents in the list variable sections:

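/* Read the network configuration (.cfg) file. Each layer's parameters are read into a section struct (each section is one node of sections), and all sections are inserted into the list sections, which is returned. */
/* param: filename, a C string holding the path of the network configuration file. */
/* return: pointer to a list containing the parameters of every network layer read from the configuration file. */
list *read_cfg(char *filename)
{
    FILE *file = fopen(filename, "r");
    if(file == 0) file_error(filename);
    /* A section corresponds to one block of the configuration file, i.e. one layer of the network, so a section reads and stores that layer's parameters and its type. */
    char *line;
    int nu = 0; // Number of the line currently being read.
    list *sections = make_list(); // sections holds the parameters of all network layers.
    section *current = 0; // The layer (section) currently being read.
    while((line=fgetl(file)) != 0){
        ++ nu;
        strip(line); // Remove whitespace from the line.
        switch(line[0]){
            /* A line starting with '[' begins a new section; its content is the layer type, e.g. [net], [maxpool], [convolutional]... */
            case '[':
                current = (section*)xmalloc(sizeof(section)); // A new section was read: current.
                list_insert(sections, current); // list_insert() in list.c appends the new section to sections.
                current->options = make_list();
                current->type = line;
                break;
            case '\0': // Empty line.
            case '#': // Comment.
            case ';': // Comment.
                free(line); // In these three cases, simply free the line.
                break;
            /* All remaining lines hold actual network configuration data; read_option() parses them, and a return value of 0 means the line is malformed, so an error is reported. */
            default:
                if(!read_option(line, current->options)){ // Store the parameter in current->options as a key-value pair (kvp).
                    fprintf(stderr, "Config file error line %d, could parse: %s\n", nu, line);
                    free(line);
                }
                break;
        }
    }
    fclose(file);
    return sections;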

}
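read_cfg() dispatches on the first character of each stripped line:

- '[' opens a section;
- empty lines and lines starting with '#' or ';' are skipped;
- everything else is a key=value option for the current section.

The sketch below applies the same rules to an in-memory array of lines. The fixed-size `toy_section`/`toy_option` types, `parse_cfg_lines()` and `split_option()` are hypothetical stand-ins for darknet's list-based section/kvp storage and for read_option(). As a design choice, it rejects an option that appears before any section; in the code above, current would still be 0 in that case.

```c
#include <string.h>

#define MAX_SECTIONS 16
#define MAX_OPTIONS 16

/* Hypothetical fixed-size stand-ins for darknet's section and kvp lists. */
typedef struct { const char *key; const char *val; } toy_option;
typedef struct {
    char type[32];                    /* e.g. "[net]", brackets included */
    toy_option options[MAX_OPTIONS];
    int noptions;
} toy_section;

/* Split "key=value" in place; returns 0 if the line has no '='. */
static int split_option(char *s, toy_option *opt)
{
    char *eq = strchr(s, '=');
    if (!eq) return 0;
    *eq = '\0';
    opt->key = s;
    opt->val = eq + 1;
    return 1;
}

/* Parse already-stripped, writable lines into sections.
 * Returns the number of sections, or -1 on a malformed line. */
int parse_cfg_lines(char **lines, int nlines, toy_section *secs)
{
    int nsec = 0;
    toy_section *current = NULL;
    for (int i = 0; i < nlines; ++i) {
        char *line = lines[i];
        switch (line[0]) {
        case '[':                               /* new section */
            if (nsec == MAX_SECTIONS) return -1;
            current = &secs[nsec++];
            strncpy(current->type, line, sizeof(current->type) - 1);
            current->type[sizeof(current->type) - 1] = '\0';
            current->noptions = 0;
            break;
        case '\0':                              /* empty line */
        case '#':                               /* comment */
        case ';':                               /* comment */
            break;
        default:                                /* key=value option */
            if (!current || current->noptions == MAX_OPTIONS ||
                !split_option(line, &current->options[current->noptions]))
                return -1;
            current->noptions++;
            break;
        }
    }
    return nsec;
}
```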

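In summary, the parsing process transfers the network parameters held in the linked list into the network struct, which is then used for the subsequent weight updates.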


Testing # For testing, set batch and subdivisions to 1; otherwise errors may occur.

Training

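Number of convolution kernels, i.e. the number of output channels of this layer.

Kernel size.

Convolution stride.

pad: pad the borders with extra pixels.

CSP1 = CBM + 1 residual unit + CBM -> Concat (with CBM); see the overall diagram.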

Residual Block

CBL*3

Mish is no longer used.

SPP: multi-scale fusion via max pooling

End SPP
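The SPP block applies max pooling at several window sizes. It uses stride 1 and enough padding to keep the spatial size, then concatenates the pooled maps with the input along the channel axis. The sketch below shows this for a single channel, assuming YOLOv4's window sizes of 5, 9 and 13.

`maxpool_same()` and `spp_single_channel()` are hypothetical names. The order of the four planes is chosen for illustration rather than copied from the cfg's [route] layer. With C input channels, the concatenation has 4C channels, for example 2048 for a 512-channel input.

```c
/* Stride-1 max pooling with "same" padding on one h x w channel (row-major).
 * Window positions outside the map are ignored, so the output keeps the
 * input size. k is assumed to be odd. */
void maxpool_same(const float *in, float *out, int h, int w, int k)
{
    int r = k / 2;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float m = in[y * w + x];               /* centre is always valid */
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                    if (in[yy * w + xx] > m) m = in[yy * w + xx];
                }
            }
            out[y * w + x] = m;
        }
    }
}

/* SPP-style fusion for one channel: dst (4*h*w floats) receives the input
 * followed by its 5x5, 9x9 and 13x13 pooled versions, imitating a channel
 * concatenation. */
void spp_single_channel(const float *in, float *dst, int h, int w)
{
    static const int ks[3] = {5, 9, 13};
    for (int i = 0; i < h * w; ++i) dst[i] = in[i];
    for (int p = 0; p < 3; ++p)
        maxpool_same(in, dst + (p + 1) * h * w, h, w, ks[p]);
}
```

Because every branch keeps the h x w size, the maps can be concatenated directly. The larger windows spread strong activations over a wider receptive field.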

CBL*3

Mish is no longer used.

CBL

Upsampling

Add a CBL branch.

CBL*5

CBL

Upsampling

CBL

CBL*5

Prediction

CBL

conv

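CIoU loss function: it takes into account the overlap area, the center-point distance and the aspect ratio in bounding-box regression.
nms_kind=greedynms
beta_nms=0.6
max_delta=5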

Build the second branch

CBL

CBL*5

CBL

conv

Build the third branch

CBL

CBL*5

CBL

conv
