Training Your Own Model with CornerNet-Squeeze on an Ubuntu Server

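My previous post, "Setting Up the CornerNet-Lite Environment on an Ubuntu Server" (《Ubuntu服务器上搭建CornerNet-Lite环境》), explained in detail how to set up CornerNet-Lite on a server and tested it with the official CornerNet-Squeeze model. This post covers how I trained CornerNet-Squeeze on my own dataset on the server, along with the pitfalls I ran into along the way. I hope it helps; corrections are welcome.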

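I. Preparing the Dataset

Before training, the dataset has to be prepared. CornerNet-Squeeze expects annotations in JSON format, whereas the commonly used labelImg annotation tool only writes txt and XML files, so we need a way to convert XML annotations to JSON.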

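1. Annotating the images as XML files

Annotation is done with labelImg. If you do not have it, download it yourself or get it from Baidu Netdisk.

Baidu Netdisk link: https://pan.baidu.com/share/init?surl=lABtNpQF7ABAvhcmnT8o-g, password: t2r8

If the Baidu Netdisk copy crashes on launch, see the post "labelImg闪退解决方案" (a fix for labelImg crashing).

After downloading, open predefined_classes.txt in the labelImg/data/ directory, delete all of its contents, and enter the names of the classes you want to annotate, one per line.

Then launch the labelImg.exe executable in the labelImg/ directory.

Click "Open Dir" to choose the image directory and "Change Save Dir" to choose where the XML files are saved. Once you have confirmed that the annotation format is PascalVOC, you can start annotating.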
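For reference, a PascalVOC annotation is an XML file with one object element per bounding box. The sketch below parses a small hand-written sample with the standard library; the file name, class name, and coordinates in the sample are made up for illustration, and read_voc_boxes is a hypothetical helper, not part of labelImg. Something like it can be handy for spot-checking annotations before converting them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the PascalVOC layout; every value is illustrative.
SAMPLE_XML = """
<annotation>
  <filename>00000001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return (width, height, [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(int(bndbox.find(k).text)
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text,) + coords)
    return width, height, boxes


print(read_voc_boxes(SAMPLE_XML))
```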

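2. Splitting the dataset and converting XML to JSON

This part follows the Zhihu article "CornerNet-Lite训练自己的数据集" (Training CornerNet-Lite on your own dataset).

First rename the images and XML files so that their names are consistent. The following renameData.py does this: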

import os, shutil
import random
import numpy as np

def renameData(xmlDir, imgDir):
    xmlFiles = os.listdir(xmlDir)
    total = len(xmlFiles)
    cur = 0
    for xml in xmlFiles:
        cur += 1
        if cur % 500 == 1:
            print("Total/cur:", total, "/", cur)
        # The image is expected to share the XML file's base name
        imgPath = imgDir + xml[:-4] + ".jpg"
        # New name: zero-padded 8-digit counter, e.g. 00000001
        outName = ("%08d" % (cur))
        outXMLPath = ("%s/%s.xml" % (xmlDir, outName))
        outImgPath = ("%s/%s.jpg" % (imgDir, outName))
        os.rename(xmlDir + xml, outXMLPath)
        os.rename(imgPath, outImgPath)
    print("picker number:", cur)

if __name__ == '__main__':
    # Change the annotation and image paths below to your own
    xmlDir = "/mnt/disk/home1/cjp/data_pack2/xml/"
    imgDir = "/mnt/disk/home1/cjp/data_pack2/imgs/"
    print(xmlDir)
    print(imgDir)
    renameData(xmlDir, imgDir)

Only the file paths in the main block need to be changed.
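Because the rename step assumes that every XML file has a .jpg with the same base name, it may be worth checking that first. The following is a minimal sketch of such a check; find_unpaired is a hypothetical helper of mine, and the .jpg extension is an assumption carried over from the script above.

```python
import os


def find_unpaired(xml_dir, img_dir, img_ext=".jpg"):
    """Return (XML stems without an image, image stems without an XML)."""
    xml_stems = {os.path.splitext(f)[0]
                 for f in os.listdir(xml_dir) if f.endswith(".xml")}
    img_stems = {os.path.splitext(f)[0]
                 for f in os.listdir(img_dir) if f.endswith(img_ext)}
    return sorted(xml_stems - img_stems), sorted(img_stems - xml_stems)
```

Calling find_unpaired(xmlDir, imgDir) with the two paths from the main block should return two empty lists before you rename anything.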

Next, split the annotated dataset into train, val, and test parts as needed. The following splitData.py does this:

import random
from random import randint
import os
import numpy as np
import shutil

def clc(dir):
    # Remove every file in the given directory
    ls = os.listdir(dir)
    for i in ls:
        c_path = os.path.join(dir, i)
        os.remove(c_path)

if __name__ == '__main__':
    folder_list = ["train", "val", "test"]
    # Change base_dir to the local path that holds your images and annotations
    base_dir = "/mnt/disk/home1/cjp/data_pack2/"
    for i in range(3):
        folderName = folder_list[i]
        xml_dir = base_dir + folderName + "/Annotations/"
        if not os.path.exists(xml_dir):
            os.makedirs(xml_dir)
    train_dir = base_dir + folder_list[0] + "/Annotations/"
    val_dir = base_dir + folder_list[1] + "/Annotations/"
    test_dir = base_dir + folder_list[2] + "/Annotations/"
    clc(train_dir)
    clc(test_dir)
    clc(val_dir)
    total = np.arange(1, 1036+1)
    valList = random.sample(range(1, 1036+1), 100)  # randomly select val ids
    print(sorted(valList))
    # copy the 100 val XML files to the val folder
    for i in range(0, 100):
        xmlfile = ("%08d" % (valList[i]))
        file_dir = base_dir + "xml/" + xmlfile + '.xml'
        shutil.copy(file_dir, val_dir)
    # take away val ids from total ids
    total_val = []  # total ids minus val ids
    for i in range(0, 1036):
        if total[i] not in valList:
            total_val.append(total[i])
    testList_temp = random.sample(range(0, 1036-100), 100)  # randomly select test indices
    print(sorted(testList_temp))
    testList = []
    for k in range(0, 100):
        testList.append(total_val[testList_temp[k]])
    print(sorted(testList))
    # copy the 100 test XML files to the test folder
    for i in range(0, 100):
        xmlfile = ("%08d" % (testList[i]))
        file_dir = base_dir + "xml/" + xmlfile + '.xml'
        shutil.copy(file_dir, test_dir)
    # take away test ids from total_val ids
    total_val_test = []  # total ids minus val and test ids
    for i in range(0, 1036-100):
        if total_val[i] not in testList:
            total_val_test.append(total_val[i])
    print(sorted(total_val_test))
    # copy the remaining XML files to the train folder
    for i in range(0, 1036-100-100):
        xmlfile = ("%08d" % (total_val_test[i]))
        file_dir = base_dir + "xml/" + xmlfile + '.xml'
        shutil.copy(file_dir, train_dir)

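In this script, change the file paths in the main block, and adjust the numbers in the sampling calls and for loops to match your dataset size and the number of samples you want in each part. Here the dataset has 1036 samples in total: val and test contain 100 each, and train contains the remaining 1036-100-100 = 836.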
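Since the counts are hard-coded in several places, a parameterized variant may be easier to adapt. The sketch below is not the script above but an alternative under the same assumption that files are named by consecutive ids starting at 1; split_ids and its seed parameter are hypothetical names of mine. It shuffles the ids once and slices them, which yields disjoint splits without the intermediate lists.

```python
import random


def split_ids(total, n_val, n_test, seed=0):
    """Split ids 1..total into disjoint, sorted train/val/test lists."""
    if n_val + n_test > total:
        raise ValueError("n_val + n_test exceeds the dataset size")
    ids = list(range(1, total + 1))
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    val = sorted(ids[:n_val])
    test = sorted(ids[n_val:n_val + n_test])
    train = sorted(ids[n_val + n_test:])
    return train, val, test


train, val, test = split_ids(1036, 100, 100)
print(len(train), len(val), len(test))  # 836 100 100
```

Each id can then be turned into a file name with "%08d" % id, as in the script above.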

Then split the images into train, val, and test to match the XML files. The following splitImage.py does this:

import os
import numpy as np
import shutil

def splitImg(xml_dir, origin_img, img_dir):
    # Copy the image that matches each XML file into img_dir
    xmlFiles = os.listdir(xml_dir)
    for xml in xmlFiles:
        originimgPath = origin_img + xml[:-4] + ".jpg"
        newimgPath = img_dir + xml[:-4] + ".jpg"
        print(shutil.copy(originimgPath, newimgPath))

if __name__ == '__main__':
    folder_list = ["train", "val", "test"]
    # Change base_dir to the local path that holds your images and annotations
    base_dir = "/mnt/disk/home1/cjp/data_pack2/"
    origin_img = base_dir + "imgs/"
    for i in range(3):
        folderName = folder_list[i]
        xml_dir = base_dir + folderName + "/Annotations/"
        if not os.path.exists(xml_dir):
            os.makedirs(xml_dir)
        img_dir = base_dir + "/images/" + folderName + "/"
        if not os.path.exists(img_dir):
            os.makedirs(img_dir)
        print(img_dir)
        splitImg(xml_dir, origin_img, img_dir)

Only the file paths in the main block need to be changed.

Finally, convert the split XML files to JSON. The following xml2json.py does this:

import sys
import os
import json
import xml.etree.ElementTree as ET

START_BOUNDING_BOX_ID = 1
# The dict below stores the classes actually being detected; change it to match your own data.
# The example here is my own dataset with the two classes person and hat; for VOC it would be 20 classes.
# The class names must match the names used in the XML annotations.
PRE_DEFINE_CATEGORIES = {"0": 0, "1": 1}

def get(root, name):
    vars = root.findall(name)
    return vars

def get_and_check(root, name, length):
    vars = root.findall(name)
    if len(vars) == 0:
        raise NotImplementedError('Can not find %s in %s.' % (name, root.tag))
    if length > 0 and len(vars) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.' % (name, length, len(vars)))
    if length == 1:
        vars = vars[0]
    return vars

def get_filename_as_int(filename):
    try:
        filename = os.path.splitext(filename)[0]
        return int(filename)
    except:
        raise NotImplementedError('Filename %s is supposed to be an integer.' % (filename))

def convert(xml_dir, json_file):
    xmlFiles = os.listdir(xml_dir)
    json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []}
    categories = PRE_DEFINE_CATEGORIES
    bnd_id = START_BOUNDING_BOX_ID
    num = 0
    for line in xmlFiles:
        num += 1
        if num % 50 == 0:
            print("processing ", num, "; file ", line)
        xml_f = os.path.join(xml_dir, line)
        tree = ET.parse(xml_f)
        root = tree.getroot()
        # The filename must be a number
        filename = line[:-4]
        image_id = get_filename_as_int(filename)
        size = get_and_check(root, 'size', 1)
        width = int(get_and_check(size, 'width', 1).text)
        height = int(get_and_check(size, 'height', 1).text)
        image = {'file_name': (filename + '.jpg'), 'height': height, 'width': width, 'id': image_id}
        json_dict['images'].append(image)
        # Currently we do not support segmentation
        for obj in get(root, 'object'):
            category = get_and_check(obj, 'name', 1).text
            if category == 'bag':
                print(category, ',', line)
            if category not in categories:
                # Unknown class: report it and assign the next free id
                print(category, ',', line)
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bndbox = get_and_check(obj, 'bndbox', 1)
            xmin = int(get_and_check(bndbox, 'xmin', 1).text) - 1
            ymin = int(get_and_check(bndbox, 'ymin', 1).text) - 1
            xmax = int(get_and_check(bndbox, 'xmax', 1).text)
            ymax = int(get_and_check(bndbox, 'ymax', 1).text)
            assert(xmax > xmin)
            assert(ymax > ymin)
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {'area': o_width*o_height, 'iscrowd': 0, 'image_id': image_id,
                   'bbox': [xmin, ymin, o_width, o_height],
                   'category_id': category_id, 'id': bnd_id, 'ignore': 0,
                   'segmentation': []}
            json_dict['annotations'].append(ann)
            bnd_id = bnd_id + 1
    for cate, cid in categories.items():
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
    json_fp = open(json_file, 'w')
    json_str = json.dumps(json_dict)
    json_fp.write(json_str)
    json_fp.close()

if __name__ == '__main__':
    folder_list = ["train", "val", "test"]
    # Change base_dir to the local path that holds your images and annotations
    base_dir = "/mnt/disk/home1/cjp/data_pack2/"
    for i in range(3):
        folderName = folder_list[i]
        xml_dir = base_dir + folderName + "/Annotations/"
        json_dir = base_dir + folderName + "/instances_" + folderName + ".json"
        print("deal: ", folderName)
        print("xml dir: ", xml_dir)
        print("json file: ", json_dir)
        convert(xml_dir, json_dir)

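In this script, change the file paths in the main block and, as its comments note, set PRE_DEFINE_CATEGORIES to your own classes.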
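The coordinate handling in the conversion is the part most likely to cause confusion, so here it is in isolation. This sketch mirrors the arithmetic of xml2json.py: the top-left corner is shifted by one pixel, and the corner pair (xmin, ymin, xmax, ymax) becomes the [x, y, width, height] list stored under 'bbox'. The function name voc_to_coco_bbox is mine and does not appear in the script.

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corner coordinates to ([x, y, width, height], area)."""
    x = xmin - 1          # the script subtracts 1 from xmin and ymin
    y = ymin - 1
    width = xmax - x
    height = ymax - y
    if width <= 0 or height <= 0:
        raise ValueError("degenerate box")
    return [x, y, width, height], width * height


bbox, area = voc_to_coco_bbox(10, 20, 110, 220)
print(bbox, area)  # [9, 19, 101, 201] 20301
```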

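After all four scripts have run, four new folders appear under the original dataset path: images, train, val, and test. The images folder holds all the images sorted by split, while train, val, and test each hold the XML and JSON files of the corresponding split.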
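Before moving on, it can help to confirm that each generated JSON file is internally consistent. The sketch below is a hypothetical checker (check_coco_json is my own name) that relies only on the fields the conversion script writes: images, annotations, and categories. It reports annotations that point at a missing image or category, or that have a non-positive box size.

```python
import json


def check_coco_json(path):
    """Return a list of (annotation id, problem description) tuples."""
    with open(path) as f:
        data = json.load(f)
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    problems = []
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append((ann["id"], "unknown image_id"))
        if ann["category_id"] not in category_ids:
            problems.append((ann["id"], "unknown category_id"))
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            problems.append((ann["id"], "non-positive box size"))
    return problems
```

An empty list for instances_train.json, instances_val.json, and instances_test.json suggests the files are at least structurally sound.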

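3. Building a COCO-style dataset

CornerNet-Squeeze trains on COCO-style datasets, which contain two folders. The images folder holds all the images sorted by split and is identical to the images folder produced in step 2. The annotations folder holds the JSON files for train, val, and test.

So once step 2 is done, only a few steps remain to obtain the COCO dataset:

Create an empty folder, for example named pack.
Copy the images folder from step 2 directly into pack/.
Create an annotations folder under pack/, and copy instances_train.json, instances_val.json, and instances_test.json from the train, val, and test folders of step 2 into pack/annotations/.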
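These three steps can also be scripted. The following is a minimal sketch, assuming the folder layout produced in step 2 (an images folder plus train, val, and test folders, each holding its instances_*.json); build_pack is a hypothetical helper, and pack_dir is expected not to contain an images folder yet.

```python
import os
import shutil


def build_pack(base_dir, pack_dir, splits=("train", "val", "test")):
    """Assemble pack_dir/images and pack_dir/annotations from base_dir."""
    ann_dir = os.path.join(pack_dir, "annotations")
    os.makedirs(ann_dir)  # also creates pack_dir itself
    # Copy the whole images tree (images/train, images/val, images/test)
    shutil.copytree(os.path.join(base_dir, "images"),
                    os.path.join(pack_dir, "images"))
    # Collect the per-split JSON files into annotations/
    for split in splits:
        name = "instances_%s.json" % split
        shutil.copy(os.path.join(base_dir, split, name),
                    os.path.join(ann_dir, name))
    return pack_dir
```

With the paths used in this post, the call would be along the lines of build_pack("/mnt/disk/home1/cjp/data_pack2/", "pack").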

II. Configuring the Training Environment

First compile the Corner Pooling layers in the project:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ cd core/models/py_utils/_cpools
(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite/core/models/py_utils/_cpools$ python setup.py install --user

Then compile NMS:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ cd core/external
(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite/core/external$ make

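If you set up the environment following my previous post on setting up CornerNet-Lite (《Ubuntu服务器上搭建CornerNet-Lite环境》), neither compilation should report errors.

After compiling, install the COCO API. Create a data folder inside CornerNet-Lite and clone the COCO API there: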

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite/data$ git clone https://github.com/cocodataset/cocoapi

Rename the downloaded cocoapi folder to coco:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite/data$ mv cocoapi coco

Then enter the PythonAPI directory and build it:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite/data$ cd coco/PythonAPI
(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite/data/coco/PythonAPI$ make install

Once the build finishes, copy the pack folder obtained in step 3 of Part I into CornerNet-Lite/data/. The environment and the dataset are now ready.

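III. Modifying the Training Parameters

This part again follows the Zhihu article "CornerNet-Lite训练自己的数据集" and consists of the following steps:

Following CornerNet-Lite/core/dbs/coco.py, write your own data-loading interface and save it in CornerNet-Lite/core/dbs/. The file name is up to you, for example pack.py.
Modify __init__.py in CornerNet-Lite/core/dbs/ so that it points to the dataset class defined in pack.py.
Modify the CornerNet_Squeeze.json configuration file in CornerNet-Lite/configs/.
Modify line 10 of CornerNet-Lite/core/dbs/detection.py, self._configs["categories"] = 4. This is the number of classes to train on; change it to suit your dataset.
Modify lines 94 and 95 of CornerNet-Lite/core/models/CornerNet_Squeeze.py, setting the relevant variables to the number of classes in your dataset.

For the details, refer to that Zhihu article, which is very thorough, so I will not repeat them here.

Once these changes are made, run train.py to start training: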

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ python train.py CornerNet_Squeeze

Alternatively, use the following command so that training keeps running on the server after you disconnect:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ nohup python train.py CornerNet_Squeeze &

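Run this way, training does not occupy the console. The training log is written to nohup.out in the current directory instead, so training continues even when your computer is not connected to the server.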
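To check on progress after reconnecting, you only need the end of nohup.out. The sketch below reads the last few lines of a log file with the standard library; tail is a hypothetical helper of mine, and nothing in it depends on the format of the training log.

```python
from collections import deque


def tail(path, n=10):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as f:
        # deque with maxlen keeps only the most recent n lines in memory
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

For example, print("\n".join(tail("nohup.out", 20))) shows the 20 most recent log lines.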

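The first training run will almost certainly hit errors of one kind or another. I have recorded the errors I ran into, together with their fixes, in the appendix at the end of this post for reference.

IV. Testing with the Trained Model

Training time depends on the size of your dataset and the GPU resources available; it generally takes a few days. When training finishes, you can test with your own trained model.

Following demo.py in CornerNet-Lite/, you only need to swap in your own test image and your own model. The steps are:

In detectors.py under CornerNet-Lite/core, change from .dbs.coco import COCO to from .dbs.pack import COCO. (Replace pack with the name of the data interface file you created in Part III; mine is pack.py, hence .dbs.pack.)
In the CornerNet_Squeeze class, change model_path to your own trained CornerNet_Squeeze_50000.pkl. (Use the name of the model file you actually obtained; if unsure, look in CornerNet-Lite/cache/nnet/CornerNet_Squeeze/.)
In the CornerNet-Lite folder, edit demo.py so that it loads the image you want to test.

After these steps, run demo.py: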

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ python demo.py

The output image demo_out.jpg is the test result.

Appendix: Errors Encountered During Training and Their Fixes

(1).ImportError: /mnt/disk/home1/cjp/anaconda/envs/CornerNet_Lite/lib/python3.7/site-packages/kiwisolver.cpython-37m-x86_64-linux-gnu.so : symbol _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE, version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference

Fix:
Add the relevant libraries to the library search path by running:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/disk/home1/cjp/anaconda/lib

Be sure to change the anaconda path in the command above to your own.

(2).ValueError: numpy.ufunc size changed, may indicate binary incompatibility

Fix:
Upgrade numpy with the following command:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ pip install --upgrade numpy

(3).python RuntimeError:cannot join current thread

Fix:
Upgrade tqdm with the following command:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ pip install --upgrade tqdm

(4).RuntimeError: CUDA out of memory

Fix:
Reduce batch_size and chunk_sizes in CornerNet_Squeeze.json under CornerNet-Lite/configs/. With smaller values the out of memory error should go away. If it still appears afterwards, first run:

(CornerNet_Lite) cjp@ubuntu3:~/CornerNet-Lite$ nvidia-smi

to check whether the GPU is idle.

(5).RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

Fix:
Set chunk_sizes to the same value for every GPU in CornerNet_Squeeze.json under CornerNet-Lite/configs/. I used [2, 2, 2, 2]; you can also use a single GPU by setting chunk_sizes to [8].
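Both settings above add up to 8, which matches my understanding that the entries of chunk_sizes (one per GPU) should sum to batch_size. Treat that as an assumption and check it against the project README; the batch size of 8 is inferred from the two lists and is not stated explicitly. Under that assumption, a small hypothetical helper can produce an even split for any GPU count:

```python
def even_chunk_sizes(batch_size, num_gpus):
    """Split batch_size into num_gpus equal chunks, one per GPU."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    return [batch_size // num_gpus] * num_gpus


print(even_chunk_sizes(8, 4))  # [2, 2, 2, 2]
print(even_chunk_sizes(8, 1))  # [8]
```

The helper rejects batch sizes that do not divide evenly, since unequal chunks are what triggered the error in item (5).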