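Hands-on Tutorial | How to Train YOLOX on Your Own Dataset

Author | JuLec@Zhihu (reposted with permission)

Source | https://zhuanlan.zhihu.com/p/402210371

Editor | 极市平台

Overview

The YOLO family has long been a popular choice for object detection thanks to its flexibility. Training YOLOX on a custom dataset, however, is not very convenient out of the box, so I spent some spare time adapting it. This tutorial walks through the changes.

Code: https://github.com/Megvii-BaseDetection/YOLOX

Paper: https://arxiv.org/abs/2107.08430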

1. Install YOLOX

git clone git@github.com:Megvii-BaseDetection/YOLOX.git
cd YOLOX
pip3 install -U pip && pip3 install -r requirements.txt
pip3 install -v -e .  # or  python3 setup.py develop
pip3 install cython; pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

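2. Download the pretrained weights

The link below is the experiment file that defines YOLOX-s. The matching checkpoint (yolox_s.pth) is linked from the model table in the repository README; save it as weights/yolox_s.pth, the path used by the training command in step 6.

https://github.com/Megvii-BaseDetection/YOLOX/blob/main/exps/default/yolox_s.py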

3. Prepare your own dataset in VOC format

datasets/
    VOCdevkit/
        DATA_NAME/                      # the folder name you chose for your dataset
            JPEGImages/
                000000000000000.jpg
            Annotations/
                000000000000000.xml
            ImageSets/
                Main/
                    trainval.txt
                    test.txt
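The original does not show how trainval.txt and test.txt are produced. As a minimal sketch, assuming each split file simply lists one image id (file name without extension) per line, they could be generated from the annotation file names. The helper name write_splits, the 20% test ratio, and the fixed seed are illustrative choices, not part of YOLOX.

```python
import random
from pathlib import Path


def write_splits(dataset_dir, test_ratio=0.2, seed=0):
    """Hypothetical helper: write ImageSets/Main/trainval.txt and test.txt.

    Image ids are taken from the XML file names in Annotations/.
    """
    dataset_dir = Path(dataset_dir)
    ids = sorted(p.stem for p in (dataset_dir / "Annotations").glob("*.xml"))

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    splits = {"test": sorted(ids[:n_test]), "trainval": sorted(ids[n_test:])}

    out_dir = dataset_dir / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, split_ids in splits.items():
        # One image id per line, matching the layout shown in step 3.
        (out_dir / f"{name}.txt").write_text("".join(i + "\n" for i in split_ids))
    return splits


# Example (path is an assumption based on the layout in step 3):
# write_splits("datasets/VOCdevkit/DATA_NAME")
```

Deriving the ids from Annotations rather than JPEGImages means unlabeled images are left out of both splits.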

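4. Edit the configuration file (config.yaml)

CLASSES:
- person             # dataset labels; this tutorial only detects people
CLASSES_NUM: 1       # number of classes to detect
SUB_NAME: 'custom'   # the DATA_NAME folder from the previous step
CUSTOM: true         # read in step 5.2; selects CLASSES instead of the default VOC labels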
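Because CLASSES_NUM is typed by hand, it can drift out of sync with the CLASSES list. The sketch below is one way to catch that before training; check_config is a hypothetical helper, and it works on a plain dict such as the one a YAML loader would return for the file above.

```python
def check_config(cfg):
    """Hypothetical helper: return a list of problems in a config.yaml-style mapping."""
    problems = []
    classes = cfg.get("CLASSES") or []

    if not classes:
        problems.append("CLASSES is empty")
    if len(set(classes)) != len(classes):
        problems.append("CLASSES contains duplicate labels")
    if cfg.get("CLASSES_NUM") != len(classes):
        problems.append(
            "CLASSES_NUM is %r but CLASSES has %d entries"
            % (cfg.get("CLASSES_NUM"), len(classes))
        )
    if not cfg.get("SUB_NAME"):
        problems.append("SUB_NAME (the dataset folder name) is missing")
    return problems


# The values from step 4, written as a dict:
example = {"CLASSES": ["person"], "CLASSES_NUM": 1, "SUB_NAME": "custom"}
for problem in check_config(example):
    print(problem)
```

An empty result means the three keys are consistent with each other; it says nothing about whether the dataset folder itself exists.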

5. Modify the YOLOX files to fit your dataset

5.1

First, add the code below at the very top of exps/example/yolox_voc/yolox_voc_s.py. It parses config.yaml to obtain SUB_NAME.

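import sys
sys.path.insert(1, "../../")
# parseYaml is a small helper module written for this tutorial to parse YAML
import parseYaml
# relative paths resolve against the directory the training command is launched from
cfg = parseYaml.get_config("./config.yaml")
DATA_NAME = cfg.SUB_NAME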

Note: the parseYaml script is as follows:

import os

import yaml
from easydict import EasyDict as edict


class YamlParser(edict):
    """This is yaml parser based on EasyDict."""

    def __init__(self, cfg_dict=None, config_file=None):
        if cfg_dict is None:
            cfg_dict = {}
        if config_file is not None:
            assert os.path.isfile(config_file)
            with open(config_file, 'r') as fo:
                cfg_dict.update(yaml.load(fo.read(), Loader=yaml.FullLoader))
        super(YamlParser, self).__init__(cfg_dict)

    def merge_from_file(self, config_file):
        with open(config_file, 'r') as fo:
            # Loader must be passed explicitly; PyYAML 6 and later reject yaml.load without it
            self.update(yaml.load(fo.read(), Loader=yaml.FullLoader))

    def merge_from_dict(self, config_dict):
        self.update(config_dict)


def get_config(config_file=None):
    return YamlParser(config_file=config_file)

5.2 Modify voc_classes.py

import parseYaml

cfg = parseYaml.get_config("./config.yaml")
# CUSTOM is looked up with a default so a config.yaml without that key still works
if cfg.get("CUSTOM", True):
    VOC_CLASSES = tuple(cfg.CLASSES)
else:
    VOC_CLASSES = (
        "person",
        "aeroplane",
        "bicycle",
        "bird",
        "boat",
        "bus",
        "bottle",
        "car",
        "cat",
        "chair",
        "cow",
        "diningtable",
        "dog",
        "horse",
        "motorbike",
        "pottedplant",
        "sheep",
        "sofa",
        "train",
        "tvmonitor",
    )
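The labels in VOC_CLASSES need to match the name fields inside the XML annotations, since the loader maps each object name to a class index. A minimal sketch for checking this ahead of training, assuming standard VOC XML with object/name elements; count_labels and unknown_labels are hypothetical helpers, not YOLOX functions.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_labels(annotations_dir):
    """Hypothetical helper: count how often each object name appears in the XML files."""
    counts = Counter()
    for xml_path in sorted(Path(annotations_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            counts[obj.findtext("name", default="").strip()] += 1
    return counts


def unknown_labels(counts, classes):
    """Return the annotated names that are not listed in the class list."""
    return sorted(set(counts) - set(classes))


# Example (path is an assumption based on the layout in step 3):
# counts = count_labels("datasets/VOCdevkit/DATA_NAME/Annotations")
# print(counts, unknown_labels(counts, ["person"]))
```

Any name reported by unknown_labels should either be added to CLASSES in config.yaml or removed from the annotations.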

5.3

Modify the __init__ method of the Exp class so that it reads CLASSES_NUM from the parsed YAML config.

def __init__(self):
    super(Exp, self).__init__()
    self.num_classes = cfg.CLASSES_NUM    # number of classes to detect
    self.depth = 0.33
    self.width = 0.50
    self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

5.4 Modify the data loading code

dataset = VOCDetection(
    data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
    # image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
    image_sets=[(DATA_NAME, 'trainval')],      # use your own dataset name
    img_size=self.input_size,
    preproc=TrainTransform(
        rgb_means=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
        max_labels=50,
    ),
    custom=True,                                # newly added custom argument
)

5.5

To support the custom argument introduced in 5.4, modify the __init__ method of VOCDetection in voc.py.

class VOCDetection(Dataset):
    def __init__(
        self,
        data_dir,
        image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
        img_size=(416, 416),
        preproc=None,
        target_transform=AnnotationTransform(),
        dataset_name="VOC0712",
        custom=True,                      # newly added
    ):
        super().__init__(img_size)
        self.root = data_dir
        self.image_set = image_sets
        self.img_size = img_size
        self.preproc = preproc
        self.target_transform = target_transform
        self.name = dataset_name
        self._annopath = os.path.join("%s", "Annotations", "%s.xml")
        self._imgpath = os.path.join("%s", "JPEGImages", "%s.jpg")
        self._classes = VOC_CLASSES
        self.ids = list()
        self.custom = custom
        if self.custom:            # handle your own dataset
            # image_sets[0] is (DATA_NAME, split name), e.g. (DATA_NAME, 'trainval')
            self.base_dir, self.custom_name = image_sets[0]
            rootpath = os.path.join(self.root, self.base_dir)
            for line in open(os.path.join(rootpath, "ImageSets", "Main", self.custom_name + ".txt")):
                self.ids.append((rootpath, line.strip()))
        else:                     # handle the default VOC dataset
            for (year, name) in image_sets:
                self._year = year
                rootpath = os.path.join(self.root, "VOC" + year)
                for line in open(os.path.join(rootpath, "ImageSets", "Main", name + ".txt")):
                    self.ids.append((rootpath, line.strip()))
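The custom branch above builds each sample path from the dataset root and the ids listed in the split file. The sketch below reuses the same two path templates to report files that a split refers to but that do not exist on disk; missing_files is a hypothetical helper meant to be run before training, not part of voc.py.

```python
import os


def missing_files(rootpath, split="trainval"):
    """Hypothetical helper: list annotation/image paths named by a split but absent on disk."""
    # Same templates as self._annopath and self._imgpath in VOCDetection.
    annopath = os.path.join("%s", "Annotations", "%s.xml")
    imgpath = os.path.join("%s", "JPEGImages", "%s.jpg")

    split_file = os.path.join(rootpath, "ImageSets", "Main", split + ".txt")
    with open(split_file) as f:
        ids = [line.strip() for line in f if line.strip()]

    missing = []
    for img_id in ids:
        for template in (annopath, imgpath):
            path = template % (rootpath, img_id)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Example (path is an assumption based on the layout in step 3):
# print(missing_files("datasets/VOCdevkit/DATA_NAME", "trainval"))
```

Note that the image template assumes a .jpg extension, as the original code does; images saved with another extension would be reported as missing.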

5.6 Modify the get_eval_loader method

valdataset = VOCDetection(
    data_dir=os.path.join(get_yolox_datadir(), "VOCdevkit"),
    # image_sets=[('2007', 'test')],
    image_sets=[(DATA_NAME, 'test')],
    img_size=self.test_size,
    preproc=ValTransform(
        rgb_means=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    ),
    custom=True,
)

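6. Run training

python tools/train.py -f exps/example/yolox_voc/yolox_voc_s.py -expn TEST -d 4 -b 64 --fp16 -o -c weights/yolox_s.pth

Here -d is the number of GPUs, -b is the total batch size, --fp16 enables mixed-precision training, -o pre-allocates GPU memory, and -c points to the pretrained checkpoint. Lower -d and -b to match your hardware.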

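7. Run inference to verify the model

Point -c at the checkpoint written by your training run. Checkpoints are saved under YOLOX_outputs/<experiment name>, so a run started with -expn TEST will most likely have written to YOLOX_outputs/TEST rather than YOLOX_outputs/yolox_voc_s.

python tools/demo.py image -f exps/example/yolox_voc/yolox_voc_s.py -c YOLOX_outputs/yolox_voc_s/best_ckpt.pth.tar --path img/1.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device gpu
# the first argument selects the mode: image, video, or webcam
# if you choose webcam, also pass: --camid 0/"rtsp:"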
