YOLOX-PAI: An Improved YOLOX, Stronger and Faster than YOLOv6

Paper: https://arxiv.org/pdf/2208.13040.pdf

Code: https://github.com/alibaba/EasyCV

0 Abstract

We develop an all-in-one computer vision toolbox named EasyCV to facilitate the use of various SOTA computer vision methods. Recently, we added YOLOX-PAI, an improved version of YOLOX, to EasyCV. We conduct ablation studies to investigate the influence of several detection methods on YOLOX. We also provide easy access to PAI-Blade, which accelerates inference based on BladeDISC and TensorRT. Finally, we reach 42.8 mAP on the COCO dataset within 1.0 ms on a single NVIDIA V100 GPU, which is slightly faster than YOLOv6. A simple but efficient predictor API is also provided in EasyCV for end-to-end object detection.

1 Introduction

YOLOX (Ge et al., 2021) is one of the best-known one-stage object detection methods and has been widely used in fields such as autonomous driving and defect inspection. It introduces the decoupled head and the anchor-free design into the YOLO series, and achieves state-of-the-art results in the 40 to 50 mAP range.

Considering its flexibility and efficiency, we integrate YOLOX into EasyCV, an all-in-one computer vision toolbox that helps even a beginner use computer vision algorithms easily. In addition, we investigate improvements to YOLOX through different enhancements of the detection backbone, neck, and head. Users can simply set different configs to obtain an object detection model that suits their requirements. Based on PAI-Blade (an inference optimization framework by PAI), we further speed up inference and provide an easy API for using PAI-Blade in EasyCV. Finally, we design an efficient predictor API to use YOLOX-PAI in an end-to-end manner, which accelerates the original YOLOX by a large margin. Comparisons between YOLOX-PAI and state-of-the-art object detection methods are shown in Fig. 1.

Figure 1: Comparison between YOLOX-PAI and existing methods.

In brief, our main contributions are as follows:

• We release YOLOX-PAI in EasyCV as a simple yet efficient object detection tool (including the Docker image and the procedures for model training, evaluation, and deployment). We hope that even a beginner can use YOLOX-PAI to accomplish object detection tasks.

• **We conduct ablation studies of existing object detection methods based on YOLOX, where only a config file is needed to construct a self-designed YOLOX model.** With the improved architecture and the efficiency of PAI-Blade, we obtain state-of-the-art object detection results in the 40 to 50 mAP range within 1 ms of model inference on a single NVIDIA Tesla V100 GPU.

• We provide a flexible predictor API in EasyCV that accelerates the preprocessing, inference, and postprocessing steps. In this way, users can make better use of YOLOX-PAI for end-to-end object detection tasks.

2 Methods

In this section, we briefly review the methods used in YOLOX-PAI. We make several improvements to the detection backbone, neck, and head. We also use PAI-Blade to accelerate inference.

2.1 Backbone

Recently, YOLOv6 and PP-YOLOE (Xu et al., 2022) have replaced the CSPNet backbone (Wang et al., 2020) with RepVGG (Ding et al., 2021).

In RepVGG, a single 3x3 convolution block replaces the multi-branch structure (used during training) at inference time, which both saves inference time and improves the object detection results. Following YOLOv6, we also offer a RepVGG-based backbone as an option in YOLOX-PAI.
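
The branch merging described above can be illustrated with a toy example. The sketch below is not the EasyCV or RepVGG implementation: it assumes a single channel, no bias, and no BatchNorm (real RepVGG blocks first fold the BatchNorm statistics into each branch), and all function names are hypothetical. It only shows why a 3x3 branch, a 1x1 branch, and an identity branch can be collapsed into one 3x3 kernel without changing the output.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D image with a square odd-sized kernel, zero-padded to keep the size."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch (a scalar weight here) + identity branch."""
    y3 = conv2d_same(image, k3)
    return [[y3[i][j] + k1 * px + px for j, px in enumerate(row)]
            for i, row in enumerate(image)]


def fuse_branches(k3, k1):
    """Inference-time form: fold the 1x1 weight and the identity into the centre tap of the 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0
    return fused


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, -0.5], [1.0, 0.5, -0.25]]
k1 = 0.75

train_out = multi_branch(image, k3, k1)
infer_out = conv2d_same(image, fuse_branches(k3, k1))
```

Both forms produce the same feature map, but the fused form runs a single convolution, which is where the inference-time saving comes from.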

2.2 Neck

We use two methods in the neck of YOLOX-PAI to improve on YOLOX:
1) Adaptively Spatial Feature Fusion (ASFF) (Liu et al., 2019) and its variant (denoted as ASFF_Sim) for feature augmentation;
2) GSConv (Li et al., 2022), a lightweight convolution block that reduces the compute cost.

The original ASFF method first uses several vanilla convolution blocks to unify the dimensions of the different feature maps. Inspired by the Focus layer in YOLOv5, we replace these convolution blocks with a parameter-free slice operation and a mean operation to obtain the unified feature maps (denoted as ASFF_Sim). Specifically, the operation applied to each output feature map of YOLOX is defined in Fig. 2.
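
Since Fig. 2 is not reproduced here, the following is only a rough sketch of what a parameter-free slice operation and mean operation can look like. The slice follows the YOLOv5 Focus pattern (every second pixel at four offsets, stacked along the channel axis); the channel-group mean and the function names are assumptions for illustration, not the exact per-level operations of ASFF_Sim.

```python
def focus_slice(feat):
    """Space-to-depth: (C, H, W) -> (4C, H/2, W/2) by taking every second pixel at four offsets."""
    out = []
    for dy, dx in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for channel in feat:
            out.append([row[dx::2] for row in channel[dy::2]])
    return out


def channel_group_mean(feat, groups):
    """Reduce C channels to `groups` channels by averaging consecutive groups of C // groups channels."""
    size = len(feat) // groups
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for g in range(groups):
        block = feat[g * size:(g + 1) * size]
        out.append([[sum(ch[i][j] for ch in block) / size for j in range(w)]
                    for i in range(h)])
    return out


# One 4x4 channel holding the values 0..15.
feat = [[[float(4 * i + j) for j in range(4)] for i in range(4)]]

sliced = focus_slice(feat)               # 4 channels of size 2x2
unified = channel_group_mean(sliced, 1)  # back to 1 channel of size 2x2
```

In this toy case the slice followed by a mean over all four slices equals 2x2 average pooling, so the spatial size is halved without any learned weights.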

We also use two kinds of GSConv-based neck to optimize YOLOX. The neck architectures are shown in Fig. 3 and Fig. 4. **The difference between the two architectures is whether all blocks are replaced with GSConv.** As argued by its authors, GSConv is specifically designed for the neck, where the channel dimension reaches its maximum and the spatial size reaches its minimum.

2.3 Head

We enhance the YOLOX head with the attention mechanism of (Feng et al., 2021) to align the localization and classification tasks (denoted as TOOD-Head). The architecture is shown in Fig. 5.


A stem layer is first used to reduce the number of channels, followed by a group of intermediate (inter) convolution layers that produce the inter feature maps. Finally, the adaptive weights are computed for each task. We test both vanilla convolution and RepVGG-based convolution in the TOOD-Head.

2.4 PAI-Blade

PAI-Blade is an easy-to-use and robust inference optimization framework for model acceleration. It builds on many optimization techniques, such as Blade Graph Optimizer, TensorRT, and PAI-TAO (Tensor Accelerator and Optimizer). **PAI-Blade automatically searches for the best way to optimize the input model.** Therefore, people without professional knowledge of model deployment can also use PAI-Blade to optimize inference. We integrate PAI-Blade into EasyCV so that users can obtain an efficient model by simply changing the export config.
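
As a rough idea of what "changing the export config" might look like, here is a hypothetical export section written as a Python dict, the style used by EasyCV config files. The key names and values below are assumptions for illustration and should be checked against the EasyCV documentation before use.

```python
# Hypothetical export section of an EasyCV-style config file.
# Every key and value here is an assumption, not a confirmed API.
export = dict(
    export_type='blade',   # assumed choices: 'ori' (plain PyTorch), 'jit' (TorchScript), 'blade' (PAI-Blade)
    preprocess_jit=True,   # assumed flag: also export the preprocessing step as a jit module
    batch_size=1,          # assumed: batch size the optimized model is built for
)
```

Under these assumptions, switching between a plain, a jit, and a Blade-optimized model would be a one-line change to `export_type`, with the rest of the training config left untouched.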

2.5 EasyCV Predictor

Along with model inference, the preprocessing and postprocessing functions are also important in an end-to-end object detection task, yet they are often ignored by existing object detection toolboxes. In EasyCV, users can flexibly choose whether to export the model together with the preprocessing/postprocessing procedure. A predictor API is then provided to run efficient end-to-end object detection in only a few lines of code.
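
The three-stage structure described above can be sketched as a small wrapper that chains preprocessing, inference, and postprocessing and times each stage separately. This is a toy illustration, not the EasyCV predictor: the class, the stage functions, and the fake "detector" are all hypothetical stand-ins.

```python
import time


class ToyPredictor:
    """Hypothetical three-stage predictor; not the EasyCV class."""

    def __init__(self, preprocess, model, postprocess):
        self.stages = [('preprocess', preprocess),
                       ('inference', model),
                       ('postprocess', postprocess)]

    def predict(self, image):
        """Run all stages in order and record the wall-clock time of each one."""
        timings = {}
        data = image
        for name, fn in self.stages:
            start = time.perf_counter()
            data = fn(data)
            timings[name] = time.perf_counter() - start
        return data, timings


def scale_pixels(image):
    """Stand-in for preprocessing: map 0..255 pixel values to 0..1."""
    return [[p / 255.0 for p in row] for row in image]


def fake_detector(tensor):
    """Stand-in for the model: treat every pixel as a candidate (row, col, score)."""
    return [(i, j, v) for i, row in enumerate(tensor) for j, v in enumerate(row)]


def keep_confident(dets, threshold=0.5):
    """Stand-in for postprocessing: drop candidates below the score threshold."""
    return [d for d in dets if d[2] >= threshold]


predictor = ToyPredictor(scale_pixels, fake_detector, keep_confident)
detections, timings = predictor.predict([[0, 255], [51, 204]])
```

Timing the stages separately makes it visible when preprocessing or postprocessing, rather than the model itself, dominates the end-to-end latency.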

3 Experiments

In this section, we report the ablation results of the above methods on the COCO dataset (Lin et al., 2014) and compare YOLOX-PAI with SOTA object detection methods.

3.1 Comparisons with the SOTA methods

We select the useful improvements in YOLOX-PAI and compare it with the SOTA YOLOv6 method in Table 1. YOLOX-PAI is considerably faster than the corresponding version of YOLOv6 while achieving a better mAP: +0.2 mAP with a 22% speed-up over YOLOv6-tiny, and +0.2 mAP with a 13% speed-up over YOLOv6-s.

3.2 Ablation studies

Influence of Backbone.

As shown in Table 1, YOLOX with a RepVGG-based backbone achieves a better mAP at only a small cost in speed. It does add parameters and FLOPs, which may require more computation resources, but it adds little inference time. Considering its efficiency, we make it a flexible config setting in EasyCV.

Influence of Neck.

The influence of ASFF and ASFF_Sim is shown in Table 2. **Compared with ASFF, ASFF_Sim also improves the detection results while adding only a few parameters and FLOPs.** However, the time cost is much larger, and we will implement a CustomOP to optimize it in the future.

The influence of GSConv is shown in Table 3. GSConv brings a gain of 0.3 mAP at the cost of a 3% reduction in speed on a single NVIDIA V100 GPU.

Influence of Head.

The influence of the TOOD-Head is shown in Table 4. We investigate the effect of the number of inter convolution layers and find that adding more of them improves the detection results, so choosing this hyperparameter is a trade-off between speed and accuracy. We also find that replacing the vanilla convolution with a RepConv-based convolution in the inter convolution layers makes the results worse.


Using RepConv-based cls_conv/reg_conv layers (see Fig. 5) slightly improves the results when the stack number is small (i.e., 2 or 3).

3.3 End2end results

Table 5 shows the end-to-end prediction results of the YOLOX-s model with different export configs. The keywords in the table are the same as those in the EasyCV config file. Blade optimization clearly helps accelerate inference. In addition, the preprocessing step can be sped up considerably by the exported jit model.

As for postprocessing, we are still working on a better CustomOP that PAI-Blade can optimize for better performance. In the right part of Fig. 1, we show that with the optimization of PAI-Blade and the EasyCV predictor, we obtain a satisfactory end-to-end inference time on YOLOX.

4 Conclusion

In this paper, we introduced YOLOX-PAI, an improved version of YOLOX based on EasyCV. It achieves SOTA object detection results in the 40 to 50 mAP range thanks to the improved model architecture and PAI-Blade. We also provide an easy and efficient predictor API for flexible end-to-end object detection. **EasyCV is an all-in-one toolbox that focuses on SOTA computer vision methods, especially in the fields of self-supervised learning and vision transformers.** We hope that users can get started on computer vision tasks immediately and enjoy CV with EasyCV!
