Paper: https://arxiv.org/abs/2003.12063

Code: https://github.com/Scalsol/mega.pytorch

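1. Environment setup

1.1 Create the cuda10.0+pytorch1.3.0+python3.7 environment

1.2 Install the other packages

2. Building your own dataset

2.1 Dataset layout

2.2 xml file format

2.3 Preparing the training and test index files

2.4 Parameter changes

3. Training and test commands

4. Error summary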

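1. Environment setup

1.1 Create the cuda10.0+pytorch1.3.0+python3.7 environment

Reference: the CSDN blog post "docker容器配置指定深度学习环境cuda+python+pytorch" (configuring a specific cuda+python+pytorch deep learning environment in a Docker container) by qq_44716044.

1.2 Install the other packages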

conda install ipython pip

# mega and coco api dependencies
pip install ninja yacs cython matplotlib tqdm opencv-python scipy

export INSTALL_DIR=$PWD

# install pycocotools
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install

# install cityscapesScripts
cd $INSTALL_DIR
git clone https://github.com/mcordts/cityscapesScripts.git
cd cityscapesScripts/
python setup.py build_ext install

# install apex
cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

# install PyTorch Detection
cd $INSTALL_DIR
git clone https://github.com/Scalsol/mega.pytorch.git
cd mega.pytorch

# the following will install the lib with
# symbolic links, so that you can modify
# the files if you want and won't need to
# re-build it
python setup.py build develop

pip install 'pillow<7.0.0'

unset INSTALL_DIR

# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop

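[Note] If you do not strictly follow the author's cuda10.0+python3.7+pytorch1.3.0 environment, installing apex later may fail with:

Cuda extensions are being compiled with a version of Cuda that does not match the version

[The following was a failed attempt]

Reference: the CSDN blog post "apex安装报错：Cuda extensions are being compiled with a version of Cuda that does not match the verson" by 沃特么.拆基.达柴机.

That post suggests changing the apex install command from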

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

to

pip install -v --no-cache-dir ./

The install then succeeds, but a later run still fails with:

AttributeError: module 'torch.cuda' has no attribute 'amp'

If installing apex instead fails with: Error: command '/usr/local/cuda-10.0/bin/nvcc' failed with exit status 1

see: http://www.cxyzjd.com/article/qq_22565865/107532012

2. Building your own dataset

Follow the customize.md file provided by the author.

2.1 Dataset layout

datasets
├── vid_custom
|   |── train
|   |   |── video_snippet_1
|   |   |   |── 000000.JPEG
|   |   |   |── 000001.JPEG
|   |   |   |── 000002.JPEG
|   |   |   ...
|   |   |── video_snippet_2
|   |   |   |── 000000.JPEG
|   |   |   |── 000001.JPEG
|   |   |   |── 000002.JPEG
|   |   |   ...
|   |   ...
|   |── val
|   |   |── video_snippet_1
|   |   |   |── 000000.JPEG
|   |   |   |── 000001.JPEG
|   |   |   |── 000002.JPEG
|   |   |   ...
|   |   |── video_snippet_2
|   |   |   |── 000000.JPEG
|   |   |   |── 000001.JPEG
|   |   |   |── 000002.JPEG
|   |   |   ...
|   |   ...
|   |── annotation
|   |   |── train
|   |   |   |── video_snippet_1
|   |   |   |   |── 000000.xml
|   |   |   |   |── 000001.xml
|   |   |   |   |── 000002.xml
|   |   |   |   ...
|   |   |   |── video_snippet_2
|   |   |   |   |── 000000.xml
|   |   |   |   |── 000001.xml
|   |   |   |   |── 000002.xml
|   |   |   |   ...
|   |   ...
|   |   |── val
|   |   |   |── video_snippet_1
|   |   |   |   |── 000000.xml
|   |   |   |   |── 000001.xml
|   |   |   |   |── 000002.xml
|   |   |   |   ...
|   |   |   |── video_snippet_2
|   |   |   |   |── 000000.xml
|   |   |   |   |── 000001.xml
|   |   |   |   |── 000002.xml
|   |   |   |   ...
|   |   ...

[Notes]

1. Image file names are 6-digit numbers starting from 0;

2. The xml files under annotation correspond one-to-one to the image files under train and val.
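The two notes above can be checked mechanically before training. The following is a minimal sketch, assuming the directory names from the tree above (`train`, `val`, `annotation`) and the `.JPEG` / `.xml` extensions; the function name `find_unpaired_frames` is made up for illustration and is not part of MEGA.

```python
from pathlib import Path
from typing import List


def find_unpaired_frames(root: str, split: str) -> List[str]:
    """List frames without an xml, xml files without a frame, and badly named frames.

    `root` is the dataset folder (e.g. datasets/vid_custom) and `split` is
    "train" or "val". Both names follow the layout shown above.
    """
    img_root = Path(root) / split
    ann_root = Path(root) / "annotation" / split

    # Compare paths relative to each root, with the file extension removed.
    images = {p.relative_to(img_root).with_suffix("") for p in img_root.rglob("*.JPEG")}
    annos = {p.relative_to(ann_root).with_suffix("") for p in ann_root.rglob("*.xml")}

    problems = [f"missing xml for {split}/{p.as_posix()}" for p in sorted(images - annos)]
    problems += [f"missing image for {split}/{p.as_posix()}" for p in sorted(annos - images)]
    # Frame names are expected to be 6-digit numbers such as 000000.
    problems += [
        f"bad frame name {split}/{p.as_posix()}"
        for p in sorted(images)
        if not (len(p.name) == 6 and p.name.isdigit())
    ]
    return problems
```

Calling `find_unpaired_frames("datasets/vid_custom", "train")` returns an empty list when every frame has an annotation and every name has six digits.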

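2.2 xml file format

Reference: "ImageNet VID数据集介绍" (an introduction to the ImageNet VID dataset), Zhihu (zhihu.com)

[Notes]

1. The name field in each xml file uses the standard ImageNet VID category IDs, one ID per category; the IDs can also be found in the code (see the parameter changes in section 2.4);

2. See the reference above for more details.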
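To make the role of the name field concrete, here is a small sketch that reads the objects out of one annotation. The XML layout in `SAMPLE_XML` is an assumption based on the usual ImageNet VID convention (`object` elements with `name` and `bndbox` children); the sizes and coordinates are invented, and only the ID `n02958343` comes from this post. `read_objects` is a hypothetical helper, not a MEGA function.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed ImageNet VID-style annotation; all numeric values are made up.
SAMPLE_XML = """<annotation>
  <folder>video_snippet_1</folder>
  <filename>000000</filename>
  <size><width>1280</width><height>720</height></size>
  <object>
    <trackid>0</trackid>
    <name>n02958343</name>
    <bndbox><xmax>500</xmax><xmin>100</xmin><ymax>400</ymax><ymin>200</ymin></bndbox>
    <occluded>0</occluded>
    <generated>0</generated>
  </object>
</annotation>"""


def read_objects(xml_text: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (category ID, (xmin, ymin, xmax, ymax)) for every object element."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        name = obj.find("name").text.strip()
        box = obj.find("bndbox")
        coords = tuple(int(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects


print(read_objects(SAMPLE_XML))  # [('n02958343', (100, 200, 500, 400))]
```

Every name returned this way should appear in classes_map, otherwise the annotation refers to a category the dataset class does not know.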

2.3 Preparing the training and test index files

1. VID_train_15frames.txt

train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 10 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 30 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 50 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 70 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 90 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 110 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 130 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 150 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 170 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 190 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 210 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 230 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 250 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 270 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00000000 1 290 300
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 1 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 4 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 8 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 11 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 14 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 17 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 20 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 24 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 27 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 30 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 33 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 36 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 40 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 43 48
train/ILSVRC2015_VID_train_0000/ILSVRC2015_train_00001000 1 46 48
...

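[Explanation]

(1) The author's explanation: the four columns of each line are, in order: video folder, no meaning (just ignore it), frame number, video length;

(2) Note that video length can be taken as the total number of images in a video folder, and frame number can be chosen by hand, as long as 15 frames in each video folder are selected as key frames.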
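One way to pick the 15 key frames is to space them evenly across the video. The sketch below does this with integer arithmetic; it happens to reproduce the frame numbers in the sample lines above (10, 30, ... for length 300 and 1, 4, 8, ... for length 48), but whether the author's file was generated this way is an assumption. `keyframe_lines` is a hypothetical helper name, and the second column is written as the constant 1 simply because the sample does so.

```python
from typing import List


def keyframe_lines(video_dir: str, video_length: int, num_keyframes: int = 15) -> List[str]:
    """Build index lines "<video folder> 1 <frame number> <video length>"."""
    lines = []
    for i in range(num_keyframes):
        # Centre of the i-th of `num_keyframes` equal segments, rounded down.
        frame = ((2 * i + 1) * video_length) // (2 * num_keyframes)
        lines.append(f"{video_dir} 1 {frame} {video_length}")
    return lines


if __name__ == "__main__":
    for line in keyframe_lines("train/video_snippet_1", 48):
        print(line)
```

For videos shorter than 15 frames this formula repeats some frame numbers, so such folders may need to be handled separately.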

2.VID_val_frames.txt

val/ILSVRC2015_val_00000000/000000 1
val/ILSVRC2015_val_00000000/000001 2
val/ILSVRC2015_val_00000000/000002 3
val/ILSVRC2015_val_00000000/000003 4
val/ILSVRC2015_val_00000000/000004 5
val/ILSVRC2015_val_00000000/000005 6

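However, using this file during training raises AttributeError: 'VIDMEGADataset' object has no attribute 'frame_seg_len', so the file below was used in the end.

3. VID_val_videos.txt (fairly simple, not described here)

For all three files, refer to the ones provided under datasets\ILSVRC2015\ImageSets in the original MEGA code.

2.4 Parameter changes

1. In mega_core/data/datasets/vid.py, modify classes and classes_map inside VIDDataset: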

classes = ['__background__',  # always index 0
           'car']
classes_map = ['__background__',  # always index 0
               'n02958343']

2.mega_core/config/paths_catalog.py

(1) Modify DatasetCatalog.DATASETS

"vid_custom_train": {
    "img_dir": "vid_custom/train",
    "anno_path": "vid_custom/annotation",
    "img_index": "vid_custom/VID_train_15frames.txt"
},
"vid_custom_val": {
    "img_dir": "vid_custom/val",
    "anno_path": "vid_custom/annotation",
    "img_index": "vid_custom/VID_val_frames.txt"
}

(2) Modify the if statement that checks the dataset name so that it also matches "vid"

if ("DET" in name) or ("VID" in name) or ("vid" in name):

3. Modify configs/BASE_RCNN_4gpu.yaml (which file depends on how many GPUs you train with)

num_classes: 2  # number of object classes + background
TRAIN: ("vid_custom_train",)  # remember the trailing ","
TEST: ("vid_custom_val",)  # remember the trailing ","

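3. Training and test commands

1. Training (the training script automatically runs testing once training finishes; comment out the test code if you do not want that)

python -m torch.distributed.launch \
    --nproc_per_node=4 \
    tools/train_net.py \
    --master_port=$((RANDOM + 10000)) \
    --config-file configs/MEGA/vid_R_101_C4_MEGA_1x.yaml \
    OUTPUT_DIR training_dir/MEGA_R_101_1x

2. Testing

python -m torch.distributed.launch \
    --nproc_per_node 4 \
    tools/test_net.py \
    --config-file configs/MEGA/vid_R_101_C4_MEGA_1x.yaml \
    MODEL.WEIGHT training_dir/MEGA_R_101_1x/model_final.pth

[Note] The training and test commands above differ slightly from the author's: the --motion-specific parameter has been removed from both.

Keeping that parameter raises IndexError: list index out of range. My rough guess is that with this parameter the code loads a test model (loosely speaking) that does not fit a custom dataset, so I removed it. (This is only my own guess, for reference; if anyone knows the actual cause, please let me know.)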

4. Error summary

(1) FileNotFoundError: [Errno 2] No such file or directory: 'configs/BASE_RCNN_3gpu.yaml'

Solution: the author only provides training config files for 1 or 4 GPUs; change the --nproc_per_node parameter in the training command to 1 or 4.

（2）FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ILSVRC2015/Annotations/DET/train/ILSVRC2014_train_0000/ILSVRC2014_train_00000663.xml'

Solution: check whether your dataset paths are wrong.

（3）ValueError: Type mismatch (<class 'tuple'> vs. <class 'str'>) with values (() vs. vid_custom_train) for config key: DATASETS.TRAIN

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ILSVRC2015/ImageSets/DET_train_30classes.txt'

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ILSVRC2015/ImageSets/VID_train_15frames.txt'

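Solution: the training parameters need to be changed in both yml files, each value ending with a trailing comma (two commas in total):

TRAIN: ("vid_custom_train",)
TEST: ("vid_custom_val",)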
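The "tuple vs. str" wording in the ValueError follows from ordinary Python literal rules: parentheses alone do not create a tuple, the comma does. The snippet below illustrates this with `ast.literal_eval`; it assumes the config values are evaluated as Python literals, which the error message suggests but which is not verified here against the config loader.

```python
import ast

# Without the trailing comma the parentheses are just grouping: the result is a str.
without_comma = ast.literal_eval('("vid_custom_train")')
# With the trailing comma the value is a one-element tuple.
with_comma = ast.literal_eval('("vid_custom_train",)')

print(type(without_comma).__name__)  # str
print(type(with_comma).__name__)     # tuple
```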

（4）RuntimeError: copy_if failed to synchronize: device-side assert triggered

Solution: the num_classes parameter is set incorrectly; it should be 1+1, because the background must be counted as well.

（5）xml.etree.ElementTree.ParseError: mismatched tag: line 197, column 14

Solution: check your xml files; the broken one is usually the first or the last file in a folder.
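Rather than opening files by hand, every annotation can be parsed once to locate the broken ones. This is a minimal sketch using only the standard library; `find_broken_xml` is a hypothetical helper name and the annotation path in the example call is taken from the layout in section 2.1.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Tuple


def find_broken_xml(anno_root: str) -> List[Tuple[str, str]]:
    """Return (path, parser message) for every xml file that fails to parse."""
    broken = []
    for path in sorted(Path(anno_root).rglob("*.xml")):
        try:
            ET.parse(str(path))
        except ET.ParseError as err:
            # The message contains the line and column of the problem.
            broken.append((str(path), str(err)))
    return broken


if __name__ == "__main__":
    for path, message in find_broken_xml("datasets/vid_custom/annotation"):
        print(path, message)
```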

（6）RuntimeError: Error(s) in loading state_dict for GeneralizedRCNN

Solution: training used 4 GPUs but testing used 1 GPU, so either modify the test code (as below) or modify BASE_RCNN_1gpu.yaml.

args = parser.parse_args()
cfg.merge_from_file("configs/BASE_RCNN_4gpu.yaml")  # switched to 4gpu, because the 1gpu config file was not modified
cfg.merge_from_file(args.config)
cfg.merge_from_list(["MODEL.WEIGHT", args.checkpoint])

（7）AttributeError: 'VIDMEGADataset' object has no attribute 'frame_seg_len'

Solution: as described above, change the validation index file.

(8) Error when using the --motion-specific parameter

  for i in range(len(motion_ious['motion_iou']))])
Evaluating motion iou range 0.0 - 1.0
Traceback (most recent call last):
  File "tools/train_net.py", line 226, in <module>
    main()
  File "tools/train_net.py", line 222, in main
    run_test(cfg, model, args.distributed, args.motion_specific)
  File "tools/train_net.py", line 137, in run_test
    output_folder=output_folder,
  File "/workspace/mega2/mega.pytorch/mega_core/engine/inference.py", line 132, in inference
    **extra_args)
  File "/workspace/mega2/mega.pytorch/mega_core/data/datasets/evaluation/__init__.py", line 29, in evaluate
    return vid_evaluation(**args)
  File "/workspace/mega2/mega.pytorch/mega_core/data/datasets/evaluation/vid/__init__.py", line 15, in vid_evaluation
    logger=logger,
  File "/workspace/mega2/mega.pytorch/mega_core/data/datasets/evaluation/vid/vid_eval.py", line 51, in do_vid_evaluation
    use_07_metric=False
  File "/workspace/mega2/mega.pytorch/mega_core/data/datasets/evaluation/vid/vid_eval.py", line 149, in eval_detection_vid
    motion_range=motion_range,
  File "/workspace/mega2/mega.pytorch/mega_core/data/datasets/evaluation/vid/vid_eval.py", line 180, in calc_detection_vid_prec_rec
    if motion_iou[gt_index] < motion_range[0] or motion_iou[gt_index] > motion_range[1]:
IndexError: list index out of range

（9）subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'tools/train_net.py', '--local_rank=3', '--master_port=36874', '--config-file', 'configs/MEGA/vid_R_101_C4_MEGA_1x.yaml', '--motion-specific', 'OUTPUT_DIR', 'training_dir/MEGA_R_101_1x']' returned non-zero exit status 1.

Unresolved, but as long as nothing else is wrong, this error does not appear.

