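Today I am starting to reproduce ReDet (CVPR 2021).
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Original GitHub repo: https://github.com/csuhan/ReDet
The reproduction runs on an RTX 3090 rented from AutoDL (¥2.6/hour) with the HRSC2016 dataset. 36 epochs take about two hours. From debugging to a working run I spent about ¥20 in total; with a working environment, training 36 epochs alone costs about ¥5.
Note: my first, failed attempt and the problems I ran into are in the running-log version further down. The version that comes first is the one that worked.
Many people have trouble finding the HRSC2016 dataset, so here is the link to my copy on Baidu Netdisk:
Link: https://pan.baidu.com/s/1saGxrQ6B0MWhc_DvR6Uf1A?pwd=zwg6
Extraction code: zwg6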
The share contains several archive parts; select all of them and extract them together.

1.Installation

Requirements
Linux
Python 3.5/3.6/3.7
PyTorch 1.1/1.3.1
CUDA 10.0/10.1
NCCL 2+
GCC 4.9+
mmcv<=0.2.14

The official requirements are listed above. The AutoDL server I picked has:
RTX 3090
PyTorch 1.8.1
Python 3.8
CUDA 11.1

Install ReDet

a. Create a conda virtual environment and activate it. Then install Cython.

First create a conda environment named redet with Python 3.7, then install Cython.

conda create -n redet python=3.7 -y
source activate redet
conda install cython

Then install mmcv to satisfy the requirement above. I use mmcv==0.2.13 because the mmdet 0.6.0 installed later does not support 0.2.14:

pip install mmcv==0.2.13

b. Install PyTorch and torchvision following the official instructions.

Our versions differ from the official ones, so take the install command from the PyTorch website:
https://pytorch.org/get-started/previous-versions/

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

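Tip: the official README lists several version-related caveats (quoted below). My previous attempt failed precisely because of a version mismatch: PyTorch 1.1.0 on an RTX 3090 cannot work. Read the notes yourself if you are interested; I won't go into them here.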

Note:
1. If you want to use PyTorch>1.5, you have to make some modifications to the cuda ops. See here for a reference.
2. There is a known bug that happens to some users but not all (as I have successfully run it on V100 and Titan Xp). If it occurs, please refer to here.
3. If you want to use Python<=3.6, you need to install e2cnn@legacy_py3.6 manually, see here for an instruction.

c. Clone the ReDet repository.

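If the previous step has not finished, that's fine: open another terminal connection and carry on with this step in parallel (ignore this if it doesn't make sense to you).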

git clone https://github.com/csuhan/ReDet.git
cd ReDet

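Create a folder named data here (the original code expects it, which makes debugging easier).

Then copy the dataset in. I had uploaded it to my AutoDL network drive beforehand and worked in several terminal windows at once (again, ignore this if it doesn't make sense).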

cp /root/autodl-nas/HRSC2016 /root/ReDet/data/ -r

d. Compile cuda extensions.

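Now comes the exciting part: running the compile script. Be warned that this step is full of pitfalls; if it works the first time, consider yourself lucky. In my environment every AT_CHECK in mmdet/ops has to be replaced with TORCH_CHECK.
The command below comes from a GitHub issue: it walks through every file under the current directory and replaces AT_CHECK with TORCH_CHECK in place. I ran it inside ReDet/mmdet/ops, because running it over the whole system would take far too long.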

find . -type f -exec sed -i 's/AT_CHECK/TORCH_CHECK/g' {} +
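
Since the sed one-liner rewrites every file under the current directory, a more cautious variant can restrict itself to C++/CUDA sources and report what it changed. The following is only a sketch in Python: the function name `replace_at_check` and the extension list are my own assumptions, not something from the ReDet repository or the issue.

```python
from pathlib import Path

# Assumed set of source extensions; adjust if the ops use others.
SOURCE_SUFFIXES = {".cpp", ".cu", ".h", ".cuh"}


def replace_at_check(root, old="AT_CHECK", new="TORCH_CHECK"):
    """Replace `old` with `new` in C++/CUDA sources under `root`.

    Returns the list of files that were actually modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `replace_at_check("mmdet/ops")` from the repository root would then touch only the op sources and return the modified paths, which makes it easy to review the change before compiling.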

Then compile:

bash compile.sh

Error:

(redet) root@container-2f3811a53c-c526a191:~/ReDet# bash compile.sh
Building roi align op...
Traceback (most recent call last):
  File "setup.py", line 2, in <module>
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension
ModuleNotFoundError: No module named 'torch'

So PyTorch was never installed properly.
Looking back at the earlier output:

(redet) root@container-2f3811a53c-c526a191:~/redet# pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.1+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp37-cp37m-linux_x86_64.whl (1982.2 MB)
     |███████████████████████████████ | 1922.0 MB 31 kB/s eta 0:31:26
Killed

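Oops: while setting up my proxy I accidentally dropped the remote connection. My mistake.
Waiting for the install again, started at 16:22, watching videos in the meantime (then gave up on that: I had been at it for an hour, time to stand up and stretch my back).
Damn, it got Killed again. A fix I found online is to append a flag to the command: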

pip install xxxx --no-cache-dir
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html --no-cache-dir

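Commenters online say this flag helps a lot, also when downloading other pip packages; I had not tried it before.
The PyTorch install now went through, though I'm not sure the flag was the reason.
This time it reported success: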

Successfully installed pillow-9.0.1 torch-1.8.1+cu111 torchaudio-0.8.1 torchvision-0.9.1+cu111 typing-extensions-4.1.1

Now compile.
I took note of the errors reported during compilation:

/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:303:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range

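It doesn't seem to matter much. One open question: I was in AutoDL's no-GPU mode, and I don't know whether that affects compilation.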
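
The IndexError above is what I would expect in no-GPU mode: as far as I understand torch.utils.cpp_extension, when the TORCH_CUDA_ARCH_LIST environment variable is unset it derives the target architectures from the visible GPUs, and with no GPU that list is empty. A possible workaround, sketched below, is to set the variable explicitly before compiling. The helper names `build_env` and `compile_ops` are hypothetical, and "8.6" is the compute capability of the RTX 3090.

```python
import os
import subprocess


def build_env(arch="8.6", base=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set."""
    env = dict(os.environ if base is None else base)
    env.setdefault("TORCH_CUDA_ARCH_LIST", arch)  # keep a value the user already set
    return env


def compile_ops(repo_dir="."):
    """Run the repository's compile.sh with the architecture list pinned."""
    return subprocess.run(["bash", "compile.sh"], cwd=repo_dir, env=build_env(), check=True)
```

I have not verified that this makes a no-GPU build succeed end to end; compiling with the GPU attached, as I did later, avoids the question altogether.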

gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4

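Another big red error. People online say it means there is not enough memory, so I'll run it again once a GPU is attached.
Moving on to the next step for now.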
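
Both "Killed" messages (pip earlier, cc1plus here) are consistent with the process being terminated for lack of memory, as the comments online suggest. A quick way to check how much memory is left before compiling is to read /proc/meminfo. The sketch below assumes a Linux system, and the function names are my own; inside a container the file may reflect the host rather than the container's limit, so treat the number as a rough hint.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])  # most fields are in kB
    return info


def available_gib(path="/proc/meminfo"):
    """Return MemAvailable in GiB (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"] / (1024 ** 2)
```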

e. Install ReDet (other dependencies will be installed automatically).

python setup.py develop
# or "pip install -e ."

If a package hangs along the way, install it manually with pip.

Install DOTA_devkit

sudo apt-get install swig
cd DOTA_devkit
swig -c++ -python polyiou.i
python setup.py build_ext --inplace

For the first line I had to use conda install swig instead.

Off to eat.

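March 27, 2022, 10:20 in the morning: rented an RTX 3090 and ran bash compile.sh again to see whether the memory-related red error was still there.
It is gone. I saw no errors this time, and compilation finished.
Next, prepare the txt files.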

2.Getting started

Prepare the dataset.
My code lives in

/root/ReDet

My dataset lives in

/root/ReDet/data/HRSC2016

The ImageSets shipped with HRSC2016 don't work: they don't match the images actually present in Train and Test. So I wrote generate_txt.py by hand to generate train.txt and test.txt:

import os

# Write one image ID (file name without extension) per line for each split.
for split, list_name in (('Train', 'train.txt'), ('Test', 'test.txt')):
    images_path = '/root/ReDet/data/HRSC2016/%s/images' % split   # image directory
    txt_save_path = '/root/ReDet/data/HRSC2016/%s' % list_name    # output image list file
    with open(txt_save_path, 'w') as fw:
        for filename in os.listdir(images_path):
            image_id = filename.split('.')[0]
            print(image_id)
            fw.write(image_id + '\n')
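
One caveat with os.listdir is that it also returns hidden entries such as the .ipynb_checkpoints folder that JupyterLab creates (this bit me, as recorded in the running log further down). A slightly more defensive sketch filters by extension and sorts the IDs; the helper names and the .bmp default are my own assumptions:

```python
import os


def list_image_ids(images_path, ext=".bmp"):
    """Return sorted file stems of regular files ending in `ext`, skipping hidden entries."""
    ids = []
    for name in os.listdir(images_path):
        full = os.path.join(images_path, name)
        stem, suffix = os.path.splitext(name)
        if name.startswith(".") or not os.path.isfile(full):
            continue
        if suffix.lower() == ext:
            ids.append(stem)
    return sorted(ids)


def write_image_list(images_path, txt_save_path, ext=".bmp"):
    """Write one image ID per line and return how many were written."""
    ids = list_image_ids(images_path, ext)
    with open(txt_save_path, "w") as fw:
        fw.writelines(image_id + "\n" for image_id in ids)
    return len(ids)
```

The returned count can be compared directly against the expected number of images in each split.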

Then run

python DOTA_devkit/HRSC20162COCO.py

Then put the checkpoint files the authors provide into a newly created work_dirs folder:

 cp /root/autodl-nas/ReDet_re50_refpn_3x_hrsc2016/ /root/ReDet/work_dirs/ -r

Run test.py:

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

Output:


ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py:80:
UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at
/pytorch/aten/src/ATen/native/IndexingUtils.h:30.)full_mask[mask] = norms.to(torch.uint8)
The model and loaded state dict do not match exactly
missing keys in source state_dict: neck.fpn_convs.0.conv.expanded_bias, backbone.layer3.5.conv3.filter, neck.fpn_convs.0.conv.filter,
(about 20 lines omitted here)
backbone.layer4.0.conv3.filter, backbone.conv1.filter

Finally, the progress bar shows up:

[                                                  ] 0/444, elapsed: 0s, ETA:/root/ReDet/mmdet/core/bbox/transforms.py:56: UserWarning: This overload of addcmul is deprecated:addcmul(Tensor input, Number value, Tensor tensor1, Tensor tensor2, *, Tensor out)
Consider using one of the following signatures instead:addcmul(Tensor input, Tensor tensor1, Tensor tensor2, *, Number value, Tensor out) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.)gx = torch.addcmul(px, 1, pw, dx)  # gx = px + pw * dx
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 444/444, 3.2 task/s, elapsed: 138s, ETA:     0s
writing results to work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
(redet) root@container-2f3811a53c-c526a191:~/ReDet#

I guess that means it worked. Tears of joy.

Now try the evaluation.
First change a few lines in hrsc2016_evaluation.py:

    detpath = r'work_dirs/Task1_{:s}.txt'
    annopath = r'data/HRSC2016/Test/labelTxt/{:s}.txt'  # change the directory to the path of val/labelTxt, if you want to do evaluation on the valset
    imagesetfile = r'data/HRSC2016/test.txt'

Then run

python DOTA_devkit/hrsc2016_evaluation.py

I can't make sense of most of the output. The only thing I recognize is the final AP50 of 90.46, which matches the result in the paper.

(redet) root@container-2f3811a53c-c526a191:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
DOTA_devkit/hrsc2016_evaluation.py:153: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecationsdifficult = np.array([x['difficult'] for x in R]).astype(np.bool)
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 1. 0. ... 1. 1. 1.]
check tp [1. 0. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 1. 0. ... 1. 1. 1.]
check tp [0. 0. 1. ... 0. 0. 0.]
npos num: 1188
AP50: 90.46     AP75: 89.46      mAP: 70.41
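
For readers as puzzled by this output as I was: each block of check fp / check tp / npos num appears to correspond to one IoU threshold, with the detections sorted by confidence, tp/fp flagging each detection as a true or false positive, and npos being the number of ground-truth objects. The sketch below shows how a VOC2007-style 11-point AP is commonly computed from such flags (the tracebacks in the running log further down show the devkit being called with use_07_metric=True). It is my own pure-Python illustration, not the devkit's code.

```python
def voc07_ap(tp_flags, npos):
    """11-point interpolated AP from per-detection TP flags (sorted by descending score)."""
    tp_cum, fp_cum = 0, 0
    recalls, precisions = [], []
    for is_tp in tp_flags:
        if is_tp:
            tp_cum += 1
        else:
            fp_cum += 1
        recalls.append(tp_cum / npos)
        precisions.append(tp_cum / (tp_cum + fp_cum))

    ap = 0.0
    for i in range(11):  # recall thresholds 0.0, 0.1, ..., 1.0
        t = i / 10
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11
    return ap
```

Read this way, AP50 and AP75 would be the values at IoU 0.5 and 0.75, and mAP the mean over all ten thresholds; that interpretation is my reading of the output, not something the script states.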

Try the inference demo on large images.

python demo_large_image.py

Error:

ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Traceback (most recent call last):
  File "demo_large_image.py", line 137, in <module>
    r"work_dirs/ReDet_re50_refpn_1x_dota15_ms/ReDet_re50_refpn_1x_dota15_ms-9d1a523c.pth")
  File "demo_large_image.py", line 89, in __init__
    self.dataset = get_dataset(self.data_test)
  File "/root/ReDet/mmdet/datasets/utils.py", line 109, in get_dataset
    dset = obj_from_dict(data_info, datasets)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/mmcv/runner/utils.py", line 78, in obj_from_dict
    return obj_type(**args)
  File "/root/ReDet/mmdet/datasets/custom.py", line 68, in __init__
    self.img_infos = self.load_annotations(ann_file)
  File "/root/ReDet/mmdet/datasets/coco.py", line 25, in load_annotations
    self.coco = COCO(ann_file)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/pycocotools-2.0.4-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 81, in __init__
    with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/workfs/jmhan/dota15_1024_ms/test1024/DOTA1_5_test1024.json'

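Not fixing this now; time to eat.
March 27, 2022, 15:55
Time to start training.

But first, I got large-image inference (prediction) working.

Change the paths in the demo script slightly: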

    model = DetectorModel(r"configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py",
                          r"work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth")
    img_dir = "byHand/largeImage"
    out_dir = 'byHand'

I put a single image in there, 100000011.bmp.
Then run it.
The output:

(redet) root@container-2f3811a53c-c526a191:~/ReDet# python demo_large_image.py
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py:80: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at  /pytorch/aten/src/ATen/native/IndexingUtils.h:30.)full_mask[mask] = norms.to(torch.uint8)
The model and loaded state dict do not match exactly
missing keys in source state_dict: backbone.layer3.4.conv2.filter, backbone.layer3.5.conv1.filter, backbone.layer3.5.conv3.filter, neck.lateral_convs.3.conv.filter
(a great many lines omitted here)
100000011.bmp
  0%|                                                                                                      | 0/2 [00:00<?, ?it/s]
/root/ReDet/mmdet/core/bbox/transforms.py:56: UserWarning: This overload of addcmul is deprecated:
addcmul(Tensor input, Number value, Tensor tensor1, Tensor tensor2, *, Tensor out)
Consider using one of the following signatures instead:addcmul(Tensor input, Tensor tensor1, Tensor tensor2, *, Number value, Tensor out) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.)gx = torch.addcmul(px, 1, pw, dx)  # gx = px + pw * dx
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  8.33it/s]
(redet) root@container-2f3811a53c-c526a191:~/ReDet#

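Then look at the image generated in the output folder:

And here is the original image for comparison:

I'm thrilled. Inference works, which most likely means training will work too. At such an exciting moment, I'll first test a few more images, which will come in handy for the lab report.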

Running jobs in the background

Try nohup:

nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl >xxxcbtest.log 2>&1 &

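OK. The nohup: ignoring input line I saw earlier is only an informational message and the job was just slow; it works now, and I'll start training shortly.
First clear out work_dirs.
Starting the training got me stuck right away; I didn't know how to do any of it.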


positional arguments:
  config                train config file path

optional arguments:
  -h, --help            show this help message and exit
  --work_dir WORK_DIR   the dir to save logs and models
  --resume_from RESUME_FROM
                        the checkpoint file to resume from
  --validate            whether to evaluate the checkpoint during training
  --gpus GPUS           number of gpus to use (only applicable to non-distributed training)
  --seed SEED           random seed
  --launcher {none,pytorch,slurm,mpi}
                        job launcher
  --local_rank LOCAL_RANK

I had left out the required (positional) argument, the config path. I simply didn't know what a required argument was.

 python tools/train.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py
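
For anyone else unsure about the terminology: in Python's argparse a positional argument has no leading dashes and must be supplied, while arguments starting with -- are optional. The sketch below mirrors part of the help text above; it is a simplified illustration, not the actual tools/train.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Train a detector (illustrative subset)")
    parser.add_argument("config", help="train config file path")  # positional: required
    parser.add_argument("--work_dir", help="the dir to save logs and models")
    parser.add_argument("--gpus", type=int, default=1, help="number of gpus to use")
    parser.add_argument("--validate", action="store_true",
                        help="whether to evaluate the checkpoint during training")
    return parser
```

Calling `build_parser().parse_args([])` exits with a usage error because `config` is missing, which is the situation I ran into when the script printed its help text.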

The pretrained model also has to be uploaded to work_dirs.

Edit the path in ReDet_re50_refpn_3x_hrsc2016.py so that it matches the pretrained .pth file just uploaded:

pretrained='work_dirs/ReResNet_pretrain/re_resnet50_c8_batch256-25b16846.pth',

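Then start training.
The README says the learning rate is
0.01 for 4 GPUs
and
0.04 for 16 GPUs.
I only had 1 GPU and at first did not change lr, thinking that changing it might slow training down, so I left it. Later
I changed the learning rate to 0.005 and started training on two RTX 3090s.
Line 142 of the config: img_per_gpu is the per-GPU batch size.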

Current settings: img_per_gpu == 4
lr == 0.005
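
The two reference points quoted above (0.01 for 4 GPUs, 0.04 for 16 GPUs) follow a linear scaling rule: the learning rate grows in proportion to the total batch size. A small sketch of that arithmetic, assuming the reference setting uses 2 images per GPU (my assumption; check the config and README):

```python
def scaled_lr(num_gpus, imgs_per_gpu, ref_lr=0.01, ref_gpus=4, ref_imgs_per_gpu=2):
    """Scale the learning rate linearly with the total batch size.

    ref_imgs_per_gpu=2 is an assumption about the reference setup.
    """
    ref_batch = ref_gpus * ref_imgs_per_gpu
    return ref_lr * (num_gpus * imgs_per_gpu) / ref_batch
```

Under that assumption, `scaled_lr(16, 2)` reproduces the quoted 0.04, and a total batch of 4 images (for example 1 GPU with 4 images, or 2 GPUs with 2 images each) gives 0.005, the value I ended up using.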

nohup python tools/train.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py --gpus 2 >xxxcbtest.log 2>&1 &
[2] 3974

Try distributed training:

bash tools/dist_train.sh configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py 2

No luck: it hangs at ReResNet Orientation: 8 Fix Params: False and never moves on.
Back to a single GPU.

nohup python tools/train.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py >xxxcbtest.log 2>&1 &
GPU memory usage stays at around 6 GB, so the settings still need tuning.
Sun Mar 27 18:32:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.00    Driver Version: 470.82.00    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:3D:00.0 Off |                  N/A |
| 30%   43C    P2   216W / 350W |   6402MiB / 24268MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Then I was stuck again. My guess is that the next steps are testing, converting the results, and then evaluating, so let's try that.

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/latest.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

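I meant to run the command above in no-GPU mode, but it was far too slow, so back to the 3090.
Done: the pkl file was generated.
Then run parse_results.py, with its default config set to: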

parser.add_argument('--config', default='configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py')

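The parse_results.py script mentioned above converts the pkl output to txt format, and the txt files are then used for evaluation. Adjust the file paths yourself.

The results: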

(redet) root@container-2f3811a53c-c526a191:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
DOTA_devkit/hrsc2016_evaluation.py:153: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecationsdifficult = np.array([x['difficult'] for x in R]).astype(np.bool)
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 0. 0. ... 1. 1. 1.]
check tp [0. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 1. 1. ... 1. 1. 1.]
check tp [0. 0. 0. ... 0. 0. 0.]
npos num: 1188
AP50: 90.37     AP75: 88.93      mAP: 69.46

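Still a pile of output I can't read, but the final AP50 changed: it is slightly lower (90.37 versus 90.46 for the released checkpoint). I consider the reproduction complete. Let me go back over what happened.

What follows is the running log.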

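1.Rent a machine from AutoDL and set up the requirements

First rent a 2080 Ti.
Environment selection:

Configure the environment in no-GPU mode first.
The cost:

Then check the libraries the GitHub README requires.

Check NCCL:

Check GCC.
The command is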

gcc -v


For mmcv I couldn't find a way to check the version, so I just installed it:

pip install mmcv==0.2.14

2.Install the libraries (Install ReDet)

Done.

Given my environment, the install command is:

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch

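It is best to do this step (c) under the root directory; otherwise, if your AutoDL GPU gets taken by someone else, the data cannot be migrated and you have to set up the environment all over again.

This step often hangs; install whichever package hangs manually with pip install.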

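Dataset processing:

First run HRSC2DOTA.py, which I found in another of the author's repos. Then, going by whichever files turn out to be missing, look for them in the author's other GitHub projects, copy them over, run the script, and finally rename the output files.

Now rent a 3090 and give it a run.

March 23, 2022

The first command I ran to test on HRSC2016: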

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

It failed with:

Traceback (most recent call last):
  File "tools/test.py", line 9, in <module>
    from mmcv.runner import load_checkpoint, get_dist_info
  File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/__init__.py", line 1, in <module>
    from .runner import Runner
  File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/runner.py", line 9, in <module>
    from .checkpoint import load_checkpoint, save_checkpoint
  File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 10, in <module>
    import torchvision
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/__init__.py", line 2, in <module>
    from torchvision import datasets
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/datasets/__init__.py", line 9, in <module>
    from .fakedata import FakeData
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/datasets/fakedata.py", line 3, in <module>
    from .. import transforms
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/transforms/__init__.py", line 1, in <module>
    from .transforms import *
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 17, in <module>
    from . import functional as F
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/root/miniconda3/lib/python3.7/site-packages/PIL/__init__.py)

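Check whether the repo's issues mention this problem.
(Download the checkpoints first and try.)

Still not working; back to reading the issues.
I had forgotten to activate the conda environment (although that was not the cause).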

source activate redet

Nothing in the issues, so I searched Baidu.
Posts online say the cause is a Pillow version that is too new, so I downgraded it.

conda install pillow==6.2.0
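
The cause, as far as I can tell: Pillow 7.0.0 removed the PILLOW_VERSION constant, which old torchvision releases (such as the 0.3.0 used in this attempt) still import. A tiny sketch for checking whether a Pillow version string falls on the wrong side of that boundary; the helper name is hypothetical:

```python
def pillow_breaks_old_torchvision(version):
    """True if this Pillow version no longer exports PILLOW_VERSION (removed in 7.0.0)."""
    major = int(version.split(".")[0])
    return major >= 7
```

In a live environment the input could come from `PIL.__version__`.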

Then another error, probably a wrong path:

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Traceback (most recent call last):
  File "tools/test.py", line 208, in <module>
    main()
  File "tools/test.py", line 158, in main
    dataset = get_dataset(cfg.data.test)
  File "/root/ReDet/mmdet/datasets/utils.py", line 109, in get_dataset
    dset = obj_from_dict(data_info, datasets)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/utils.py", line 78, in obj_from_dict
    return obj_type(**args)
  File "/root/ReDet/mmdet/datasets/custom.py", line 68, in __init__
    self.img_infos = self.load_annotations(ann_file)
  File "/root/ReDet/mmdet/datasets/coco.py", line 25, in load_annotations
    self.coco = COCO(ann_file)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/pycocotools-2.0.4-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 81, in __init__
    with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/HRSC2016/Test/HRSC_L1_test.json'

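Then I went looking for this file. (I regret not following the expected layout. Better to place the code and dataset exactly as the author requires, which at least avoids some problems.)

Put the HRSC2016 dataset under /root/ReDet/data/HRSC2016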

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
The model and loaded state dict do not match exactly
missing keys in source state_dict: neck.lateral_convs.1.conv.filter, backbone.layer4.2.conv2.filter, backbone.layer3.3.conv3.filter, backbone.layer3.0.conv1.filter, backbone.layer2.1.conv1.filter, backbone.layer3.5.conv3.filter, backbone.layer4.1.conv2.filter, backbone.layer4.0.conv1.filter, backbone.layer3.4.conv1.filter,
neck.lateral_convs.2.conv.expanded_bias, neck.lateral_convs.3.conv.filter, backbone.layer2.0.downsample.0.filter, backbone.conv1.filter, backbone.layer4.0.downsample.0.filter, backbone.layer2.2.conv2.filter, backbone.layer3.1.conv2.filter, backbone.layer2.3.conv1.filter, backbone.layer2.0.conv1.filter, neck.lateral_convs.1.conv.expanded_bias,
backbone.layer4.0.conv3.filter, backbone.layer4.2.conv3.filter, backbone.layer3.1.conv3.filter, backbone.layer3.5.conv2.filter, backbone.layer3.2.conv3.filter, neck.fpn_convs.2.conv.filter, backbone.layer2.0.conv3.filter, neck.fpn_convs.3.conv.filter, backbone.layer3.4.conv2.filter,
backbone.layer3.0.conv2.filter, backbone.layer4.1.conv1.filter, neck.fpn_convs.0.conv.filter, backbone.layer4.2.conv1.filter, backbone.layer3.0.conv3.filter, backbone.layer4.0.conv2.filter, backbone.layer3.5.conv1.filter, backbone.layer2.1.conv3.filter, backbone.layer2.1.conv2.filter, neck.fpn_convs.2.conv.expanded_bias, neck.fpn_convs.3.conv.expanded_bias, backbone.layer3.1.conv1.filter, backbone.layer4.1.conv3.filter, neck.lateral_convs.2.conv.filter, neck.fpn_convs.1.conv.expanded_bias, neck.fpn_convs.1.conv.filter,
backbone.layer2.2.conv1.filter, neck.lateral_convs.0.conv.expanded_bias, backbone.layer3.2.conv1.filter,
backbone.layer3.4.conv3.filter, neck.lateral_convs.0.conv.filter, neck.fpn_convs.0.conv.expanded_bias, backbone.layer2.3.conv3.filter, backbone.layer2.0.conv2.filter,
neck.lateral_convs.3.conv.expanded_bias, backbone.layer3.3.conv1.filter, backbone.layer3.2.conv2.filter, backbone.layer2.3.conv2.filter, backbone.layer3.0.downsample.0.filter, backbone.layer2.2.conv3.filter, backbone.layer3.3.conv2.filter

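The issues say this is normal. Starting again at 15:33; maybe I just didn't wait long enough last time, I can't tell.

One RTX 3090 has been running for 15 minutes with no result yet.

Still no response. Kill it and retry the test.
It kept failing, and editing hrsc2016_evalxxxx.py didn't work either: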

root@container-e19b1182ac-a18adac2:~/ReDet# source activate redet
(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
Traceback (most recent call last):
  File "DOTA_devkit/hrsc2016_evaluation.py", line 293, in <module>
    main()
  File "DOTA_devkit/hrsc2016_evaluation.py", line 282, in main
    rec, prec, ap = voc_eval(detpath, annopath, imagesetfile, 'ship', ovthresh=iou_thr, use_07_metric=True)
  File "DOTA_devkit/hrsc2016_evaluation.py", line 125, in voc_eval
    with open(imagesetfile, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/HRSC2016/Test/test.txt'
(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
Traceback (most recent call last):
  File "DOTA_devkit/hrsc2016_evaluation.py", line 297, in <module>
    main()
  File "DOTA_devkit/hrsc2016_evaluation.py", line 286, in main
    rec, prec, ap = voc_eval(detpath, annopath, imagesetfile, 'ship', ovthresh=iou_thr, use_07_metric=True)
  File "DOTA_devkit/hrsc2016_evaluation.py", line 134, in voc_eval
    recs[imagename] = parse_gt(annopath.format(imagename))
  File "DOTA_devkit/hrsc2016_evaluation.py", line 28, in parse_gt
    with  open(filename, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/HRSC2016/Test/labelTxt/100000624.txt'

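I asked a senior labmate: possibly the generation step never finished, since the txt file has only 500 lines.
The real problem is the dataset generation and format conversion: the repo does not provide an hrsc2dota.py, and the one I found elsewhere has issues. I'll deal with it later.

Started reading HRSC2DOTA.py.
It processes objects with difficult == 0 and ignores those with difficult == 1, which I have some doubts about.

Finished reading HRSC2DOTA.py; no problems found. Results of running it:
To count the files (and folders) in the current directory: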

ls | wc -w

Train
AllImages: 626 files
Annotations: 626
labelTxt: 626
All good.
Test
AllImages: 444
Annotations: 444
labelTxt: 444
I also downloaded the dataset on Windows and checked: Train has 626 files and Test has 444, so the counts are right.
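
Matching counts do not guarantee that every image has a label file with the same ID, which is what the evaluation script actually needs (see the missing 100000624.txt above). A sketch that compares file stems between two folders; the function names and default extensions are my assumptions:

```python
import os


def stems(folder, ext):
    """Set of file stems in `folder` with extension `ext`, ignoring hidden entries."""
    return {
        os.path.splitext(name)[0]
        for name in os.listdir(folder)
        if not name.startswith(".") and name.lower().endswith(ext)
    }


def find_unpaired(images_dir, labels_dir, image_ext=".bmp", label_ext=".txt"):
    """Return (images without labels, labels without images) as sorted lists."""
    images = stems(images_dir, image_ext)
    labels = stems(labels_dir, label_ext)
    return sorted(images - labels), sorted(labels - images)
```

Two empty lists mean the folders line up one to one; anything else names exactly the files to chase.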
Running HRSC2DOTA.py again gives:

(redet) root@container-e19b1182ac-a18adac2:~/ReDet/DOTA_devkit# python HRSC2DOTA.py
Traceback (most recent call last):
  File "HRSC2DOTA.py", line 79, in <module>
    generate_txt_labels('/root/ReDet/data/HRSC2016/Train')
  File "HRSC2DOTA.py", line 59, in generate_txt_labels
    f_label = open(label)  # open the original .xml file
FileNotFoundError: [Errno 2] No such file or directory: '/root/ReDet/data/HRSC2016/Train/Annotations/.ipynb_checkpoints.xml'

It turns out that opening an image once in JupyterLab leaves behind a .ipy… folder containing a 100000624-checkpoint.bmp file, presumably a cached copy of the image.
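
Such leftovers can be removed before running the conversion scripts. A minimal sketch, assuming the folders are named .ipynb_checkpoints as in the traceback above; the function name is my own:

```python
import os
import shutil


def remove_ipynb_checkpoints(root):
    """Delete every .ipynb_checkpoints directory under `root`; return the removed paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(dirpath, ".ipynb_checkpoints")
            shutil.rmtree(target)
            dirnames.remove(".ipynb_checkpoints")  # do not descend into the deleted folder
            removed.append(target)
    return removed
```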
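The total shown by the ll command is the disk space used by the listed files. It is counted in blocks (1 KiB blocks by default with GNU ls), not in bytes.

I don't fully understand this yet.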

Started reading HRSC2COCO.py,
then ran it to completion.
hrsc2016_evaluation.py still fails: 624.bmp is missing.
Re-download the dataset and try again.

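The dataset information given in the ReDet paper:

1061 images in total; in theory train, val and test have 436, 181 and 444 images respectively. So what on earth did I download earlier?
It looks like the author merged train and val and used them together as train; let me take another look. Something still seems off. Then I remembered that when I reproduced DAL I wrote my own script to generate its txt lists, so I went to find it.
Copied DAL's generate_images.py over.
After running it I found it was useless. That was flailing around and heading in the wrong direction. Better to just write a proper script that lists the current directory.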

find  -name '*.bmp' > train.txt

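First dump the file names in the directory into a txt file, then process it with Python. I'll finish after dinner.
March 24, 2022, 19:12
Tonight's task: finish generate_txt.py and get hrsc2016_evaluation.py running.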


# March 24, 2022, 19:15, written by hand
import os

images_path = '/root/ReDet/data/HRSC2016/Train/images'   # image directory
txt_save_path = '/root/ReDet/data/HRSC2016/Train/images/train.txt'  # output image list file
# Note: the list file is created inside images_path, so os.listdir also returns train.txt itself.
with open(txt_save_path, "w") as fw:
    for filename in os.listdir(images_path):
        print(filename.split(".")[0])
        fw.write(filename.split(".")[0] + '\n')

Then run it.
Run it again with Train changed to Test.

train.txt: 626 entries
test.txt: 444 entries

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
DOTA_devkit/hrsc2016_evaluation.py:153: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecationsdifficult = np.array([x['difficult'] for x in R]).astype(np.bool)
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 1. 0. ... 1. 1. 1.]
check tp [1. 0. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 1. 0. ... 1. 1. 1.]
check tp [0. 0. 1. ... 0. 0. 0.]
npos num: 1188
AP50: 90.46     AP75: 89.46      mAP: 70.41
(redet) root@container-e19b1182ac-a18adac2:~/

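It runs now. Taking a break.

March 25, 2022, 10:50
I still haven't read the code of hrsc2016_evaluation.py. First run test.py and see how many minutes it takes to respond; last time there was no response after 15 minutes.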

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl >redet3251056.log 2>&1 &

nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl >redett.log 2>&1 &

nohup python tesss.py >redett.log 2>&1 &

Not using nohup for now: it keeps printing nohup: ignoring input, and I don't know why.

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
The model and loaded state dict do not match exactlymissing keys in source state_dict: neck.fpn_convs.0.conv.expanded_bias, backbone.layer3.2.conv3.filter, backbone.layer2.2.conv2.filter, backbone.layer3.5.conv2.filter, backbone.layer4.2.conv1.filter, backbone.layer4.1.conv2.filter, backbone.layer4.2.conv2.filter, backbone.layer4.0.conv3.filter, backbone.layer2.0.downsample.0.filter, backbone.layer4.0.conv1.filter, backbone.layer2.0.conv3.filter, backbone.conv1.filter, backbone.layer3.1.conv3.filter, backbone.layer2.1.conv3.filter, neck.lateral_convs.0.conv.expanded_bias, backbone.layer3.2.conv2.filter, neck.fpn_convs.1.conv.filter, backbone.layer4.2.conv3.filter, neck.lateral_convs.1.conv.filter, backbone.layer2.1.conv2.filter, backbone.layer2.0.conv1.filter, backbone.layer3.4.conv3.filter, backbone.layer3.0.downsample.0.filter, backbone.layer3.0.conv1.filter, backbone.layer3.0.conv2.filter, backbone.layer3.4.conv1.filter, backbone.layer4.1.conv3.filter, backbone.layer2.1.conv1.filter, backbone.layer3.1.conv2.filter, backbone.layer3.3.conv1.filter, backbone.layer3.3.conv3.filter, backbone.layer2.2.conv3.filter, backbone.layer3.3.conv2.filter, backbone.layer3.2.conv1.filter, neck.fpn_convs.3.conv.expanded_bias, backbone.layer4.0.downsample.0.filter, backbone.layer4.1.conv1.filter, neck.fpn_convs.2.conv.expanded_bias, backbone.layer2.3.conv1.filter, neck.lateral_convs.2.conv.filter, backbone.layer2.2.conv1.filter, neck.fpn_convs.0.conv.filter, backbone.layer3.5.conv3.filter, backbone.layer3.5.conv1.filter, neck.fpn_convs.3.conv.filter, backbone.layer3.1.conv1.filter, backbone.layer4.0.conv2.filter, neck.lateral_convs.1.conv.expanded_bias, backbone.layer2.0.conv2.filter, neck.lateral_convs.2.conv.expanded_bias, backbone.layer2.3.conv2.filter, backbone.layer3.4.conv2.filter, backbone.layer2.3.conv3.filter, neck.fpn_convs.2.conv.filter, neck.lateral_convs.3.conv.expanded_bias, neck.lateral_convs.3.conv.filter, neck.fpn_convs.1.conv.expanded_bias, 
neck.lateral_convs.0.conv.filter, backbone.layer3.0.conv3.filter
Traceback (most recent call last):
  File "tools/test.py", line 208, in <module>
    main()
  File "tools/test.py", line 178, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.log_dir)
  File "tools/test.py", line 22, in single_gpu_test
    model.eval()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1009, in eval
    return self.train(False)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/ReDet/mmdet/models/backbones/re_resnet.py", line 726, in train
    super(ReResNet, self).train(mode)
  File "/root/ReDet/mmdet/models/backbones/base_backbone.py", line 56, in train
    super(BaseBackbone, self).train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 387, in train
    _filter, _bias = self.expand_parameters()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 304, in expand_parameters
    _filter = self.basisexpansion(self.weights)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 327, in forward
    _filter = self._expand_block(weights, io_pair).reshape(out_indices[2], in_indices[2], self.S)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 294, in _expand_block
    _filter = block_expansion(coefficients)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py", line 115, in forward
    return torch.einsum('boi...,kb->koi...', self.sampled_basis, weights) #.transpose(1, 2).contiguous()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/functional.py", line 211, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCBlas.cu:450
[1]+  Terminated              nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl > redett.log 2>&1

My first guess was that the GPU had run out of capacity. Oh wait, no: more likely the RTX 3090 and PyTorch 1.1.0 simply don't match.

root@container-93a511873c-4353c534:~# python
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> torch.cuda.is_available()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'torch' is not defined
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 3090'
>>> torch.__version__
'1.1.0'
>>>

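This shows that the rented GPU environment can detect and use the 3090; the catch is that AT_CHECK, which the code uses, was deprecated in torch 1.5 (pytorch/pytorch#36581), so every source file that gets compiled has to be changed to use TORCH_CHECK instead.
The author's reply: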
I guess the reason is: cuda11.0 requires higher version pytorch (>1.3), while some ops in our code are designed for pytorch<1.5.

If so, to fix this, you need to replace all AT_CHECK with TORCH_CHECK in the source code (.cpp and .cu). See pytorch/pytorch#36581

I found this one-liner in the issue; I don't really know what it means.

find . -type f -exec sed -i 's/AT_CHECK/TORCH_CHECK/g' {} +

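After running it, it seemed to help. I ran it in the mmdet/ops/ folder first, then, still not reassured, ran it again in the ReDet root. (P.S. Running it at the system level took far too long, so I lost patience and hit Ctrl+C.)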
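For anyone else puzzled by that one-liner: `find . -type f` lists every regular file under the current directory, and `sed -i 's/AT_CHECK/TORCH_CHECK/g'` rewrites each of them in place, which is why it is slow when started from a large directory tree. Below is a minimal Python sketch of a narrower version of the same replacement. The function name `replace_macro` and the set of file extensions are assumptions made for illustration, not part of the ReDet repository.

```python
from pathlib import Path

# Assumed set of C++/CUDA source extensions; adjust if the ops use others.
SOURCE_SUFFIXES = {".cpp", ".cu", ".h", ".cuh"}


def replace_macro(root, old="AT_CHECK", new="TORCH_CHECK"):
    """Replace `old` with `new` in source files under `root`.

    Returns the list of files that were actually modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything that is not a C++/CUDA source file.
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Run from the ReDet root; only touches files under mmdet/ops.
    for modified in replace_macro("mmdet/ops"):
        print("patched", modified)
```

Limiting the walk to source files under mmdet/ops avoids rewriting unrelated files, and the returned list makes it easy to see whether anything was missed. The edited sources only take effect once the extensions are compiled again.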

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
The model and loaded state dict do not match exactly

missing keys in source state_dict: neck.fpn_convs.3.conv.filter, backbone.layer3.1.conv1.filter, backbone.layer3.4.conv2.filter, backbone.layer2.0.downsample.0.filter, backbone.layer4.0.conv2.filter, backbone.layer3.3.conv3.filter, backbone.layer2.1.conv2.filter, backbone.layer2.0.conv3.filter, backbone.layer4.1.conv2.filter, backbone.layer3.0.conv2.filter, neck.fpn_convs.0.conv.expanded_bias, backbone.layer2.3.conv3.filter, backbone.layer2.0.conv1.filter, neck.fpn_convs.1.conv.filter, backbone.layer2.3.conv2.filter, neck.lateral_convs.3.conv.filter, backbone.layer4.0.downsample.0.filter, backbone.layer3.4.conv1.filter, backbone.layer4.0.conv3.filter, backbone.layer3.0.conv1.filter, neck.lateral_convs.0.conv.filter, backbone.layer2.0.conv2.filter, neck.lateral_convs.2.conv.expanded_bias, backbone.layer3.3.conv1.filter, backbone.layer4.1.conv1.filter, neck.lateral_convs.3.conv.expanded_bias, backbone.layer3.5.conv3.filter, backbone.layer3.2.conv1.filter, backbone.layer4.0.conv1.filter, backbone.layer2.1.conv3.filter, backbone.layer3.1.conv2.filter, backbone.layer2.3.conv1.filter, backbone.layer3.5.conv1.filter, backbone.layer4.1.conv3.filter, neck.lateral_convs.0.conv.expanded_bias, backbone.layer4.2.conv2.filter, backbone.layer4.2.conv3.filter, neck.lateral_convs.2.conv.filter, neck.lateral_convs.1.conv.expanded_bias, neck.fpn_convs.0.conv.filter, backbone.layer3.0.conv3.filter, neck.fpn_convs.2.conv.expanded_bias, backbone.layer2.1.conv1.filter, backbone.layer2.2.conv2.filter, backbone.layer3.0.downsample.0.filter, backbone.layer2.2.conv3.filter, neck.fpn_convs.1.conv.expanded_bias, backbone.layer3.2.conv2.filter, backbone.layer2.2.conv1.filter, neck.fpn_convs.3.conv.expanded_bias, backbone.layer3.1.conv3.filter, backbone.layer3.5.conv2.filter, backbone.layer3.4.conv3.filter, neck.fpn_convs.2.conv.filter, backbone.layer4.2.conv1.filter, backbone.layer3.2.conv3.filter, neck.lateral_convs.1.conv.filter, backbone.layer3.3.conv2.filter, backbone.conv1.filter

Traceback (most recent call last):
  File "tools/test.py", line 208, in <module>
    main()
  File "tools/test.py", line 178, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.log_dir)
  File "tools/test.py", line 22, in single_gpu_test
    model.eval()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1009, in eval
    return self.train(False)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/ReDet/mmdet/models/backbones/re_resnet.py", line 726, in train
    super(ReResNet, self).train(mode)
  File "/root/ReDet/mmdet/models/backbones/base_backbone.py", line 56, in train
    super(BaseBackbone, self).train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 387, in train
    _filter, _bias = self.expand_parameters()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 304, in expand_parameters
    _filter = self.basisexpansion(self.weights)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 327, in forward
    _filter = self._expand_block(weights, io_pair).reshape(out_indices[2], in_indices[2], self.S)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 294, in _expand_block
    _filter = block_expansion(coefficients)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py", line 115, in forward
    return torch.einsum('boi...,kb->koi...', self.sampled_basis, weights) #.transpose(1, 2).contiguous()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/functional.py", line 211, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCBlas.cu:450
(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version>>
  File "<stdin>", line 1
    torch.__version>>
                    ^
SyntaxError: invalid syntax
>>> torch.__version__
'1.1.0'

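Still not working. Taking a break to write some of this up.

March 26, 2022, 14:21
I looked at the environment on the 3090 rented from autodl and nearly cried.

The version is absurdly high; PyTorch 1.1.0 definitely won't support it.
Let me switch to a 2080Ti and check its driver version.

It looks like I never recompiled after changing everything to TORCH_CHECK yesterday. Let me try again.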


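AT_CHECK really does need to be changed.
The TORCH_CHECK changes I made yesterday under mmdet/ops/src were incomplete.
And it seems the 3090 needs PyTorch 1.7 or above??? This whole environment was a waste.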
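A rough way to catch this kind of mismatch before installing anything else is to compare the GPU's compute capability with the architectures the PyTorch build was compiled for. The RTX 3090 reports compute capability 8.6 (sm_86). The sketch below is only an illustration under simplifying assumptions: `parse_arch` and `arch_supported` are hypothetical helper names, the check ignores PTX forward compatibility, and it assumes two-digit architecture names such as `sm_75`. `torch.cuda.get_arch_list()` may be missing in old builds, so the script looks it up defensively.

```python
def parse_arch(name):
    """Turn an arch string such as 'sm_75' into a (major, minor) tuple."""
    digits = name.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def arch_supported(capability, arch_list):
    """Simplified check: True if some compiled sm_XY has the same major
    version as the device and a minor version no higher than the device's."""
    major, minor = capability
    for name in arch_list:
        if not name.startswith("sm_"):
            continue  # ignore PTX-only entries such as 'compute_80'
        a_major, a_minor = parse_arch(name)
        if a_major == major and a_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # May be absent in old PyTorch builds, hence the getattr.
        get_arch_list = getattr(torch.cuda, "get_arch_list", None)
        if get_arch_list is None:
            print("this PyTorch build cannot report its compiled architectures")
        else:
            cap = torch.cuda.get_device_capability(0)
            archs = get_arch_list()
            print(cap, archs, arch_supported(cap, archs))
```

A `True` from `torch.cuda.is_available()` only says that a device and driver were found; it does not say that the installed build ships kernels for that device, which is what this check tries to approximate.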


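Fine, time to reinstall. I looked at the A40: some are available, but there is a lot I don't know about it, such as which PyTorch versions it works with. There is comparatively more information out there for the 3090, and it also has tidal compute (潮汐算力).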
