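Today I start reproducing ReDet (CVPR 2021).
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Official GitHub: https://github.com/csuhan/ReDet
The environment is an RTX 3090 rented on AutoDL (2.6 CNY/hour) and the dataset is HRSC2016. 36 epochs take about two hours. From debugging to a working run I spent about 20 CNY in total; with a working environment, training 36 epochs alone costs about 5 CNY.
Note: my first attempt failed. That attempt and the problems I hit are in the running-log section at the end. The version at the front is the one that worked.
Many people cannot find the HRSC2016 dataset, so here is the link to my copy on Baidu Netdisk:
Link: https://pan.baidu.com/s/1saGxrQ6B0MWhc_DvR6Uf1A?pwd=zwg6
Extraction code: zwg6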
The share contains several archive parts; select all of them and extract them together.

1.Installation

Requirements
Linux
Python 3.5/3.6/3.7
PyTorch 1.1/1.3.1
CUDA 10.0/10.1
NCCL 2+
GCC 4.9+
mmcv<=0.2.14

Those are the official requirements. The AutoDL server I picked has:
RTX 3090
PyTorch 1.8.1
Python 3.8
CUDA 11.1

Install ReDet

a. Create a conda virtual environment and activate it. Then install Cython.

First create a conda environment named redet with Python 3.7, then install Cython.

conda create -n redet python=3.7 -y
source activate redet
conda install cython

Then install mmcv from the requirements above. I used mmcv==0.2.13, because mmdet 0.6.0 (installed later) does not support 0.2.14.

pip install mmcv==0.2.13

b. Install PyTorch and torchvision following the official instructions.

Because our versions differ from the official ones, take the install command from the PyTorch previous-versions page:
https://pytorch.org/get-started/previous-versions/

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

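Tip: the official README lists several version-related problems. My previous attempt failed precisely because of a version mismatch: PyTorch 1.1.0 on an RTX 3090 cannot work. Read the notes below if you are interested; I will not go into them here.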

Note:
1.If you want to use PyTorch>1.5, you have to make some modifications to the cuda ops. See here for a reference.
2.There is a known bug that happens to some users but not all (I have successfully run it on V100 and Titan Xp). If it occurs, please refer to here.
3.If you want to use Python<=3.6, you need to install e2cnn@legacy_py3.6 manually, see here for an instruction.

c. Clone the ReDet repository.

If the previous step has not finished yet, you can open a second terminal session and carry on with this step in the meantime.

git clone https://github.com/csuhan/ReDet.git
cd ReDet

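Create a folder named data here (the source code expects it, which makes debugging easier).

Then copy the dataset in. I had uploaded it to my AutoDL network disk beforehand and ran the copy in a separate terminal window.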

cp /root/autodl-nas/HRSC2016 /root/ReDet/data/ -r

d. Compile cuda extensions.

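Now comes the exciting part, the build script. Be warned: this step is full of pitfalls, and you are lucky if it works the first time. In my environment every AT_CHECK in mmdet/ops had to be replaced with TORCH_CHECK.
The command below comes from a GitHub issue. It walks every file under the current directory and replaces AT_CHECK with TORCH_CHECK in place. I ran it inside ReDet/mmdet/ops, because running it from the filesystem root would take far too long.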

find . -type f -exec sed -i 's/AT_CHECK/TORCH_CHECK/g' {} +
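
For readers who want to see exactly which files get touched, the same replacement can be sketched in Python. This is a minimal sketch, not the command from the issue: the helper name replace_in_tree and the restriction to C++/CUDA source extensions are my own assumptions.

```python
import os

# Hypothetical helper (not part of ReDet): replace AT_CHECK with TORCH_CHECK
# in C++/CUDA sources under `root` and report which files were changed.
SOURCE_EXTS = ('.cpp', '.cu', '.h', '.cuh')


def replace_in_tree(root, old='AT_CHECK', new='TORCH_CHECK'):
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(SOURCE_EXTS):
                continue  # leave non-source files alone
            path = os.path.join(dirpath, name)
            with open(path, 'r', encoding='utf-8', errors='surrogateescape') as f:
                text = f.read()
            if old not in text:
                continue  # nothing to replace, do not rewrite the file
            with open(path, 'w', encoding='utf-8', errors='surrogateescape') as f:
                f.write(text.replace(old, new))
            changed.append(path)
    return sorted(changed)
```

Called as replace_in_tree('mmdet/ops') from the ReDet root, it returns the list of modified files; a second call should return an empty list, since nothing is left to replace.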

Then compile:

bash compile.sh

Error:

(redet) root@container-2f3811a53c-c526a191:~/ReDet# bash compile.sh
Building roi align op...
Traceback (most recent call last):
  File "setup.py", line 2, in <module>
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension
ModuleNotFoundError: No module named 'torch'

So PyTorch had not actually been installed.
Looking back at the install output:

(redet) root@container-2f3811a53c-c526a191:~/redet# pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.1+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp37-cp37m-linux_x86_64.whl (1982.2 MB)
     |███████████████████████████████ | 1922.0 MB 31 kB/s eta 0:31:26
Killed

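Oops: while switching my proxy I accidentally dropped the remote connection. My mistake.
Waiting for the install again, started at 16:22, watching videos in the meantime (I stopped after a while; I had been debugging for an hour and needed to stand up and stretch).
It was Killed yet again. A fix I found online is to append a flag to the command: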

pip install xxxx --no-cache-dir
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html --no-cache-dir

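Commenters say --no-cache-dir helps a lot, also for other large pip downloads; I had not tried it before.
The PyTorch install went through this time, though I am not sure this flag was the reason.
Now it reports success: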

Successfully installed pillow-9.0.1 torch-1.8.1+cu111 torchaudio-0.8.1 torchvision-0.9.1+cu111 typing-extensions-4.1.1

Time to compile.
I paid attention to the errors the build printed:

/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:303:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range

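This did not seem to matter much. One open question: I was running in AutoDL's no-GPU mode, and I do not know whether that affects compilation.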
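
A likely explanation, offered as an assumption rather than something I verified on this machine: in no-GPU mode no device is visible, so the list of architectures that _get_cuda_arch_flags auto-detects is empty and arch_list[-1] fails. PyTorch's extension builder also reads the TORCH_CUDA_ARCH_LIST environment variable, so the target can be stated explicitly instead. The sketch below builds such an environment. The value 8.6 is the compute capability of the RTX 3090, and the helper names build_env and compile_ops are hypothetical.

```python
import os
import subprocess


def build_env(arch_list, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set."""
    env = dict(os.environ if base_env is None else base_env)
    env['TORCH_CUDA_ARCH_LIST'] = arch_list
    return env


def compile_ops(repo_dir, arch_list='8.6'):
    """Run the repo's compile.sh with an explicit CUDA architecture list."""
    return subprocess.run(['bash', 'compile.sh'], cwd=repo_dir,
                          env=build_env(arch_list), check=True)
```

Usage would be compile_ops('/root/ReDet'). The shell equivalent is to set the variable on the same line as the build command.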

gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 4

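Another big red error. People online say it means the machine ran out of memory, so I will rerun the build once a GPU is attached.
Ignoring it for now and moving on.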

e. Install ReDet (other dependencies will be installed automatically).

python setup.py develop
# or "pip install -e ."

If a dependency gets stuck partway, install it manually with pip.

Install DOTA_devkit

sudo apt-get install swig
cd DOTA_devkit
swig -c++ -python polyiou.i
python setup.py build_ext --inplace

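For the first line I had to use conda install swig instead.

Off to eat.

March 27, 2022, 10:20 in the morning: rented an RTX 3090 and ran bash compile.sh again to see whether the memory-related red error was still there.
It was not. I saw no errors and the build completed.
Next, prepare the txt list files.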

2.Getting started

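Prepare the dataset.
My code lives in

/root/ReDet

and my dataset in

/root/ReDet/data/HRSC2016

The ImageSets shipped with HRSC2016 are unusable because they do not match the images actually present in Train and Test, so I wrote generate_txt.py by hand to produce train.txt and test.txt: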

import os

# Write one image ID (file name without extension) per line for each split.
for split, list_name in (('Train', 'train.txt'), ('Test', 'test.txt')):
    images_path = '/root/ReDet/data/HRSC2016/%s/images' % split   # image directory
    txt_save_path = '/root/ReDet/data/HRSC2016/%s' % list_name    # output list file
    with open(txt_save_path, 'w') as fw:
        for filename in sorted(os.listdir(images_path)):
            if filename.startswith('.'):
                continue  # skip hidden entries such as .ipynb_checkpoints
            image_id = os.path.splitext(filename)[0]
            print(image_id)
            fw.write(image_id + '\n')

Then run

python DOTA_devkit/HRSC20162COCO.py

Then copy the checkpoint folder provided by the authors into a newly created work_dirs directory:

cp /root/autodl-nas/ReDet_re50_refpn_3x_hrsc2016/ /root/ReDet/work_dirs/ -r

Run test.py:

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

Output:


ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py:80: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at /pytorch/aten/src/ATen/native/IndexingUtils.h:30.)
  full_mask[mask] = norms.to(torch.uint8)
The model and loaded state dict do not match exactly

missing keys in source state_dict: neck.fpn_convs.0.conv.expanded_bias, backbone.layer3.5.conv3.filter, neck.fpn_convs.0.conv.filter,
(about 20 lines omitted here)
backbone.layer4.0.conv3.filter, backbone.conv1.filter

At last, the progress bar appeared:

[                                                  ] 0/444, elapsed: 0s, ETA:
/root/ReDet/mmdet/core/bbox/transforms.py:56: UserWarning: This overload of addcmul is deprecated:
        addcmul(Tensor input, Number value, Tensor tensor1, Tensor tensor2, *, Tensor out)
Consider using one of the following signatures instead:
        addcmul(Tensor input, Tensor tensor1, Tensor tensor2, *, Number value, Tensor out) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.)
  gx = torch.addcmul(px, 1, pw, dx)  # gx = px + pw * dx
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 444/444, 3.2 task/s, elapsed: 138s, ETA:     0s
writing results to work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
(redet) root@container-2f3811a53c-c526a191:~/ReDet#

I guess it worked. Tears of joy.

Now try the evaluation.
First change a few lines in hrsc2016_evaluation.py:

    detpath = r'work_dirs/Task1_{:s}.txt'  #
    annopath = r'data/HRSC2016/Test/labelTxt/{:s}.txt'  # change the directory to the path of val/labelTxt, if you want to do evaluation on the valset
    imagesetfile = r'data/HRSC2016/test.txt'

Then run

python DOTA_devkit/hrsc2016_evaluation.py

Most of the output means nothing to me. I only recognize the final AP50 of 90.46, which matches the result reported in the paper.

(redet) root@container-2f3811a53c-c526a191:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
DOTA_devkit/hrsc2016_evaluation.py:153: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 1. 0. ... 1. 1. 1.]
check tp [1. 0. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 1. 0. ... 1. 1. 1.]
check tp [0. 0. 1. ... 0. 0. 0.]
npos num: 1188
AP50: 90.46     AP75: 89.46      mAP: 70.41

Next, the inference demo on large images.

python demo_large_image.py

Error:

ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Traceback (most recent call last):
  File "demo_large_image.py", line 137, in <module>
    r"work_dirs/ReDet_re50_refpn_1x_dota15_ms/ReDet_re50_refpn_1x_dota15_ms-9d1a523c.pth")
  File "demo_large_image.py", line 89, in __init__
    self.dataset = get_dataset(self.data_test)
  File "/root/ReDet/mmdet/datasets/utils.py", line 109, in get_dataset
    dset = obj_from_dict(data_info, datasets)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/mmcv/runner/utils.py", line 78, in obj_from_dict
    return obj_type(**args)
  File "/root/ReDet/mmdet/datasets/custom.py", line 68, in __init__
    self.img_infos = self.load_annotations(ann_file)
  File "/root/ReDet/mmdet/datasets/coco.py", line 25, in load_annotations
    self.coco = COCO(ann_file)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/pycocotools-2.0.4-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 81, in __init__
    with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/workfs/jmhan/dota15_1024_ms/test1024/DOTA1_5_test1024.json'

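Leaving it for now; time to eat.
March 27, 2022, 15:55
Time to start training.

I tested inference (prediction) on a large image.

I adjusted the paths in the demo script slightly: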

    model = DetectorModel(r"configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py",
                          r"work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth")
    img_dir = "byHand/largeImage"
    out_dir = 'byHand'

I put a single image in that folder, 100000011.bmp,
then ran the script.
The output:

(redet) root@container-2f3811a53c-c526a191:~/ReDet# python demo_large_image.py
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py:80: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at  /pytorch/aten/src/ATen/native/IndexingUtils.h:30.)
  full_mask[mask] = norms.to(torch.uint8)
The model and loaded state dict do not match exactly

missing keys in source state_dict: backbone.layer3.4.conv2.filter, backbone.layer3.5.conv1.filter, backbone.layer3.5.conv3.filter, neck.lateral_convs.3.conv.filter
(a great many lines omitted here)
100000011.bmp
  0%|                                                                                                      | 0/2 [00:00<?, ?it/s]
/root/ReDet/mmdet/core/bbox/transforms.py:56: UserWarning: This overload of addcmul is deprecated:
        addcmul(Tensor input, Number value, Tensor tensor1, Tensor tensor2, *, Tensor out)
Consider using one of the following signatures instead:
        addcmul(Tensor input, Tensor tensor1, Tensor tensor2, *, Number value, Tensor out) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.)
  gx = torch.addcmul(px, 1, pw, dx)  # gx = px + pw * dx
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  8.33it/s]
(redet) root@container-2f3811a53c-c526a191:~/ReDet#

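Then look at the image generated in the output folder:

Here is the original image for comparison.

I am thrilled. Inference works, which means training will most likely work too. To mark the moment I will test a few more images first; they will also be handy for my lab report.

Running in the background

Trying nohup: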

nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl >xxxcbtest.log 2>&1 &

OK. Earlier, the nohup: ignoring input message appeared and the job was simply slow; it works now. Training will start shortly.
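
For reference, nohup: ignoring input is an informational message, not an error: nohup prints it when standard input is a terminal, to say the job will not read from it. The same kind of background launch can be sketched in Python. This is only a sketch under my own assumptions: run_in_background is a hypothetical helper, and start_new_session detaches the child from the terminal's session, which is close to, but not identical with, what nohup does.

```python
import subprocess


def run_in_background(cmd, log_path):
    """Start `cmd` detached from the terminal, logging stdout and stderr to log_path."""
    log = open(log_path, 'ab')
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # the job reads no input
        stdout=log,
        stderr=subprocess.STDOUT,   # same effect as 2>&1
        start_new_session=True,     # new session, so closing the terminal does not signal it (POSIX)
    )
    log.close()  # the child holds its own copy of the file descriptor
    return proc
```

A training run could then be started with run_in_background(['python', 'tools/train.py', 'configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py'], 'train.log') and followed with tail -f train.log.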
First clear out work_dirs.
Starting the training tripped me up right away; I was lost at every turn.


positional arguments:
  config                train config file path

optional arguments:
  -h, --help            show this help message and exit
  --work_dir WORK_DIR   the dir to save logs and models
  --resume_from RESUME_FROM
                        the checkpoint file to resume from
  --validate            whether to evaluate the checkpoint during training
  --gpus GPUS           number of gpus to use (only applicable to non-distributed training)
  --seed SEED           random seed
  --launcher {none,pytorch,slurm,mpi}
                        job launcher
  --local_rank LOCAL_RANK

I had left out the required positional argument (the config file); I simply did not know what a required argument was.

 python tools/train.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py

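The pretrained backbone weights also need to be uploaded to work_dirs.

Then edit the path in ReDet_re50_refpn_3x_hrsc2016.py so that it points to the pretrained .pth file just uploaded: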

pretrained='work_dirs/ReResNet_pretrain/re_resnet50_c8_batch256-25b16846.pth',

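Then start training.
The README says to use a learning rate of
0.01 for 4 GPUs
and
0.04 for 16 GPUs.
I had only 1 GPU (1 for now) and at first left lr unchanged, on the guess that changing it would slow training down. Later
I changed the learning rate to 0.005 and started training on two RTX 3090s.
Line 142 of the config: img_per_gpu is the per-GPU batch size.

Current settings: img_per_gpu == 4
lr == 0.005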
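
The two quoted values are consistent with scaling the learning rate linearly with the number of GPUs, which also gives the 0.005 used here for two cards. A minimal sketch of that rule follows. It assumes the number of images per GPU matches the reference setup, which the notes above do not confirm, and scaled_lr is a hypothetical helper, not something from the ReDet code.

```python
def scaled_lr(num_gpus, base_lr=0.01, base_gpus=4):
    """Scale the learning rate linearly with the number of GPUs.

    base_lr and base_gpus come from the quoted guidance: 0.01 for 4 GPUs.
    """
    if num_gpus < 1:
        raise ValueError('num_gpus must be at least 1')
    return base_lr * num_gpus / base_gpus


for n in (1, 2, 4, 16):
    print(n, scaled_lr(n))
```

If the per-GPU batch size also differs from the reference, the same reasoning would suggest scaling by the total batch size instead of by the GPU count alone.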

nohup python tools/train.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py --gpus 2 >xxxcbtest.log 2>&1 &
[2] 3974

Trying distributed training:

bash tools/dist_train.sh configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py 2

No luck: it hangs at ReResNet Orientation: 8 Fix Params: False and never moves on.
Back to a single GPU.

nohup python tools/train.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py >xxxcbtest.log 2>&1 &

GPU memory usage stays at around 6 GB, so the settings still need tuning.

Sun Mar 27 18:32:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.00    Driver Version: 470.82.00    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:3D:00.0 Off |                  N/A |
| 30%   43C    P2   216W / 350W |   6402MiB / 24268MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

After that I was stuck again. My guess was that the next steps are to test, convert the results, and then evaluate, so I tried that.

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/latest.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

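I wanted to run the command above in no-GPU mode, but it was far too slow, so back to the 3090.
It finished and produced the pkl file.
Then run parse_results.py, with its config default set to: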

parser.add_argument('--config', default='configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py')

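parse_results.py (the file the line above comes from) converts the pkl output into txt files, and the evaluation then runs on those txt files. Adjust the file paths yourself.

The results: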

(redet) root@container-2f3811a53c-c526a191:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
DOTA_devkit/hrsc2016_evaluation.py:153: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 0. 0. ... 1. 1. 1.]
check tp [0. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 1. 1. ... 1. 1. 1.]
check tp [0. 0. 0. ... 0. 0. 0.]
npos num: 1188
AP50: 90.37     AP75: 88.93      mAP: 69.46

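Again a wall of output I cannot read, but the final AP50 changed: it is lower (90.37 versus 90.46 for the released checkpoint). I consider the reproduction done. Time to review what I did.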
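
To compare the two runs without reading the whole log, the summary line can be parsed and the metrics subtracted. A small sketch, using only the two summary lines printed above; parse_summary is a hypothetical helper and not part of DOTA_devkit.

```python
import re

# Matches the metric names printed on the last line of hrsc2016_evaluation.py output.
SUMMARY_RE = re.compile(r'(AP50|AP75|mAP):\s*([0-9.]+)')


def parse_summary(line):
    """Turn 'AP50: 90.46  AP75: 89.46  mAP: 70.41' into a dict of floats."""
    return {name: float(value) for name, value in SUMMARY_RE.findall(line)}


released = parse_summary('AP50: 90.46     AP75: 89.46      mAP: 70.41')
retrained = parse_summary('AP50: 90.37     AP75: 88.93      mAP: 69.46')
for name in ('AP50', 'AP75', 'mAP'):
    delta = retrained[name] - released[name]
    print('%s: %.2f -> %.2f (%+.2f)' % (name, released[name], retrained[name], delta))
```

All three metrics of my own run come out slightly below those of the released checkpoint.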

What follows is the running log of my first attempt.

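1.Rented a machine from AutoDL and set up the requirements

I first rented a 2080 Ti.
Environment selection:

I configured the environment in no-GPU mode first.
The cost:

Then I checked the libraries the GitHub README requires.

Checking NCCL:

Checking GCC.
The command is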

gcc -v

I could not find a way to check the mmcv version, so I installed it directly:

pip install mmcv==0.2.14

2.Install ReDet

Done.

Given my environment, the install command is:

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch

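Step (c) is best done under the root directory. Otherwise, if someone else takes your AutoDL GPU, the data cannot be migrated and you have to set up the environment from scratch.

This step often stalls; install any package that gets stuck manually with pip install.

Dataset preparation:

First run HRSC2DOTA.py, which I found in another of the author's repos. Then track down whatever files it is missing in the author's other GitHub projects, copy them over, run it, and finally rename the output files.

Now renting a 3090 to run it.

March 23, 2022

The first run of the HRSC2016 test command: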

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

It failed with:

Traceback (most recent call last):
  File "tools/test.py", line 9, in <module>
    from mmcv.runner import load_checkpoint, get_dist_info
  File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/__init__.py", line 1, in <module>
    from .runner import Runner
  File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/runner.py", line 9, in <module>
    from .checkpoint import load_checkpoint, save_checkpoint
  File "/root/miniconda3/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 10, in <module>
    import torchvision
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/__init__.py", line 2, in <module>
    from torchvision import datasets
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/datasets/__init__.py", line 9, in <module>
    from .fakedata import FakeData
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/datasets/fakedata.py", line 3, in <module>
    from .. import transforms
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/transforms/__init__.py", line 1, in <module>
    from .transforms import *
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 17, in <module>
    from . import functional as F
  File "/root/miniconda3/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 5, in <module>
    from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/root/miniconda3/lib/python3.7/site-packages/PIL/__init__.py)

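I checked whether the repo's issues mention this problem.
(Meanwhile, I downloaded the checkpoints to try.)

Still failing; back to the issues.
I had forgotten to activate the conda environment (although that was not the cause):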

source activate redet

Nothing in the issues, so I searched Baidu.
Posts online say the Pillow version is too new, so I downgraded it:

conda install pillow==6.2.0
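
The background, as far as I know: Pillow removed the PILLOW_VERSION constant in release 7.0.0, while the old torchvision installed here still imports it, which is why a downgrade to 6.x helps. A quick check can be sketched as follows; has_pillow_version_constant is a hypothetical helper, and it only looks at the major version number.

```python
def has_pillow_version_constant(version):
    """True if this Pillow release should still export PIL.PILLOW_VERSION (removed in 7.0.0)."""
    major = int(version.split('.')[0])
    return major < 7


try:
    import PIL
    installed = getattr(PIL, '__version__', None)
except ImportError:
    installed = None  # Pillow is not installed in this interpreter

if installed is not None:
    print(installed, has_pillow_version_constant(installed))
```

For the versions seen in these notes, 6.2.0 passes the check and 9.0.1 does not.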

Then another error, probably a wrong path:

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Traceback (most recent call last):
  File "tools/test.py", line 208, in <module>
    main()
  File "tools/test.py", line 158, in main
    dataset = get_dataset(cfg.data.test)
  File "/root/ReDet/mmdet/datasets/utils.py", line 109, in get_dataset
    dset = obj_from_dict(data_info, datasets)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/utils.py", line 78, in obj_from_dict
    return obj_type(**args)
  File "/root/ReDet/mmdet/datasets/custom.py", line 68, in __init__
    self.img_infos = self.load_annotations(ann_file)
  File "/root/ReDet/mmdet/datasets/coco.py", line 25, in load_annotations
    self.coco = COCO(ann_file)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/pycocotools-2.0.4-py3.7-linux-x86_64.egg/pycocotools/coco.py", line 81, in __init__
    with open(annotation_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/HRSC2016/Test/HRSC_L1_test.json'

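So I went looking for that file. (Lesson learned: lay out the code and dataset exactly as the authors expect; it causes fewer problems.)

Moved the HRSC2016 dataset to /root/ReDet/data/HRSC2016: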

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
The model and loaded state dict do not match exactly

missing keys in source state_dict: neck.lateral_convs.1.conv.filter, backbone.layer4.2.conv2.filter, backbone.layer3.3.conv3.filter, backbone.layer3.0.conv1.filter, backbone.layer2.1.conv1.filter, backbone.layer3.5.conv3.filter, backbone.layer4.1.conv2.filter, backbone.layer4.0.conv1.filter, backbone.layer3.4.conv1.filter,
neck.lateral_convs.2.conv.expanded_bias, neck.lateral_convs.3.conv.filter, backbone.layer2.0.downsample.0.filter, backbone.conv1.filter, backbone.layer4.0.downsample.0.filter, backbone.layer2.2.conv2.filter, backbone.layer3.1.conv2.filter, backbone.layer2.3.conv1.filter, backbone.layer2.0.conv1.filter, neck.lateral_convs.1.conv.expanded_bias,
backbone.layer4.0.conv3.filter, backbone.layer4.2.conv3.filter, backbone.layer3.1.conv3.filter, backbone.layer3.5.conv2.filter, backbone.layer3.2.conv3.filter, neck.fpn_convs.2.conv.filter, backbone.layer2.0.conv3.filter, neck.fpn_convs.3.conv.filter, backbone.layer3.4.conv2.filter,
backbone.layer3.0.conv2.filter, backbone.layer4.1.conv1.filter, neck.fpn_convs.0.conv.filter, backbone.layer4.2.conv1.filter, backbone.layer3.0.conv3.filter, backbone.layer4.0.conv2.filter, backbone.layer3.5.conv1.filter, backbone.layer2.1.conv3.filter, backbone.layer2.1.conv2.filter, neck.fpn_convs.2.conv.expanded_bias, neck.fpn_convs.3.conv.expanded_bias, backbone.layer3.1.conv1.filter, backbone.layer4.1.conv3.filter, neck.lateral_convs.2.conv.filter, neck.fpn_convs.1.conv.expanded_bias, neck.fpn_convs.1.conv.filter,
backbone.layer2.2.conv1.filter, neck.lateral_convs.0.conv.expanded_bias, backbone.layer3.2.conv1.filter,
backbone.layer3.4.conv3.filter, neck.lateral_convs.0.conv.filter, neck.fpn_convs.0.conv.expanded_bias, backbone.layer2.3.conv3.filter, backbone.layer2.0.conv2.filter,
neck.lateral_convs.3.conv.expanded_bias, backbone.layer3.3.conv1.filter, backbone.layer3.2.conv2.filter, backbone.layer2.3.conv2.filter, backbone.layer3.0.downsample.0.filter, backbone.layer2.2.conv3.filter, backbone.layer3.3.conv2.filter

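The issues say this is normal. Starting again at 15:33; the previous run took so long that I was not sure what was going on.

One RTX 3090 ran for 15 minutes without producing a result.

Still no response, so I killed it and tried the test again.
No success, and editing hrsc2016_evaluation.py did not help either: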

root@container-e19b1182ac-a18adac2:~/ReDet# source activate redet
(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
Traceback (most recent call last):
  File "DOTA_devkit/hrsc2016_evaluation.py", line 293, in <module>
    main()
  File "DOTA_devkit/hrsc2016_evaluation.py", line 282, in main
    rec, prec, ap = voc_eval(detpath, annopath, imagesetfile, 'ship', ovthresh=iou_thr, use_07_metric=True)
  File "DOTA_devkit/hrsc2016_evaluation.py", line 125, in voc_eval
    with open(imagesetfile, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/HRSC2016/Test/test.txt'
(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
Traceback (most recent call last):
  File "DOTA_devkit/hrsc2016_evaluation.py", line 297, in <module>
    main()
  File "DOTA_devkit/hrsc2016_evaluation.py", line 286, in main
    rec, prec, ap = voc_eval(detpath, annopath, imagesetfile, 'ship', ovthresh=iou_thr, use_07_metric=True)
  File "DOTA_devkit/hrsc2016_evaluation.py", line 134, in voc_eval
    recs[imagename] = parse_gt(annopath.format(imagename))
  File "DOTA_devkit/hrsc2016_evaluation.py", line 28, in parse_gt
    with  open(filename, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/HRSC2016/Test/labelTxt/100000624.txt'

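I asked a senior labmate. Perhaps the generation step did not finish, since the txt file has only 500 lines.
The real problem is the dataset generation and format conversion: the original repo does not provide hrsc2dota.py, and the copy I found elsewhere is faulty. I will deal with it later.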
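
Errors like the missing 100000624.txt above can be caught before running the evaluation by checking every ID in the list file against the label directory. A minimal sketch; missing_labels is a hypothetical helper, and the paths in the usage note are the ones used elsewhere in these notes.

```python
import os


def missing_labels(list_file, label_dir, ext='.txt'):
    """Return the IDs listed in list_file that have no label file in label_dir."""
    with open(list_file) as f:
        ids = [line.strip() for line in f if line.strip()]
    return [i for i in ids
            if not os.path.isfile(os.path.join(label_dir, i + ext))]
```

For example, missing_labels('data/HRSC2016/test.txt', 'data/HRSC2016/Test/labelTxt') should return an empty list once the conversion has produced every label file.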

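Started reading HRSC2DOTA.py.
It processes objects with difficult == 0 and ignores those with difficult == 1, which I have some doubts about.

Finished reading HRSC2DOTA.py; it looks fine. Results after running it follow.
To count the files (and folders) in the current directory: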

ls | wc -w

Train
AllImages 626 files
Annotations 626
labelTxt 626
All good.
Test
AllImages 444
Annotations 444
labelTxt 444
I also downloaded the dataset on Windows and checked: Train has 626 files and Test has 444, so the counts are right.
Running HRSC2DOTA.py again then gave:

(redet) root@container-e19b1182ac-a18adac2:~/ReDet/DOTA_devkit# python HRSC2DOTA.py
Traceback (most recent call last):
  File "HRSC2DOTA.py", line 79, in <module>
    generate_txt_labels('/root/ReDet/data/HRSC2016/Train')
  File "HRSC2DOTA.py", line 59, in generate_txt_labels
    f_label = open(label)  # open the original .xml file
FileNotFoundError: [Errno 2] No such file or directory: '/root/ReDet/data/HRSC2016/Train/Annotations/.ipynb_checkpoints.xml'

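It turns out that opening an image once in JupyterLab leaves behind a .ipynb_checkpoints folder containing a file such as 100000624-checkpoint.bmp, presumably a cached copy of the image.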
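
One way to keep such leftovers out of the file lists is to skip hidden entries and anything that is not an image when listing the directory. A minimal sketch, assuming the images are .bmp files as in HRSC2016; image_ids is a hypothetical helper.

```python
import os

IMAGE_EXTS = ('.bmp',)


def image_ids(images_dir, exts=IMAGE_EXTS):
    """List image IDs (file names without extension), skipping hidden and non-image entries."""
    ids = []
    for name in os.listdir(images_dir):
        path = os.path.join(images_dir, name)
        if name.startswith('.') or not os.path.isfile(path):
            continue  # e.g. the .ipynb_checkpoints folder
        stem, ext = os.path.splitext(name)
        if ext.lower() in exts:
            ids.append(stem)
    return sorted(ids)
```

len(image_ids(...)) then gives a count that is not inflated by checkpoint folders or by list files stored next to the images.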
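The total printed by ll is the disk space used, counted in blocks (1 KiB each by default with GNU ls) rather than in bytes.

I do not fully understand this yet.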

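Started reading HRSC2COCO.py.
Then ran it to completion.
hrsc2016_evaluation.py still fails, complaining that 624.bmp is missing.
I will download the dataset again and try.

The dataset statistics given in the ReDet paper are as follows:

1061 images in total; in theory train, val, and test contain 436, 181, and 444 images respectively. Damn, the copy I downloaded earlier was one merged bundle.
It looks like train and val were merged into train. Let me look again. Something still seems off. I remember that when I reproduced DAL, its txt lists were generated by a script I wrote myself; I will go find it.
Copied DAL's generate_images.py over.
After running it I found it was useless. That was fumbling in the wrong direction. Better to just write a small script that lists the current directory.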

find  -name '*.bmp' > train.txt

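First dump all file names in the directory into a txt file, then post-process it with Python. I will do that after dinner.
March 24, 2022, 19:12
Tonight's task: finish generate_txt.py and get hrsc2016_evaluation.py running.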


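# Written by hand, March 24, 2022, 19:15
import os

images_path = '/root/ReDet/data/HRSC2016/Train/images'   # image directory
# Note: this writes the list inside the image directory, so the list file
# itself shows up in os.listdir(); the final version above writes one level up.
txt_save_path = '/root/ReDet/data/HRSC2016/Train/images/train.txt'  # output list file
with open(txt_save_path, "w") as fw:
    for filename in os.listdir(images_path):
        print(filename.split(".")[0])
        fw.write(filename.split(".")[0] + '\n')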

Then run it.
Then change Train to Test and run it again.

train.txt: 626 entries
test.txt: 444 entries

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python DOTA_devkit/hrsc2016_evaluation.py
DOTA_devkit/hrsc2016_evaluation.py:153: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 0. 0. ... 1. 1. 1.]
check tp [1. 1. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [0. 1. 0. ... 1. 1. 1.]
check tp [1. 0. 1. ... 0. 0. 0.]
npos num: 1188
check fp: [1. 1. 0. ... 1. 1. 1.]
check tp [0. 0. 1. ... 0. 0. 0.]
npos num: 1188
AP50: 90.46     AP75: 89.46      mAP: 70.41
(redet) root@container-e19b1182ac-a18adac2:~/

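It runs now. Taking a break.

March 25, 2022, 10:50
I still have not read the code of hrsc2016_evaluation.py. First I will run test.py and see how many minutes it takes to respond; last time there was nothing after 15 minutes.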

python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl

nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl >redet3251056.log 2>&1 &

nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl >redett.log 2>&1 &

nohup python tesss.py >redett.log 2>&1 &

Not using nohup for now; it keeps printing nohup: ignoring input and I do not know why.

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
The model and loaded state dict do not match exactly

missing keys in source state_dict: neck.fpn_convs.0.conv.expanded_bias, backbone.layer3.2.conv3.filter, backbone.layer2.2.conv2.filter, backbone.layer3.5.conv2.filter, backbone.layer4.2.conv1.filter, backbone.layer4.1.conv2.filter, backbone.layer4.2.conv2.filter, backbone.layer4.0.conv3.filter, backbone.layer2.0.downsample.0.filter, backbone.layer4.0.conv1.filter, backbone.layer2.0.conv3.filter, backbone.conv1.filter, backbone.layer3.1.conv3.filter, backbone.layer2.1.conv3.filter, neck.lateral_convs.0.conv.expanded_bias, backbone.layer3.2.conv2.filter, neck.fpn_convs.1.conv.filter, backbone.layer4.2.conv3.filter, neck.lateral_convs.1.conv.filter, backbone.layer2.1.conv2.filter, backbone.layer2.0.conv1.filter, backbone.layer3.4.conv3.filter, backbone.layer3.0.downsample.0.filter, backbone.layer3.0.conv1.filter, backbone.layer3.0.conv2.filter, backbone.layer3.4.conv1.filter, backbone.layer4.1.conv3.filter, backbone.layer2.1.conv1.filter, backbone.layer3.1.conv2.filter, backbone.layer3.3.conv1.filter, backbone.layer3.3.conv3.filter, backbone.layer2.2.conv3.filter, backbone.layer3.3.conv2.filter, backbone.layer3.2.conv1.filter, neck.fpn_convs.3.conv.expanded_bias, backbone.layer4.0.downsample.0.filter, backbone.layer4.1.conv1.filter, neck.fpn_convs.2.conv.expanded_bias, backbone.layer2.3.conv1.filter, neck.lateral_convs.2.conv.filter, backbone.layer2.2.conv1.filter, neck.fpn_convs.0.conv.filter, backbone.layer3.5.conv3.filter, backbone.layer3.5.conv1.filter, neck.fpn_convs.3.conv.filter, backbone.layer3.1.conv1.filter, backbone.layer4.0.conv2.filter, neck.lateral_convs.1.conv.expanded_bias, backbone.layer2.0.conv2.filter, neck.lateral_convs.2.conv.expanded_bias, backbone.layer2.3.conv2.filter, backbone.layer3.4.conv2.filter, backbone.layer2.3.conv3.filter, neck.fpn_convs.2.conv.filter, neck.lateral_convs.3.conv.expanded_bias, neck.lateral_convs.3.conv.filter, neck.fpn_convs.1.conv.expanded_bias, neck.lateral_convs.0.conv.filter, backbone.layer3.0.conv3.filter

Traceback (most recent call last):
  File "tools/test.py", line 208, in <module>
    main()
  File "tools/test.py", line 178, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.log_dir)
  File "tools/test.py", line 22, in single_gpu_test
    model.eval()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1009, in eval
    return self.train(False)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/ReDet/mmdet/models/backbones/re_resnet.py", line 726, in train
    super(ReResNet, self).train(mode)
  File "/root/ReDet/mmdet/models/backbones/base_backbone.py", line 56, in train
    super(BaseBackbone, self).train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 387, in train
    _filter, _bias = self.expand_parameters()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 304, in expand_parameters
    _filter = self.basisexpansion(self.weights)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 327, in forward
    _filter = self._expand_block(weights, io_pair).reshape(out_indices[2], in_indices[2], self.S)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 294, in _expand_block
    _filter = block_expansion(coefficients)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py", line 115, in forward
    return torch.einsum('boi...,kb->koi...', self.sampled_basis, weights) #.transpose(1, 2).contiguous()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/functional.py", line 211, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCBlas.cu:450
[1]+  Terminated              nohup python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl > redett.log 2>&1

My first guess was that the GPU had run out of memory. Actually, no: more likely the RTX 3090 and PyTorch 1.1.0 simply do not match.

root@container-93a511873c-4353c534:~# python
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> torch.cuda.is_available()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'torch' is not defined
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 3090'
>>> torch.__version__
'1.1.0'
>>>
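
The session above only confirms that the driver exposes the card to PyTorch; it does not run any kernel. A minimal smoke test, sketched here with a hypothetical helper name `gpu_smoke_test`, would run one small matrix multiplication on the GPU. On a mismatched PyTorch/GPU pair it would likely surface the same kind of cuBLAS failure as the traceback, but that is an assumption, not something verified in this log.

```python
def gpu_smoke_test(size=64):
    """Run one small matmul on the GPU and describe the outcome as a string.

    Hypothetical helper for illustration; it is not part of ReDet or PyTorch.
    """
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    try:
        a = torch.rand(size, size, device="cuda")
        b = torch.rand(size, size, device="cuda")
        # .item() copies the result to the host, which forces the kernel to finish
        checksum = (a @ b).sum().item()
    except RuntimeError as err:
        return "GPU kernel failed: {}".format(err)
    return "GPU matmul OK (checksum {:.3f})".format(checksum)


print(gpu_smoke_test())
```

If this prints a "GPU kernel failed" message while `torch.cuda.is_available()` is True, the problem is in the PyTorch build or CUDA stack rather than in the ReDet code.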

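This shows that the rented GPU environment can see the 3090. The only issue, as far as I could tell, is that the AT_CHECK macro used in the code was deprecated in torch 1.5 (pytorch/pytorch#36581), so every source file that gets compiled has to be changed to use TORCH_CHECK.
The author's reply: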
I guess the reason is: cuda11.0 requires higher version pytorch (>1.3), while some ops in our code are designed for pytorch<1.5.

If so, to fix this, you need to replace all AT_CHECK with TORCH_CHECK in the source code (.cpp and .cu). See pytorch/pytorch#36581

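I found the following one-liner in the issue. I did not understand it at the time; it walks every file under the current directory and replaces each AT_CHECK with TORCH_CHECK in place.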

find . -type f -exec sed -i 's/AT_CHECK/TORCH_CHECK/g' {} +
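
The one-liner rewrites every file under the current directory, which is why it is slow when started from the wrong place. A narrower alternative can be sketched in Python. This is only a sketch under assumptions: the function name `replace_at_check` and the list of file suffixes are my own choices, not something from the ReDet repository or the issue.

```python
import re
from pathlib import Path

# Assumption: only C++/CUDA sources and headers need the change.
SOURCE_SUFFIXES = {".cpp", ".cu", ".h", ".cuh"}
# \b skips longer identifiers that merely contain AT_CHECK (e.g. MY_AT_CHECK).
AT_CHECK = re.compile(rb"\bAT_CHECK\b")


def replace_at_check(root, dry_run=False):
    """Replace AT_CHECK with TORCH_CHECK under root.

    Returns a dict mapping each affected file path to its number of hits.
    With dry_run=True nothing is written, so the dict is just a report.
    """
    changed = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        data = path.read_bytes()  # bytes keep encoding and line endings intact
        new_data, count = AT_CHECK.subn(b"TORCH_CHECK", data)
        if count:
            changed[str(path)] = count
            if not dry_run:
                path.write_bytes(new_data)
    return changed
```

Calling `replace_at_check("mmdet/ops", dry_run=True)` first lists what would change; an empty result after a real run means no AT_CHECK is left under that folder. The extensions still have to be recompiled afterwards.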

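After running it, it seemed to help. I ran it once inside the mmdet/ops/ folder and then, to be safe, once more in the ReDet root. (PS: I first started it from the system root, which took far too long, so I lost patience and hit Ctrl+C.)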

(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python tools/test.py configs/ReDet/ReDet_re50_refpn_3x_hrsc2016.py work_dirs/ReDet_re50_refpn_3x_hrsc2016/ReDet_re50_refpn_3x_hrsc2016-d1b4bd29.pth --out work_dirs/ReDet_re50_refpn_3x_hrsc2016/results.pkl
ReResNet Orientation: 8 Fix Params: False
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
The model and loaded state dict do not match exactlymissing keys in source state_dict: neck.fpn_convs.3.conv.filter, backbone.layer3.1.conv1.filter, backbone.layer3.4.conv2.filter, backbone.layer2.0.downsample.0.filter, backbone.layer4.0.conv2.filter, backbone.layer3.3.conv3.filter, backbone.layer2.1.conv2.filter, backbone.layer2.0.conv3.filter, backbone.layer4.1.conv2.filter, backbone.layer3.0.conv2.filter, neck.fpn_convs.0.conv.expanded_bias, backbone.layer2.3.conv3.filter, backbone.layer2.0.conv1.filter, neck.fpn_convs.1.conv.filter, backbone.layer2.3.conv2.filter, neck.lateral_convs.3.conv.filter, backbone.layer4.0.downsample.0.filter, backbone.layer3.4.conv1.filter, backbone.layer4.0.conv3.filter, backbone.layer3.0.conv1.filter, neck.lateral_convs.0.conv.filter, backbone.layer2.0.conv2.filter, neck.lateral_convs.2.conv.expanded_bias, backbone.layer3.3.conv1.filter, backbone.layer4.1.conv1.filter, neck.lateral_convs.3.conv.expanded_bias, backbone.layer3.5.conv3.filter, backbone.layer3.2.conv1.filter, backbone.layer4.0.conv1.filter, backbone.layer2.1.conv3.filter, backbone.layer3.1.conv2.filter, backbone.layer2.3.conv1.filter, backbone.layer3.5.conv1.filter, backbone.layer4.1.conv3.filter, neck.lateral_convs.0.conv.expanded_bias, backbone.layer4.2.conv2.filter, backbone.layer4.2.conv3.filter, neck.lateral_convs.2.conv.filter, neck.lateral_convs.1.conv.expanded_bias, neck.fpn_convs.0.conv.filter, backbone.layer3.0.conv3.filter, neck.fpn_convs.2.conv.expanded_bias, backbone.layer2.1.conv1.filter, backbone.layer2.2.conv2.filter, backbone.layer3.0.downsample.0.filter, backbone.layer2.2.conv3.filter, neck.fpn_convs.1.conv.expanded_bias, backbone.layer3.2.conv2.filter, backbone.layer2.2.conv1.filter, neck.fpn_convs.3.conv.expanded_bias, backbone.layer3.1.conv3.filter, backbone.layer3.5.conv2.filter, backbone.layer3.4.conv3.filter, neck.fpn_convs.2.conv.filter, backbone.layer4.2.conv1.filter, backbone.layer3.2.conv3.filter, neck.lateral_convs.1.conv.filter, 
backbone.layer3.3.conv2.filter, backbone.conv1.filter

Traceback (most recent call last):
  File "tools/test.py", line 208, in <module>
    main()
  File "tools/test.py", line 178, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.log_dir)
  File "tools/test.py", line 22, in single_gpu_test
    model.eval()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1009, in eval
    return self.train(False)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/ReDet/mmdet/models/backbones/re_resnet.py", line 726, in train
    super(ReResNet, self).train(mode)
  File "/root/ReDet/mmdet/models/backbones/base_backbone.py", line 56, in train
    super(BaseBackbone, self).train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 998, in train
    module.train(mode)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 387, in train
    _filter, _bias = self.expand_parameters()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/r2convolution.py", line 304, in expand_parameters
    _filter = self.basisexpansion(self.weights)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 327, in forward
    _filter = self._expand_block(weights, io_pair).reshape(out_indices[2], in_indices[2], self.S)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_blocks.py", line 294, in _expand_block
    _filter = block_expansion(coefficients)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/e2cnn-0.2.1-py3.7.egg/e2cnn/nn/modules/r2_conv/basisexpansion_singleblock.py", line 115, in forward
    return torch.einsum('boi...,kb->koi...', self.sampled_basis, weights) #.transpose(1, 2).contiguous()
  File "/root/miniconda3/envs/redet/lib/python3.7/site-packages/torch/functional.py", line 211, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCBlas.cu:450
(redet) root@container-e19b1182ac-a18adac2:~/ReDet# python
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version>>
  File "<stdin>", line 1
    torch.__version>>
                     ^
SyntaxError: invalid syntax
>>> torch.__version__
'1.1.0'

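Still not working. Time for a break and some writing.

March 26, 2022, 14:21
I looked at the environment of the 3090 I rented on autodl, and it nearly brought me to tears.

The version is absurdly high; PyTorch 1.1.0 certainly does not support it.
Let me switch to a 2080 Ti and check its driver version.

It seems that after changing to TORCH_CHECK yesterday I never recompiled. Trying again.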

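AT_CHECK really does have to be changed.
Yesterday's TORCH_CHECK replacement under mmdet/ops/src was incomplete.
It looks like the 3090 needs PyTorch 1.7 or later??? This whole environment was a waste.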
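
One way to test the "3090 needs a newer PyTorch" suspicion is to compare the card's compute capability (8.6 for the RTX 3090) with the architectures the installed PyTorch binary was built for. The sketch below is only an illustration: `build_supports` and `describe_installed_build` are hypothetical helper names, and `torch.cuda.get_arch_list()` does not exist in very old PyTorch releases, so the sketch falls back to a message when it is missing. The rule it encodes is the usual CUDA one: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be compiled at load time for any equal or newer device.

```python
import re

ARCH_TAG = re.compile(r"(sm|compute)_(\d+)(\d)")


def build_supports(capability, arch_list):
    """Return True if a build with the given arch tags can run on the device.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list:  tags such as "sm_75" or "compute_80"; others are ignored.
    """
    major, minor = capability
    for tag in arch_list:
        match = ARCH_TAG.fullmatch(tag)
        if match is None:
            continue
        kind, a, b = match.group(1), int(match.group(2)), int(match.group(3))
        if kind == "sm" and a == major and b <= minor:
            return True  # binary compatible within the same major version
        if kind == "compute" and (a, b) <= (major, minor):
            return True  # PTX is forward compatible
    return False


def describe_installed_build():
    """Report whether the installed PyTorch was built for GPU 0."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is None:
        return "this PyTorch does not expose torch.cuda.get_arch_list()"
    capability = torch.cuda.get_device_capability(0)
    ok = build_supports(capability, get_arch_list())
    return "capability {} supported by this build: {}".format(capability, ok)


print(describe_installed_build())
```

With made-up lists for illustration, `build_supports((8, 6), ["sm_70", "sm_75"])` is False, while adding `"sm_80"` to the list makes it True.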

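Fine, let's reinstall from scratch. I also looked at the A40: there are idle machines, but I am missing a lot of information about it, for example which PyTorch versions it works with. There is comparatively more information available for the 3090, and it also has off-peak ("tidal") compute pricing.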
