CDeC-Net Code Implementation

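CDeC-Net is an end-to-end network for detecting tables in document images. It is a multistage extension of Mask R-CNN with a dual backbone that uses deformable convolution, designed to detect tables of varying scale with high accuracy at higher IoU thresholds. CDeC-Net achieves state-of-the-art results on several publicly available benchmark datasets. The code is implemented in PyTorch using the MMDetection framework (version 2.0.0).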

Dependencies:
Python = 3.6+
PyTorch = 1.4.0
Torchvision = 0.5.0
CUDA = 10.0
MMDetection = 2.0.0
mmcv = 0.5.4

1. Clone the repository:
git clone https://github.com/mdv3101/CDeCNet
2. Install the dependencies:
pip install torch==1.4.0 torchvision==0.5.0

cd CDeCNet/
pip install -r requirements/build.txt
pip install "git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools"
pip install -v -e .

Detailed installation steps follow.

Requirements:

Linux or macOS (Windows is not currently officially supported)
Python 3.6+
PyTorch 1.3+
CUDA 9.2+ (if you build PyTorch from source, CUDA 9.0 is also compatible)
GCC 5+
mmcv

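Install mmdetection

a: Create a conda virtual environment and activate it.
b: Install PyTorch and torchvision following the official instructions.
Note: Make sure that the CUDA version used for compilation matches the runtime CUDA version. The PyTorch website lists the CUDA versions supported by its precompiled packages.
c: Clone the mmdetection repository.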

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection

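d: Install the build requirements and then install mmdetection. (We install our forked version of pycocotools from the GitHub repo instead of PyPI for better compatibility with our repo.)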

pip install -r requirements/build.txt
pip install "git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools"
pip install -v -e .  # or "python setup.py develop"
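
The note under step b can be checked from Python once PyTorch is installed. The sketch below is a minimal check, assuming only the standard attributes torch.__version__, torch.version.cuda and torch.cuda.is_available(); the helper name cuda_report is made up for illustration. It reports what PyTorch was built against, so you can compare it with the CUDA toolkit used to compile mmdetection.

```python
def cuda_report():
    """Collect the PyTorch and CUDA versions visible from Python (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was built with (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are present at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```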

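Notes:
1. The git commit id is written into the version number by step d, e.g. 0.6.0+2e7045c. The version is also saved in trained models. It is recommended to rerun step d every time you pull updates from GitHub. If C++/CUDA code has been modified, this step is mandatory.

Important: if you reinstall mmdet with a different CUDA/PyTorch version, be sure to remove the ./build folder.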

pip uninstall mmdet
rm -rf ./build
find . -name "*.so" | xargs rm

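2. Following the instructions above, mmdetection is installed in dev mode, so any local modification to the code takes effect without reinstalling (unless you make commits and want to update the version number).

3. If you would like to use opencv-python-headless instead of opencv-python, install it before installing MMCV.

4. Some dependencies are optional. Simply running pip install -v -e . installs only the minimum runtime requirements. To use optional dependencies such as albumentations and imagecorruptions, either install them manually with pip install -r requirements/optional.txt, or specify the desired extras when calling pip (e.g. pip install -v -e .[optional]).

CPU-only installation: omitted (I have a GPU, so I don't need it).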

Another option: Docker image
We provide a Dockerfile to build an image.

# build an image with PyTorch 1.5, CUDA 10.1
docker build -t mmdetection docker/

Run it with:

docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection/data mmdetection

A from-scratch setup script

Here is a complete script for setting up mmdetection with conda.

conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
# install latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install "git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools"
pip install -v -e .

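Using multiple MMDetection versions

The train and test scripts already modify PYTHONPATH to ensure that they use the MMDetection in the current directory.

To use the default MMDetection installed in the environment rather than the copy you are working in, remove the following line from those scripts: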

PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
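
To confirm which copy of a package Python will actually import (for example after removing the PYTHONPATH line above), a small standard-library check is enough. This is only a sketch: module_location is a hypothetical helper name, and the "mmdet" lookup in the usage part assumes MMDetection is installed (it prints None otherwise).

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # With MMDetection installed, this prints the path of the mmdet package in use.
    print(module_location("mmdet"))
```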

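Training

1. Create a folder 'dataset' in CDeCNet and put your data in it. Your dataset must be in MS-COCO format. The directory structure should be:

dataset
├── coco
|   ├── annotations
|   ├── train2014
|   ├── val2014
|   ├── logs

2. Create a folder 'model' in CDeCNet and put the model pre-trained on MS-COCO in it (model file link).

3. In default_runtime.py, set load_from = '/path/of/pre-trained/model'
4. To train a model with CDeC-Net, use the following command: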

python -u tools/train.py configs/dcn/db_cascade_mask_rcnn_x101_fpn_dconv_c3-c5_1x_coco.py --work-dir dataset/coco/logs/

Note: steps 2 and 3 are optional. If you want to train a model from scratch, you can skip both. (Training from scratch takes longer to converge.)
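
For step 1, the sketch below shows roughly what a minimal MS-COCO detection annotation contains. The top-level keys (images, annotations, categories) and the [x, y, width, height] box convention are standard COCO; the single "table" category, the image name and the sizes are assumptions for illustration, so match the category names and the annotation file name to what your config file expects.

```python
import json


def minimal_coco(image_name, width, height, boxes):
    """Build a COCO-style dict for one image; boxes are (x, y, w, h) tuples."""
    annotations = []
    for ann_id, (x, y, w, h) in enumerate(boxes, start=1):
        annotations.append({
            "id": ann_id,
            "image_id": 1,
            "category_id": 1,
            "bbox": [x, y, w, h],  # top-left corner plus width and height
            "area": w * h,
            "iscrowd": 0,
            # Rectangle polygon so mask-based models also have a target.
            "segmentation": [[x, y, x + w, y, x + w, y + h, x, y + h]],
        })
    return {
        "images": [{"id": 1, "file_name": image_name,
                    "width": width, "height": height}],
        "annotations": annotations,
        "categories": [{"id": 1, "name": "table"}],  # assumed class name
    }


if __name__ == "__main__":
    coco = minimal_coco("page_0001.jpg", 1000, 1400, [(80, 200, 840, 300)])
    # Save this JSON under dataset/coco/annotations/ with the name your config uses.
    print(json.dumps(coco, indent=2))
```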

Evaluation

To evaluate a trained model, run the following command:

python tools/test.py configs/dcn/db_cascade_mask_rcnn_x101_fpn_dconv_c3-c5_1x_coco.py dataset/coco/logs/latest.pth \
    --format-only --options "jsonfile_prefix=evaluation_result"
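
With --format-only and jsonfile_prefix, the detections are written out as COCO-style result files. The sketch below assumes the box results land in evaluation_result.bbox.json and that each entry carries image_id, category_id, bbox and score (the usual COCO results layout); the helper names and the 0.95 threshold are illustrative only.

```python
import json


def load_detections(path):
    """Read a COCO-style results file (a JSON list of detection dicts)."""
    with open(path) as f:
        return json.load(f)


def keep_confident(detections, score_thr=0.95):
    """Return detections whose score is at least score_thr, best first."""
    kept = [d for d in detections if d["score"] >= score_thr]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


if __name__ == "__main__":
    # In practice: detections = load_detections("evaluation_result.bbox.json")
    # (assumed output name). Sample data is used here instead.
    detections = [
        {"image_id": 1, "category_id": 1, "bbox": [80, 200, 840, 300], "score": 0.99},
        {"image_id": 1, "category_id": 1, "bbox": [90, 900, 400, 120], "score": 0.41},
    ]
    for det in keep_confident(detections):
        print(det["image_id"], det["bbox"], det["score"])
```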

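For details on the various training and evaluation methods, see the linked documentation.

Demo

To run inference on a single image, use the image_demo.py file by running the following command: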

python demo/image_demo.py demo_image.jpg configs/dcn/db_cascade_mask_rcnn_x101_fpn_dconv_c3-c5_1x_coco.py dataset/coco/logs/latest.pth \
    --score-thr 0.95 --output-img 'output_demo.jpg'

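Training, evaluation, and basic usage of mmdet

1. Prepare the dataset

It is recommended to symlink the dataset root to $MMDETECTION/data. If your folder structure is different, you may need to change the corresponding paths in the config files.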

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
│   ├── cityscapes
│   │   ├── annotations
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── VOCdevkit
│   │   ├── VOC2007
│   │   ├── VOC2012
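
Symlinking the dataset root can also be done from Python. This is a minimal sketch, assuming a POSIX system; the function and argument names (link_dataset, dataset_root, mmdet_root) are placeholders of my own, not part of MMDetection.

```python
import os


def link_dataset(dataset_root, mmdet_root, name="coco"):
    """Create <mmdet_root>/data/<name> as a symlink to dataset_root."""
    data_dir = os.path.join(mmdet_root, "data")
    os.makedirs(data_dir, exist_ok=True)
    link = os.path.join(data_dir, name)
    if not os.path.lexists(link):
        # Use an absolute target so the link works from any working directory.
        os.symlink(os.path.abspath(dataset_root), link)
    return link
```

For example, calling link_dataset('/path/to/coco', '.') from the mmdetection root would make data/coco point at the existing dataset without copying it.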

The cityscapes annotations must be converted into the COCO format using tools/convert_datasets/cityscapes.py:

pip install cityscapesscripts
python tools/convert_datasets/cityscapes.py ./data/cityscapes --nproc 8 --out-dir ./data/cityscapes/annotations

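Currently the config files in cityscapes use COCO pre-trained weights for initialization. If the network is unavailable or slow, download the pre-trained models in advance; otherwise training will fail at startup.
For information on using custom datasets, see the linked tutorial.

Inference with pre-trained models

We provide testing scripts to evaluate a whole dataset (COCO, PASCAL VOC, Cityscapes, etc.), as well as some high-level APIs for easier integration into other projects.

Test a dataset
You can use the following commands to test a dataset.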

# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

Optional arguments:

RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
EVAL_METRICS: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., proposal_fast, proposal, bbox, segm are available for COCO, mAP, recall for PASCAL VOC. Cityscapes could be evaluated by cityscapes as well as all COCO metrics.
--show: If specified, detection results will be plotted on the images and shown in a new window. It is only applicable to single GPU testing and used for debugging and visualization. Please make sure that GUI is available in your environment, otherwise you may encounter the error like cannot connect to X server.
--show-dir: If specified, detection results will be plotted on the images and saved to the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.
--show-score-thr: If specified, detections with score below this threshold will be removed.
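
As an illustration of what can be done with a saved RESULT_FILE, the sketch below counts detections per class above a score threshold. It assumes the pickle holds one entry per image, each entry being a list with one sequence of [x1, y1, x2, y2, score] rows per class, and that mask models store a (bbox, segm) tuple per image instead. Both points are assumptions about the file layout, so inspect your own file before relying on them.

```python
def count_detections(results, score_thr=0.3):
    """Count boxes per class with score >= score_thr (assumed result layout)."""
    counts = {}
    for per_image in results:
        if isinstance(per_image, tuple):
            # Mask models: keep only the bbox part of (bbox, segm).
            per_image = per_image[0]
        for class_id, rows in enumerate(per_image):
            kept = sum(1 for row in rows if row[4] >= score_thr)
            counts[class_id] = counts.get(class_id, 0) + kept
    return counts


if __name__ == "__main__":
    # In practice the results would come from pickle.load on the RESULT_FILE.
    # Here: two images, two classes, plain lists standing in for arrays.
    sample = [
        [[[0, 0, 10, 10, 0.9]], []],
        [[[5, 5, 20, 20, 0.2]], [[1, 1, 4, 4, 0.8]]],
    ]
    print(count_detections(sample))
```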

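Examples:
Assume that you have already downloaded the checkpoints to the directory checkpoints/.
1. Test Faster R-CNN and visualize the results. Press any key to move to the next image.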

python tools/test.py configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth \
    --show

2. Test Faster R-CNN and save the painted images for later visualization.

python tools/test.py configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth \
    --show-dir faster_rcnn_r50_fpn_1x_results

3. Test Faster R-CNN on PASCAL VOC (without saving the test results) and evaluate the mAP.

python tools/test.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc.py \
    checkpoints/SOME_CHECKPOINT.pth \
    --eval mAP

4. Test Mask R-CNN with 8 GPUs, and evaluate the bbox and mask AP.

./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    8 --out results.pkl --eval bbox segm

5. Test Mask R-CNN with 8 GPUs, and evaluate the classwise bbox and mask AP.

./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    8 --out results.pkl --eval bbox segm --options "classwise=True"

6. Test Mask R-CNN on COCO test-dev with 8 GPUs, and generate the json files to submit to the official evaluation server.

./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    8 --format-only --options "jsonfile_prefix=./mask_rcnn_test-dev_results"

You will get two json files, mask_rcnn_test-dev_results.bbox.json and mask_rcnn_test-dev_results.segm.json.

7. Test Mask R-CNN on Cityscapes with 8 GPUs, and generate the txt and png files to submit to the official evaluation server.

./tools/dist_test.sh configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py \
    checkpoints/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth \
    8 --format-only --options "txtfile_prefix=./mask_rcnn_cityscapes_test_results"

The generated png and txt files will be under the ./mask_rcnn_cityscapes_test_results directory.

Image demo

We provide a demo script to test a single image.

python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}] [--output-img ${OUTPUT_IMG}]

Example:

python demo/image_demo.py demo/demo.jpg configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth --device cpu --score-thr 0.9 --output-img 'output_test.jpg'

Webcam demo

We provide a webcam demo to illustrate the results.

python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--camera-id ${CAMERA-ID}] [--score-thr ${SCORE_THR}]

Example:

python demo/webcam_demo.py configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth

High-level APIs for testing images

Synchronous interface

Here is an example of building the model and testing given images.

from mmdet.apis import init_detector, inference_detector
import mmcv

config_file = 'configs/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth'

# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = 'test.jpg'  # or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
# visualize the results in a new window
model.show_result(img, result)
# or save the visualization results to image files
model.show_result(img, result, out_file='result.jpg')

# test a video and show the results
video = mmcv.VideoReader('video.mp4')
for frame in video:
    result = inference_detector(model, frame)
    model.show_result(frame, result, wait_time=1)

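A notebook demo is available at the linked page.

Asynchronous interface (supported for Python 3.7+)

The asynchronous interface avoids blocking the CPU on GPU-bound inference code and gives better CPU/GPU utilization for single-threaded applications. Inference can run concurrently either across different input data samples or across different models of an inference pipeline.

See tests/async_benchmark.py to compare the speed of the synchronous and asynchronous interfaces.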

import asyncio
import torch
from mmdet.apis import init_detector, async_inference_detector
from mmdet.utils.contextmanagers import concurrent

async def main():
    config_file = 'configs/faster_rcnn_r50_fpn_1x_coco.py'
    checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth'
    device = 'cuda:0'
    model = init_detector(config_file, checkpoint=checkpoint_file, device=device)

    # queue is used for concurrent inference of multiple images
    streamqueue = asyncio.Queue()
    # queue size defines concurrency level
    streamqueue_size = 3

    for _ in range(streamqueue_size):
        streamqueue.put_nowait(torch.cuda.Stream(device=device))

    # test a single image and show the results
    img = 'test.jpg'  # or img = mmcv.imread(img), which will only load it once

    async with concurrent(streamqueue):
        result = await async_inference_detector(model, img)

    # visualize the results in a new window
    model.show_result(img, result)
    # or save the visualization results to image files
    model.show_result(img, result, out_file='result.jpg')

asyncio.run(main())

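Train a model

MMDetection implements distributed and non-distributed training, which use MMDistributedDataParallel and MMDataParallel respectively.

All outputs (log files and checkpoints) are saved to the working directory, which is specified by work_dir in the config file.

By default, the model is evaluated on the validation set after every epoch. You can change the evaluation interval by adding the interval argument to the training config.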

evaluation = dict(interval=12)  # Evaluate the model every 12 epochs.
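
As a rough illustration of what interval means, the sketch below lists the epochs after which evaluation would run, assuming epochs are counted from 1 and evaluation fires whenever the epoch number is a multiple of interval. The function name is hypothetical.

```python
def evaluated_epochs(total_epochs, interval=1):
    """Epochs (1-based) after which the validation set is evaluated."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return [epoch for epoch in range(1, total_epochs + 1) if epoch % interval == 0]


if __name__ == "__main__":
    print(evaluated_epochs(12))               # every epoch (the default)
    print(evaluated_epochs(36, interval=12))  # [12, 24, 36]
```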

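Important: the default learning rate in the config files is for 8 GPUs and 2 img/GPU (batch size = 8*2 = 16). According to the linear scaling rule, if you use a different number of GPUs or images per GPU, you need to set the learning rate proportional to the batch size, e.g. lr=0.01 for 4 GPUs * 2 img/GPU and lr=0.08 for 16 GPUs * 4 img/GPU.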
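
The linear scaling rule can be written down directly. This sketch assumes a reference learning rate of 0.02 at batch size 16, which is what the two examples above imply (0.01 at batch size 8, 0.08 at batch size 64); check the optimizer entry of your config for the actual base value.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate in proportion to the total batch size."""
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


if __name__ == "__main__":
    for gpus, imgs in [(8, 2), (4, 2), (16, 4), (1, 2)]:
        print(f"{gpus} GPUs x {imgs} img/GPU -> lr = {scaled_lr(gpus, imgs):.4f}")
```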

Train with a single GPU

python tools/train.py ${CONFIG_FILE} [optional arguments]

If you want to specify the working directory in the command, add the argument --work-dir ${YOUR_WORK_DIR}.

Train with multiple GPUs

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]

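Optional arguments are:

--no-validate (not suggested): By default, the codebase performs evaluation every k epochs during training (k defaults to 1 and can be changed through the evaluation interval in the config). To disable this behavior, use --no-validate.
--work-dir ${WORK_DIR}: Override the working directory specified in the config file.
--resume-from ${CHECKPOINT_FILE}: Resume from a previous checkpoint file.

Difference between resume-from and load-from: resume-from loads both the model weights and the optimizer state, and the epoch is also inherited from the specified checkpoint. It is usually used to resume a training run that was interrupted accidentally. load-from loads only the model weights, and training starts from epoch 0. It is usually used for fine-tuning.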
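
The difference can be pictured with plain dictionaries. This is only a conceptual sketch, not MMDetection's loader: the checkpoint layout (state_dict, optimizer, and an epoch stored under meta) and the function names are assumptions made for illustration.

```python
def resume_from(checkpoint):
    """Restore weights, optimizer state, and the epoch counter."""
    return {
        "weights": checkpoint["state_dict"],
        "optimizer": checkpoint["optimizer"],
        "start_epoch": checkpoint["meta"]["epoch"],
    }


def load_from(checkpoint):
    """Restore weights only; the optimizer is fresh and training starts at epoch 0."""
    return {
        "weights": checkpoint["state_dict"],
        "optimizer": None,
        "start_epoch": 0,
    }


if __name__ == "__main__":
    ckpt = {
        "state_dict": {"backbone.conv1.weight": "..."},
        "optimizer": {"momentum_buffers": "..."},
        "meta": {"epoch": 7},
    }
    print(resume_from(ckpt)["start_epoch"])  # 7
    print(load_from(ckpt)["start_epoch"])    # 0
```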

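Useful tools

We provide many useful tools under the tools/ directory.

Log analysis

You can plot loss/mAP curves given a training log file. Run pip install seaborn first to install the dependency.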

python tools/analyze_logs.py plot_curve [--keys ${KEYS}] [--title ${TITLE}] [--legend ${LEGEND}] [--backend ${BACKEND}] [--style ${STYLE}] [--out ${OUT_FILE}]

Examples:
Plot the classification loss of a run.

python tools/analyze_logs.py plot_curve log.json --keys loss_cls --legend loss_cls

Plot the classification and regression losses of a run, and save the figure to a pdf.

python tools/analyze_logs.py plot_curve log.json --keys loss_cls loss_bbox --out losses.pdf

Compare the bbox mAP of two runs in the same figure.

python tools/analyze_logs.py plot_curve log1.json log2.json --keys bbox_mAP --legend run1 run2

You can also compute the average training speed.

python tools/analyze_logs.py cal_train_time log.json [--include-outliers]

The expected output looks like this.

-----Analyze train time of work_dirs/some_exp/20190611_192040.log.json-----
slowest epoch 11, average time is 1.2024
fastest epoch 1, average time is 1.1909
time std over epochs is 0.0028
average iter time: 1.1959 s/iter
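
The same kind of summary can be sketched by hand. The code assumes the log file holds one JSON object per line and that training records carry mode, epoch and time keys; those field names are assumptions about the log layout, and the sketch does not reproduce the tool's outlier handling.

```python
import json
from collections import defaultdict
from statistics import mean


def epoch_iter_times(lines):
    """Average iteration time per epoch from JSON-lines training records."""
    times = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip validation records and any header line without timing fields.
        if record.get("mode") != "train" or "time" not in record:
            continue
        times[record["epoch"]].append(record["time"])
    return {epoch: mean(values) for epoch, values in sorted(times.items())}


if __name__ == "__main__":
    sample = [
        '{"mode": "train", "epoch": 1, "iter": 50, "time": 1.25}',
        '{"mode": "train", "epoch": 1, "iter": 100, "time": 1.75}',
        '{"mode": "val", "epoch": 1, "bbox_mAP": 0.5}',
        '{"mode": "train", "epoch": 2, "iter": 50, "time": 1.0}',
    ]
    averages = epoch_iter_times(sample)
    slowest = max(averages, key=averages.get)
    print(f"slowest epoch {slowest}, average time is {averages[slowest]:.4f}")
```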

Get the FLOPs and params (experimental)

We provide a script adapted from flops-counter.pytorch to compute the FLOPs and params of a given model.

python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]

You will get a result like this.

==============================
Input shape: (3, 1280, 800)
Flops: 239.32 GMac
Params: 37.74 M
==============================

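Note: this tool is still experimental and we do not guarantee that the numbers are correct. You may use the results for simple comparisons, but double-check them before adopting them in technical reports or papers.
(1) FLOPs depend on the input shape, while the number of parameters does not. The default input shape is (1, 3, 1280, 800). (2) Some operators, such as GN and custom operators, are not counted in the FLOPs. You can add support for new operators by modifying mmdet/utils/flops_counter.py. (3) The FLOPs of two-stage detectors depend on the number of proposals.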
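
Point (1) is easy to see for a single convolution layer. The sketch uses the standard closed-form counts for a k x k convolution with stride 1 and "same" padding (so the output keeps the input's height and width); it is a hand calculation, not the code in tools/get_flops.py, and the layer sizes are arbitrary.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Number of learnable parameters in a k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv_macs(c_in, c_out, k, height, width):
    """Multiply-accumulates for stride 1 and 'same' padding (bias ignored)."""
    return k * k * c_in * c_out * height * width


if __name__ == "__main__":
    for h, w in [(1280, 800), (640, 400)]:
        params = conv_params(64, 128, 3)
        gmacs = conv_macs(64, 128, 3, h, w) / 1e9
        print(f"input {h}x{w}: params = {params}, MACs = {gmacs:.2f} G")
```

Halving both spatial dimensions leaves the parameter count unchanged but cuts the multiply-accumulates to a quarter.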

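Publish a model

Before uploading a model to AWS, you may want to (1) convert the model weights to CPU tensors, (2) delete the optimizer state, and (3) compute the hash of the checkpoint file and append the hash id to the filename.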

python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}

Example:

python tools/publish_model.py work_dirs/faster_rcnn/latest.pth faster_rcnn_r50_fpn_1x_20190801.pth

The final output filename will be faster_rcnn_r50_fpn_1x_20190801-{hash id}.pth.
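
Steps (2) and (3) can be sketched with the standard library. The checkpoint is treated as a plain dict with an optimizer key, and the file name gets the first 8 hex digits of a SHA-256 digest; both details are assumptions for illustration and may differ from what tools/publish_model.py actually does. Step (1) needs PyTorch and is left out.

```python
import hashlib
import os


def strip_optimizer(checkpoint):
    """Return a copy of the checkpoint dict without the optimizer state."""
    return {key: value for key, value in checkpoint.items() if key != "optimizer"}


def hashed_filename(path, digits=8):
    """Append the first hex digits of the file's SHA-256 hash to its name."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    stem, ext = os.path.splitext(path)
    return f"{stem}-{sha.hexdigest()[:digits]}{ext}"
```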

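Test the robustness of detectors
See the linked page.

Convert to ONNX (experimental): omitted

Tutorials

Currently, we provide four tutorials for users: fine-tuning models, adding a new dataset, designing the data pipeline, and adding new modules. We also provide a full description of the config system.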