The CRAFT Character Detection Algorithm and the SynthText Synthetic Text Dataset

CRAFT: Character Region Awareness for Text Detection, from the CLOVA research lab at Naver, CVPR 2019

For other text detection models, see:
The CRAFT algorithm: Paper reading, Character Region Awareness for Text Detection (CRAFT)
Text recognition / text detection datasets

Reimplementation project:

Project: CRAFT-Reimplementation
Dataset: Syndata - SynthText; the dataset is fairly large, about 41 GB
PyTorch version used: see the issue "Which version of python and CUDA is used ?"

Environment:

pip3 install --user torch==1.7.0 torchvision==0.8.1 seaborn==0.11.0 tqdm==4.56.0 Polygon3==3.0.9.1

Install the CUDA 10 build of PyTorch:

conda install pytorch==1.0.1 torchvision==0.2.2 cudatoolkit=10.0

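Alternatively, go to the PyTorch previous-versions page, find the cu100 build, and download torch-1.0.1-cp37-cp37m-linux_x86_64.whl; torchvision can be installed with pip as usual.

That is: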

pip3 install --user torch-1.0.1-cp37-cp37m-linux_x86_64.whl
pip3 install --user torchvision==0.2.2
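
After installing, it can help to confirm which build is actually active in the interpreter. The following is a minimal sketch; describe_torch_env is a hypothetical helper name and is not part of CRAFT-Reimplementation. It reports the installed torch version, the CUDA version torch was built against, and whether a GPU is visible, and it returns a placeholder result if torch cannot be imported.

```python
import importlib


def describe_torch_env():
    """Return a small dict describing the local PyTorch install.

    Hypothetical helper for illustration only.
    """
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load
        return {
            "installed": False,
            "version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "installed": True,
        "version": str(torch.__version__),
        # CUDA version torch was compiled against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        "cuda_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    print(describe_torch_env())
```

If cuda_build does not start with 10.0 after installing the cu100 wheel, a different torch install is probably shadowing it.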

In the CRAFT-Reimplementation project, update the corresponding data and model paths, and training can then be started.

Chinese datasets

CASIA Online and Offline Chinese Handwriting Databases
ICDAR 2019 Robust Reading Challenge on Multi-lingual scene text detection and recognition

The Chinese portion of the ICDAR 2019 samples (registration is required to download) is very challenging.

CRAFT-pytorch

The paper authors' project:

CRAFT-pytorch: https://github.com/clovaai/CRAFT-pytorch

Test model:

Download: https://drive.google.com/file/d/1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ/view

Test command:

python test.py --trained_model=mydata/models/craft_mlt_25k.pth --test_folder=mydata/imgs/ --cuda=False

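During testing, results are poor on some kinds of images the model has not seen, so fine-tuning is needed.

Results with the General test model:

Feature map:

Detection result: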

CRAFT-Re-reimplementation

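Models and data for weakly supervised learning:

Syndata.pth: the model already trained on synthetic data, used to provide supervision for the real data.
vgg16_bn-6c64b313.pth: the backbone model to be fine-tuned.
ICDAR 2015: the data that needs weak supervision; it has word-level annotations only, no character-level ones.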

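Model download link: craft_models.zip

Place the files at the corresponding locations.

Project reference: CRAFT-Re-reimplementation

trainSyndata.py: trains on the synthetic data; the only model it needs is vgg16_bn-6c64b313.pth
trainic15data.py: weakly supervised training on word-level data; it needs both Syndata.pth and vgg16_bn-6c64b313.pth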
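
The weakly supervised stage depends on the word-level confidence defined in the CRAFT paper. For a word w with known length l(w) and a character count l_c(w) estimated by the interim model, the score is s_conf(w) = (l(w) - min(l(w), |l(w) - l_c(w)|)) / l(w); when it falls below 0.5, the paper discards the estimated boxes, splits the word region evenly, and sets the confidence to 0.5. A minimal sketch of that rule follows. The function names are illustrative and are not taken from trainic15data.py.

```python
def word_confidence(word_length, estimated_length):
    """Confidence of the pseudo character boxes for one word.

    word_length: number of characters in the word annotation, l(w)
    estimated_length: number of characters the interim model found, l_c(w)
    """
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    length_error = abs(word_length - estimated_length)
    return (word_length - min(word_length, length_error)) / word_length


def apply_fallback(confidence, threshold=0.5):
    """Return (confidence_to_use, use_uniform_split).

    Below the threshold the estimated boxes are dropped in favour of an
    even split of the word box, and the confidence is set to the threshold.
    """
    if confidence < threshold:
        return threshold, True
    return confidence, False


if __name__ == "__main__":
    for true_len, est_len in [(6, 6), (6, 5), (3, 10)]:
        conf = word_confidence(true_len, est_len)
        print(true_len, est_len, apply_fallback(conf))
```

This is also why samples from the synthetic data carry a confidence of 1.0: their character boxes are exact, so no estimate is involved.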

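Environment (4 x 2080 GPUs):

To feed in custom data, modify data_loader.py.

Subclass craft_base_dataset to define a custom dataset loader. The main task is to override load_image_gt_and_confidencemask so that its output matches the following format: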

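image: the image, e.g. shape (778, 1114, 3)
character_bboxes: list of 15 arrays of shape n x 4 x 2, each holding multiple character boxes
words: list of 15 str
confidence mask: an array of ones with the image's height and width
confidences: list of 15 values, each 1.0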
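
A small checker can catch format mismatches before a custom loader is plugged into training. The sketch below is an assumption-based illustration, not code from the project: check_sample_format is a hypothetical name, the array of ones is called confidence_mask here, the mask is assumed to be 2-D (height x width), and n is assumed to equal the number of characters in each word for the synthetic example.

```python
import numpy as np


def check_sample_format(image, character_bboxes, words, confidence_mask, confidences):
    """Raise ValueError if the five outputs do not match the listed format."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("image must have shape (H, W, 3)")
    if not (len(character_bboxes) == len(words) == len(confidences)):
        raise ValueError("character_bboxes, words and confidences need one entry per word")
    for boxes, word in zip(character_bboxes, words):
        if not isinstance(word, str):
            raise ValueError("each word must be a str")
        if boxes.ndim != 3 or boxes.shape[1:] != (4, 2):
            raise ValueError("each box array must have shape (n, 4, 2)")
    if confidence_mask.shape != image.shape[:2]:
        raise ValueError("confidence_mask must match the image height and width")
    return True


# Synthetic example with two words; real values would come from the dataset.
image = np.zeros((778, 1114, 3), dtype=np.uint8)
words = ["lost", "pkg"]
character_bboxes = [np.zeros((len(w), 4, 2), dtype=np.float64) for w in words]
confidence_mask = np.ones(image.shape[:2], dtype=np.float32)
confidences = [1.0] * len(words)

print(check_sample_format(image, character_bboxes, words, confidence_mask, confidences))
```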

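The data looks like this:

Other details:

self.charbox is an array of box arrays that have to be merged and concatenated, and its dtype must be np.float64. For reference:

line_dict[line_num].append([word, bbox2rec(bbox)])  # convert the box to a rec
bbox_arr = np.array(bbox_list)  # convert the rec list to an array
sample_boxes_arr = np.concatenate(sample_boxes, axis=0).astype(np.float64)  # merge the arrays into one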
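
The three lines above are fragments, so here is a self-contained sketch of the same idea. bbox_to_corners is a hypothetical stand-in for the author's bbox2rec, whose definition is not shown; it assumes each input box is [x_min, y_min, x_max, y_max]. The point is the shapes: per-word arrays of n_i x 4 x 2 are concatenated along axis 0 into a single float64 array.

```python
import numpy as np


def bbox_to_corners(bbox):
    """Hypothetical stand-in for bbox2rec: [x_min, y_min, x_max, y_max] -> 4 corners."""
    x_min, y_min, x_max, y_max = bbox
    # corners clockwise from top-left
    return [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]


def merge_char_boxes(per_word_bboxes):
    """Stack the character boxes of all words into one (N, 4, 2) float64 array."""
    per_word_arrays = [
        np.array([bbox_to_corners(b) for b in word_boxes])  # shape (n_i, 4, 2)
        for word_boxes in per_word_bboxes
    ]
    return np.concatenate(per_word_arrays, axis=0).astype(np.float64)


# Made-up boxes: one word with 2 characters, one word with 1 character.
per_word_bboxes = [
    [[10, 10, 20, 30], [22, 10, 32, 30]],
    [[50, 40, 60, 60]],
]
merged = merge_char_boxes(per_word_bboxes)
print(merged.shape, merged.dtype)  # (3, 4, 2) float64
```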

ICDAR 2015

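The ICDAR 2015 dataset is the official dataset of the scene text detection competition held by ICDAR in 2015. It contains 1,000 training images and 500 test images.

Download: ICDAR_2015.zip

The dataset is fairly small and well suited to weakly supervised training; see trainic15data.py.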
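
Since ICDAR 2015 provides word-level boxes only, a loader mostly has to read those boxes and transcriptions. The sketch below assumes the usual ICDAR 2015 annotation layout, in which each line of a ground-truth text file holds eight comma-separated corner coordinates followed by the transcription, with ### marking regions to ignore. parse_icdar15_line is a hypothetical name, not a function from the project, and the sample line is only illustrative.

```python
def parse_icdar15_line(line):
    """Parse one ground-truth line into (corners, text, is_ignored)."""
    # Files often start with a UTF-8 byte order mark; strip() does not remove it.
    parts = line.strip().lstrip("\ufeff").split(",", 8)
    if len(parts) < 9:
        raise ValueError("expected 8 coordinates and a transcription: {!r}".format(line))
    coords = [int(v) for v in parts[:8]]
    # Pair up (x, y) values into four corner points.
    corners = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    text = parts[8]  # may itself contain commas, hence the split limit above
    return corners, text, text == "###"


if __name__ == "__main__":
    print(parse_icdar15_line("377,117,463,117,465,130,378,130,Genaxis Theatre"))
    print(parse_icdar15_line("493,115,519,115,519,131,493,131,###"))
```

Regions flagged as ignored are typically excluded from the loss rather than treated as background.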

The samples are all hard examples:

SynthText

The SynthText data comes from the Visual Geometry Group at the University of Oxford.

Synthetic Data for Text Localisation in Natural Images, CVPR 2016

Samples:

Images	Words	Characters	Size
858,750	7,266,866	28,971,487	41 GB

Download link: SynthText.zip. The download takes a while.

Text processing logic:

img_texts = gt_data['txt'][0]
sample_img_text = img_texts[0]
words = [re.split(' \n|\n |\n| ', t.strip()) for t in sample_img_text]
words = list(itertools.chain(*words))  # flatten the 2-D list into 1-D
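
To see what the split does without loading gt.mat, the same logic can be run on a few hand-written strings. In the sketch below, the sample strings are made up to mimic the txt field (lines separated by newlines, padded with spaces), split_words is a hypothetical wrapper name, and the final filter for empty strings is an extra step that is not in the original snippet.

```python
import itertools
import re


def split_words(img_text):
    """Flatten SynthText-style text entries into a flat list of words."""
    words = [re.split(' \n|\n |\n| ', t.strip()) for t in img_text]
    words = list(itertools.chain(*words))  # flatten the 2-D list into 1-D
    # Extra step (not in the original): repeated spaces yield empty strings.
    return [w for w in words if w]


# Made-up entries imitating the layout of one image's text annotations.
sample_img_text = ['Lines:\nI lost\nKevin ', 'will                ', 'line\nand  the']
print(split_words(sample_img_text))
# ['Lines:', 'I', 'lost', 'Kevin', 'will', 'line', 'and', 'the']
```

The number of words produced this way should match the number of word boxes for the image, which makes a convenient sanity check.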

Result for a word-level sample (8/ballet_106_0.jpg):

Text:

words: ['Lines:', 'I', 'lost', 'Kevin', 'will', 'line', 'and', 'and', 'the', '(and', 'the', 'out', 'you', "don't", 'pkg']

Script for using the dataset (the helper functions come from myutils):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Copyright (c) 2021. All rights reserved.
Created by C. L. Wang on 25.5.21
"""
import itertools
import os
import re

import cv2
import scipy.io as sci_io

from myutils.cv_utils import show_img_bgr, draw_rec_list
from root_dir import DATA_DIR


class SynthTextChecker(object):
    """
    Check the format of the dataset
    """

    def __init__(self):
        pass

    def check_sample(self):
        print('[Info] check sample')
        synth_text_folder = os.path.join(DATA_DIR, 'SynthText')
        data_path = os.path.join(synth_text_folder, 'gt.mat')
        gt_data = sci_io.loadmat(data_path)
        print('[Info] data: {}'.format(type(gt_data)))
        print('[Info] data: {}'.format(gt_data.keys()))

        # Each field is stored as a 1 x N array, one entry per image
        char_boxes = gt_data['charBB'][0]
        word_boxes = gt_data['wordBB'][0]
        img_names = gt_data['imnames'][0]
        img_texts = gt_data['txt'][0]
        print('[Info] char_box {}'.format(char_boxes.shape))
        print('[Info] image {}'.format(len(img_names)))
        print('[Info] img_txt {}'.format(len(img_texts)))

        # Take the first image as the sample
        sample_char_box = char_boxes[0]
        sample_char_box = sample_char_box.transpose((2, 1, 0))  # reorder the axes
        sample_word_box = word_boxes[0]
        sample_word_box = sample_word_box.transpose((2, 1, 0))
        sample_img_name = img_names[0]
        sample_img_text = img_texts[0]
        print('[Info] sample_char_box: {}'.format(sample_char_box.shape))
        print('[Info] sample_word_box: {}'.format(sample_word_box.shape))
        print('[Info] sample_img_name: {}'.format(sample_img_name))
        print('[Info] sample_img_text: {}'.format(sample_img_text))

        rec_list = []
        for word_box in sample_word_box:
            rec_list.append(word_box.astype(int).tolist())

        words = [re.split(' \n|\n |\n| ', t.strip()) for t in sample_img_text]
        words = list(itertools.chain(*words))  # flatten the 2-D list into 1-D
        print('[Info] num of words: {}'.format(len(words)))
        print('[Info] words: {}'.format(words))

        img_path = os.path.join(synth_text_folder, sample_img_name[0])
        img_bgr = cv2.imread(img_path)
        show_img_bgr(img_bgr)
        draw_rec_list(img_bgr, rec_list, is_show=True)
        print('[Info] finished: {}'.format(data_path))


def main():
    stc = SynthTextChecker()
    stc.check_sample()


if __name__ == '__main__':
    main()
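
The transpose((2, 1, 0)) calls in the script are easier to follow on a small synthetic array. In gt.mat the boxes are stored as 2 x 4 x n (coordinate, corner, box), and the transpose turns them into n x 4 x 2 so that each box is a list of four (x, y) corners. The sketch below uses made-up values; to_box_list is a hypothetical helper, and the handling of a 2-D input reflects an assumption that an image with a single box may be stored without the trailing axis.

```python
import numpy as np


def to_box_list(raw_boxes):
    """Convert a (2, 4, n) box array to shape (n, 4, 2)."""
    raw_boxes = np.asarray(raw_boxes)
    if raw_boxes.ndim == 2:
        # Assumed edge case: one box stored as (2, 4); add the missing axis.
        raw_boxes = raw_boxes[:, :, np.newaxis]
    return raw_boxes.transpose((2, 1, 0))


# Synthetic stand-in for one image's word boxes: 3 boxes, only box 0 filled in.
raw = np.zeros((2, 4, 3))
raw[0, :, 0] = [10, 50, 50, 10]  # x of the four corners of box 0
raw[1, :, 0] = [20, 20, 40, 40]  # y of the four corners of box 0

boxes = to_box_list(raw)
print(boxes.shape)        # (3, 4, 2)
print(boxes[0].tolist())  # [[10.0, 20.0], [50.0, 20.0], [50.0, 40.0], [10.0, 40.0]]
```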