[This Week's Study] Optical Character Recognition (OCR)

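Optical character recognition originally referred to a technique for printed characters: paper documents were optically converted into black-and-white bitmap image files, and recognition software then converted the text in those images into a text format that word processors could edit further. Today the term has broadened to cover detecting character content in images with techniques such as deep learning, returning both the text and its position in the image, usually as the coordinates of four boundary points (this latter half is my own understanding).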

Original image (left) and visualized recognition result (right)

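This post uses PaddleOCR, the OCR toolkit built on Baidu's PaddlePaddle (飞桨), for the following reasons:

1. The project is developed by a Chinese company and ships extensive Chinese-language usage and learning documentation, which makes it easy to use and learn; it is a beginner-friendly project.

2. It is highly extensible: the interfaces are already exposed and can be called directly, and it provides lightweight networks and development modules suited to a range of deployment scenarios; it is a developer-friendly project.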

GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://github.com/PaddlePaddle/PaddleOCR

It can be used from the terminal: cd to the project root and run

#layout analysis + table recognition
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure
#layout analysis
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
#table recognition
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false

paddleocr.py is the main module; --image_dir is the path of the image to recognize, and --type, --table, and --layout together select the recognition mode.
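The three commands above differ only in their trailing flags, which can be captured in a small lookup table. The following is a minimal sketch, not part of PaddleOCR: the helper name build_command and the mode labels are invented for illustration, and the flag values are copied verbatim from the commands in this post.

```python
import shlex

# Flag sets copied from the three commands in this post. The mode names are
# labels invented for this sketch, not PaddleOCR terminology.
MODE_FLAGS = {
    "layout+table": [],
    "layout": ["--table=false", "--ocr=false"],
    "table": ["--layout=false"],
}


def build_command(image_path, mode):
    """Return the argv list for one `paddleocr --type=structure` invocation."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "paddleocr",
        f"--image_dir={image_path}",
        "--type=structure",
        *MODE_FLAGS[mode],
    ]


if __name__ == "__main__":
    # Print the command line for each mode without executing anything.
    for mode in MODE_FLAGS:
        argv = build_command("PaddleOCR/ppstructure/docs/table/1.png", mode)
        print(shlex.join(argv))
```

To actually run one of these, the returned list could be passed to subprocess.run(argv, check=True), assuming the paddleocr command is on the PATH.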

Screenshot of the command-line session

I wrote a simple recognition module, predict.py, by calling the API:

"""
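1. img_path is the path of the image you want to recognize (note: the path must not contain Chinese characters!)
2. At the "Exit?" prompt, type exit to quit
"""
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)
save_folder = 'Output'
font_path = 'PaddleOCR/doc/fonts/simfang.ttf'  # font shipped with PaddleOCR

while True:
    img_path = input('\nPlease enter img path:')
    if img_path == '':
        img_path = 'Input/emotion/ocr6.jpg'  # default image when nothing is entered
    print(f'Image path: {img_path}')

    img = cv2.imread(img_path)
    if img is None:
        # cv2.imread returns None when the file cannot be read
        print(f'Could not read image: {img_path}')
        continue

    # Run layout analysis and OCR, then save the per-region results
    result = table_engine(img)
    save_name = os.path.basename(img_path).split('.')[0]
    save_structure_res(result, save_folder, save_name)

    # Drop the cropped image array from each region before printing
    for line in result:
        line.pop('img')
        print(line)

    # Draw the recognition result next to the original image and save it
    image = Image.open(img_path).convert('RGB')
    im_show = draw_structure_result(image, result, font_path=font_path)
    im_show = Image.fromarray(im_show)
    im_show.save(f'Output/{save_name}/result.jpg')

    print('\n------------------------------------------------It is show time !--------------------------------------------------------')
    # Print the text of every block in the first region
    for i, block in enumerate(result[0]['res']):
        print(f"ocr result[{i + 1}]: {block['text']}")

    answer = input('\nExit?')  # renamed from exit to avoid shadowing the builtin
    if answer == 'exit':
        break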

Input

Input path: Input/emotion/ocr13.jpg

Raw output

The key line is result = table_engine(img): it takes the image img (the array loaded by cv2.imread, not the path string) and returns the result.

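1. result is a list of length 1 in this example; it holds one dict per detected layout region, so its length varies by image

2. result[0] is a dict of length 4 (keys type, bbox, img, and res)

3. result[0]['res'] is a list of length 2 (the number of text blocks recognized)

4. result[0]['res'][0] is a dict of length 3 that holds all the information about the first recognized text block

4.1 result[0]['res'][0]['text']: the recognized text of the first text block

4.2 result[0]['res'][0]['confidence']: the recognition confidence of the first text block

4.3 result[0]['res'][0]['text_region']: the coordinates of the four corner points of the first text block's rotated-rectangle detection box

4.3.1 result[0]['res'][0]['text_region'][0][0] and result[0]['res'][0]['text_region'][0][1] give the x and y coordinates of the first corner point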
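The nesting described above can be walked with two short helpers. This is a minimal sketch rather than anything PaddleOCR provides: the names iter_text_blocks and axis_aligned_box are hypothetical, the sample data is copied from the ocr13.jpg run in this post, and the code assumes each region's res is a list of text-block dicts (regions where it is not a list are skipped, since I have not checked what other region types store there).

```python
def iter_text_blocks(result):
    """Yield (region_type, text, confidence, corner_points) for every text block."""
    for region in result:
        blocks = region["res"]
        # Assumption: only list-valued 'res' entries hold text blocks.
        if not isinstance(blocks, list):
            continue
        for block in blocks:
            yield region["type"], block["text"], block["confidence"], block["text_region"]


def axis_aligned_box(corner_points):
    """Collapse four (x, y) corner points into (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in corner_points]
    ys = [y for _, y in corner_points]
    return min(xs), min(ys), max(xs), max(ys)


# Two text blocks from the 'Figure' region of the ocr13.jpg run in this post.
sample = [
    {
        "type": "Figure",
        "bbox": [12, 49, 466, 460],
        "res": [
            {"text": "WhenI'mbored nobody", "confidence": 0.9314340353012085,
             "text_region": [[20.0, 54.0], [458.0, 59.0], [458.0, 93.0], [20.0, 87.0]]},
            {"text": "Haha ;)", "confidence": 0.9406118988990784,
             "text_region": [[17.0, 413.0], [121.0, 418.0], [119.0, 451.0], [16.0, 446.0]]},
        ],
    }
]

if __name__ == "__main__":
    for region_type, text, confidence, corners in iter_text_blocks(sample):
        print(region_type, repr(text), round(confidence, 3), axis_aligned_box(corners))
```

The axis-aligned box is handy for cropping with array slicing, at the cost of discarding the slight rotation that the four corner points encode.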

Terminal output

D:\DLSoftware\Anaconda3\envs\paddle\python.exe C:/Users/cleste/Desktop/PaddleOCR-release-2.5/predict.py
[2022/06/17 12:44:20] ppocr DEBUG: Namespace(Output='./Output', alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir=None, cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./Output', det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_fce_box_type='poly', det_limit_side_len=960, det_limit_type='max', det_model_dir='C:\\Users\\cleste/.paddleocr/whl\\det\\ch\\ch_PP-OCRv3_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='quad', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout=True, layout_label_map=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, mode='structure', ocr=True, ocr_version='PP-OCRv3', precision='fp32', process_id=0, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='C:\\Users\\cleste\\Desktop\\PaddleOCR-release-2.5\\ppocr\\utils\\ppocr_keys_v1.txt', rec_image_shape='3, 48, 320', rec_model_dir='C:\\Users\\cleste/.paddleocr/whl\\rec\\ch\\ch_PP-OCRv3_rec_infer', save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], show_log=True, structure_version='PP-STRUCTURE', table=True, table_char_dict_path='C:\\Users\\cleste\\Desktop\\PaddleOCR-release-2.5\\ppocr\\utils\\dict\\table_structure_dict.txt', table_max_len=488, 
table_model_dir='C:\\Users\\cleste/.paddleocr/whl\\table\\en_ppocr_mobile_v2.0_table_structure_infer', total_process_num=1, type='ocr', use_angle_cls=False, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)

Please enter img path:Input/emotion/ocr13.jpg
Image path: Input/emotion/ocr13.jpg
[2022/06/17 12:44:36] ppocr DEBUG: dt_boxes num : 5, elapse : 0.040994882583618164
[2022/06/17 12:44:36] ppocr DEBUG: rec_res num  : 5, elapse : 0.028000593185424805
[2022/06/17 12:44:36] ppocr DEBUG: dt_boxes num : 4, elapse : 0.02500295639038086
[2022/06/17 12:44:36] ppocr DEBUG: rec_res num  : 4, elapse : 0.016002893447875977
{'type': 'Figure', 'bbox': [12, 49, 466, 460], 'res': [{'text': "WhenI'mbored nobody", 'confidence': 0.9314340353012085, 'text_region': [[20.0, 54.0], [458.0, 59.0], [458.0, 93.0], [20.0, 87.0]]}, {'text': 'textme,butassoonas', 'confidence': 0.947204053401947, 'text_region': [[21.0, 102.0], [448.0, 102.0], [448.0, 133.0], [21.0, 133.0]]}, {'text': "I'm busy.....", 'confidence': 0.8454315662384033, 'text_region': [[21.0, 141.0], [236.0, 148.0], [235.0, 179.0], [20.0, 171.0]]}, {'text': 'still nobodytext me', 'confidence': 0.8983144164085388, 'text_region': [[20.0, 182.0], [391.0, 185.0], [391.0, 218.0], [20.0, 215.0]]}, {'text': 'Haha ;)', 'confidence': 0.9406118988990784, 'text_region': [[17.0, 413.0], [121.0, 418.0], [119.0, 451.0], [16.0, 446.0]]}]}
{'type': 'Title', 'bbox': [12, 53, 454, 396], 'res': [{'text': "WhenI'mborednobod", 'confidence': 0.9059685468673706, 'text_region': [[21.0, 58.0], [447.0, 61.0], [447.0, 90.0], [21.0, 87.0]]}, {'text': 'textme,butassoonas', 'confidence': 0.9522207975387573, 'text_region': [[21.0, 103.0], [448.0, 103.0], [448.0, 133.0], [21.0, 133.0]]}, {'text': "I'mbusy..", 'confidence': 0.9493505954742432, 'text_region': [[22.0, 140.0], [206.0, 148.0], [204.0, 178.0], [20.0, 171.0]]}, {'text': 'stillnobodytextme', 'confidence': 0.9422968626022339, 'text_region': [[20.0, 183.0], [388.0, 186.0], [388.0, 218.0], [20.0, 215.0]]}]}

------------------------------------------------It is show time !--------------------------------------------------------
ocr result[1]: WhenI'mbored nobody
ocr result[2]: textme,butassoonas
ocr result[3]: I'm busy.....
ocr result[4]: still nobodytext me
ocr result[5]: Haha ;)

Exit?exit

Process finished with exit code 0
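
The recognized lines printed above are separate text blocks, each with its own confidence and box. As a minimal sketch of one possible post-processing step, the blocks can be sorted top to bottom by their box position and joined into one string, optionally dropping low-confidence lines. The function name join_lines and the min_confidence parameter are hypothetical and not part of PaddleOCR; the sample blocks are copied from the 'Figure' region in the log, deliberately listed out of order.

```python
def join_lines(blocks, min_confidence=0.0):
    """Join text blocks top to bottom, skipping those below min_confidence."""
    kept = [b for b in blocks if b["confidence"] >= min_confidence]
    # Sort by the smallest y among the four corner points (the top of the box).
    ordered = sorted(kept, key=lambda b: min(y for _, y in b["text_region"]))
    return "\n".join(b["text"] for b in ordered)


# Text blocks of the 'Figure' region from the log, in reversed order.
figure_blocks = [
    {"text": "Haha ;)", "confidence": 0.9406118988990784,
     "text_region": [[17.0, 413.0], [121.0, 418.0], [119.0, 451.0], [16.0, 446.0]]},
    {"text": "still nobodytext me", "confidence": 0.8983144164085388,
     "text_region": [[20.0, 182.0], [391.0, 185.0], [391.0, 218.0], [20.0, 215.0]]},
    {"text": "I'm busy.....", "confidence": 0.8454315662384033,
     "text_region": [[21.0, 141.0], [236.0, 148.0], [235.0, 179.0], [20.0, 171.0]]},
    {"text": "textme,butassoonas", "confidence": 0.947204053401947,
     "text_region": [[21.0, 102.0], [448.0, 102.0], [448.0, 133.0], [21.0, 133.0]]},
    {"text": "WhenI'mbored nobody", "confidence": 0.9314340353012085,
     "text_region": [[20.0, 54.0], [458.0, 59.0], [458.0, 93.0], [20.0, 87.0]]},
]

if __name__ == "__main__":
    print(join_lines(figure_blocks))
    print("---")
    # An illustrative threshold; 0.9 is not a value recommended by PaddleOCR.
    print(join_lines(figure_blocks, min_confidence=0.9))
```

Sorting by the top edge alone is enough for single-column text like this image; multi-column layouts would need a smarter ordering.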

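Folder output

Output path: Output/ocr13

Output contents

(1) A side-by-side comparison of the original image and the prediction

(2) A text file containing res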
