Python Character Recognition

Language Model Designing

Optical Character Recognition is the conversion of 2-Dimensional text data into a form of machine-encoded text by the use of an electronic or mechanical device. The 2-Dimensional text data can be obtained from various sources such as scanned documents like PDF files, images with text data in formats such as .png or .jpeg, signposts like traffic posts, or any other images with any form of textual data. There is a wide range of interesting applications for optical character recognition.

The first time I came across optical character recognition was in my school days, when our answer scripts for multiple-choice questions (MCQs) were analyzed by these devices. The data was extracted from the answer scripts and the answers were marked according to the answer key. That said, a majority of people always used to complain that the result they got was not the one they expected. This could have been due to a fault or limitation of the device, or perhaps the students had marked their answers in a slightly misleading way. With modern technologies, however, optical character recognition is used in a variety of applications, and the devices are much more advanced and far more accurate.

In this article, we will cover the basics of optical character recognition. Then, we will install the pytesseract module, which we will use to perform the optical character recognition. The installation was quite annoying and troublesome for me when I was getting started, so I will try to simplify the steps. We will then go through the main functions of the pytesseract module in Python. Finally, we will end with a code snippet that combines optical character recognition with the Google Text-to-Speech (gTTS) module.

Note: The final code combines text-to-speech and character recognition. This is the second part of the language model designing series. If you are not familiar with the gTTS module, I highly recommend checking out the link below. In the next part of this series, we will try to combine speech translation and optical character recognition with deep learning. To view this series in the order of your preference you can click the link here.

Photo by Dariusz Sankowski on Unsplash

How does optical character recognition work exactly?

[Figure: block diagram of the OCR process flow. Image source: Medium]

The block diagram above shows the optical character recognition process flow. An API request is sent for the OCR operation to be performed. The input image is read and pre-processed accordingly, and the text in the image is prepared for extraction. The OCR engine then processes the image using the data it was trained on, analyzing the characters in the image and finding the most appropriate matches. Once the engine finishes the analysis, it sends the data through another step of processing and formatting to exclude any unnecessary items. Once this process is completed, we finally have the required text data. After this, an API response containing the text converted from the image can be sent back to the user.
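
The flow described above can be sketched as a chain of three stages. This is only an illustrative sketch: the names `run_ocr_pipeline`, `preprocess`, `recognize`, `postprocess`, and `strip_blank_lines` are hypothetical, and the stand-in stages work on a plain string instead of a real image so that the example runs without Tesseract installed. In real use, `recognize` would be a call into the OCR engine.

```python
from typing import Any, Callable


def run_ocr_pipeline(
    image: Any,
    preprocess: Callable[[Any], Any],
    recognize: Callable[[Any], str],
    postprocess: Callable[[str], str],
) -> str:
    """Run the three stages in order: pre-process, recognize, post-process."""
    prepared = preprocess(image)
    raw_text = recognize(prepared)
    return postprocess(raw_text)


def strip_blank_lines(raw_text: str) -> str:
    """Post-processing stand-in: trim each line and drop the empty ones."""
    lines = [line.strip() for line in raw_text.splitlines()]
    return "\n".join(line for line in lines if line)


# Stand-in stages (hypothetical): the "image" here is just a string.
fake_image = "  Hello  \n\n  OCR world \n"
result = run_ocr_pipeline(
    fake_image,
    preprocess=lambda img: img,   # a real pipeline might grayscale or denoise here
    recognize=lambda img: img,    # a real pipeline would call the OCR engine here
    postprocess=strip_blank_lines,
)
print(result)
```

Passing the stages in as functions keeps each step replaceable, which mirrors the separate boxes of the process flow.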

Installation:

The installation might be a bit tricky. However, I will try to simplify the steps for installation so that you can get started as soon as possible. The first step is simple, you just install the pytesseract module using the pip command. Type the following command in the command prompt terminal/virtual environment —

pip install pytesseract

We have successfully installed the pytesseract module, but if you try to run the code right away you will receive an error message stating that Tesseract is not installed on your system. This is because pytesseract is only a wrapper; the Tesseract OCR engine itself has to be installed separately. To complete this step, visit this site, which hosts the Windows installers for Tesseract. Installers for Tesseract 3.05, Tesseract 4, and development version 5.00 Alpha are available from Tesseract at UB Mannheim. These include the training tools. Both 32-bit and 64-bit installers are available.

Choose the installer of your preference and install it accordingly. You can add the Tesseract install directory to your PATH, or just point pytesseract at the executable directly in your code. I hope this solves most of the issues with the installation process. If you have any other queries, feel free to let me know.
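
As a rough sketch of the two options above, the snippet below first checks an explicitly given install location and otherwise falls back to whatever `tesseract` resolves to on the PATH. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical (they are not part of pytesseract), and the Windows path in the usage note is just the location used elsewhere in this article, so adjust it to your own install.

```python
import os
import shutil
from typing import Optional


def find_tesseract(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable path to the Tesseract executable, or None if not found."""
    # 1. Prefer an explicitly given install location, if that file exists.
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # 2. Otherwise fall back to whatever "tesseract" resolves to on the PATH.
    return shutil.which("tesseract")


def configure_pytesseract(explicit_path: Optional[str] = None) -> bool:
    """Point pytesseract at the executable; return False if none was found."""
    import pytesseract  # imported here so find_tesseract works without it

    path = find_tesseract(explicit_path)
    if path is None:
        return False
    pytesseract.pytesseract.tesseract_cmd = path
    return True
```

For example, `configure_pytesseract(r'C:/Program Files/Tesseract-OCR/tesseract.exe')` would set the path if that file exists, and a `False` return value is a hint that Tesseract still needs to be installed.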

Understanding the pytesseract module:

Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file.

To understand this more intuitively let us look at the following simple code block —

# Importing the libraries
import cv2
import pytesseract

# Specifying the path to the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

# Reading the image
image = cv2.imread('1.png')

# Extraction of text from image
text = pytesseract.image_to_string(image)

We import the required modules and specify the path to the Tesseract executable. We then read the image using the cv2 module. Finally, we extract the text from the image and store it as text data. The image_to_string function returns the result of a Tesseract OCR run on the image as a string. For more information on the Tesseract OCR and its functions like image_to_string visit here.
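
Beyond the image itself, `image_to_string` also accepts a `lang` argument and a `config` string of Tesseract command-line options. The sketch below shows one possible way to use them; the helpers `build_tesseract_config` and `image_file_to_text` are hypothetical names introduced for illustration, and the accepted ranges for `--psm` (page segmentation mode) and `--oem` (OCR engine mode) are assumptions based on Tesseract 4, so check them against your installed version.

```python
def build_tesseract_config(psm: int = 3, oem: int = 3) -> str:
    """Build an option string for the config argument of image_to_string."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    return f"--oem {oem} --psm {psm}"


def image_file_to_text(path: str, lang: str = "eng", psm: int = 3) -> str:
    """Read an image with OpenCV and return the text Tesseract finds in it."""
    # Imported lazily so the config helper above works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(path)
    # OpenCV loads images as BGR; convert to RGB before handing them to pytesseract.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return pytesseract.image_to_string(rgb, lang=lang, config=build_tesseract_config(psm))
```

A call such as `image_file_to_text('1.png', psm=6)` would treat the image as a single uniform block of text, which can help with short, cleanly cropped snippets.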

Code:

This section contains the final code snippet combining both text-to-speech and optical character recognition. We will use the recently installed pytesseract module alongside modules like gTTS and PIL. PIL stands for Python Imaging Library, which can be used for loading our images. The OpenCV module cv2 can also be used for reading images. Let us look at how the entire code works in 3 parts.

1. Reading the Image —

# Importing the libraries
import cv2
import pytesseract
from PIL import Image

# Specifying the path
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

# Reading the image
image = cv2.imread('1.png')

# Extraction of text from image
text = pytesseract.image_to_string(image)

# Printing the text
print(text)

In this code block, we are importing the required libraries and specifying the path to the Tesseract location. We will then proceed to read the image with the cv2 module. You can also use the PIL library for this. The command would be “Image.open()”. After this step, we will use the OCR library for the conversion of image into text data and print the required output.

2. Formatting the data —

# Create the voice_text variable to store the data.
voice_text = ""

# Pre-processing the data
for i in text.split():
    voice_text += i + ' '
voice_text = voice_text[:-1]
voice_text

In this next code block, we format the text data so that it fits on one single line. We are basically pre-processing the already obtained text data before passing it into the gTTS module for conversion to speech.
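
The same single-line formatting can also be written more compactly with `str.split` and `str.join`. This is an equivalent sketch rather than the article's original code, and the function name `to_single_line` is made up for illustration.

```python
def to_single_line(text: str) -> str:
    """Collapse all runs of whitespace (including newlines) into single spaces."""
    # split() with no arguments splits on any whitespace and drops empty pieces,
    # so joining with a single space leaves no leading or trailing space.
    return " ".join(text.split())


sample = "Optical  Character\nRecognition\n\n"
print(to_single_line(sample))
```

Because `join` only puts spaces between the words, there is no trailing space to slice off afterwards.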

3. Converting into Speech —

from gtts import gTTS
from playsound import playsound

tts = gTTS(voice_text)
tts.save("test.mp3")
playsound("test.mp3")

Finally, we import the Google text-to-speech module and convert the text data into an audio message. This is extremely useful for listening to the text data in PDFs as well as images. If you have any confusion related to the gTTS module, refer to one of my previous articles to understand this concept better.
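
One practical detail worth handling is the case where the OCR step finds no text at all, since there is then nothing to convert to speech. The sketch below guards against that before calling gTTS. The function names `text_to_speech_file` and `_gtts_synthesize` and the injectable `synthesize` parameter are hypothetical; they are only there so that the logic can be tried out without a network connection or the gtts package.

```python
from typing import Callable, Optional


def _gtts_synthesize(text: str, out_path: str) -> None:
    """Default synthesizer: save the spoken text as an MP3 file using gTTS."""
    from gtts import gTTS  # imported lazily so the guard logic works without it

    gTTS(text).save(out_path)


def text_to_speech_file(
    text: str,
    out_path: str,
    synthesize: Optional[Callable[[str, str], None]] = None,
) -> bool:
    """Convert text to speech; return False if there is nothing to say."""
    cleaned = " ".join(text.split())  # same single-line formatting as before
    if not cleaned:
        return False  # empty OCR result: skip the text-to-speech call entirely
    if synthesize is None:
        synthesize = _gtts_synthesize
    synthesize(cleaned, out_path)
    return True
```

With the default synthesizer, `text_to_speech_file(text, "test.mp3")` would write the audio file only when the OCR result actually contains text.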

Photo by Alice Donovan Rouse on Unsplash

Conclusion:

We have covered some of the concepts of optical character recognition, with an intuitive understanding of how exactly the OCR process flow works. I hope the installation procedure for getting started with OCR in Python was simplified and that all of you could achieve the desired results. We looked at a few functions of the pytesseract module and finally wrote code combining both the gTTS and pytesseract modules.

In the next part of this language model designing series, we will look into how we can use deep learning technologies as well as OCR and TTS (text-to-speech) to develop a cool project.

I would highly recommend all of you to check out the below references for grasping the concepts and learning them better. Let me know if you have any queries and have a wonderful day!

Translated from: https://towardsdatascience.com/getting-started-with-optical-character-recognition-using-python-e4a9851ddfab
