Using Tesseract for OCR Text Recognition on Windows

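Tesseract was originally developed at Hewlett-Packard laboratories as an engine for recognizing text in document images. It was ported to Windows in 1996, partially migrated to C++ in 1998, and released as open source by HP in 2005. From 2006 on, Google took over its maintenance and development.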

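Tesseract can process many natural languages, such as English, Portuguese, and Yiddish. As of 2015 it supported more than 100 written languages, and it can be trained to recognize others.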

Tesseract was first written in C and moved to C++ in 1998. It has no GUI of its own and is run from the command line, although several third-party programs wrap it in a graphical front end.

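Install the Python packages:

pip install tesseract
pip install tesseract-ocr
pip install pytesseract

Of these, only pytesseract is imported by the script below; it is a wrapper that calls the separately installed tesseract program. The script also imports cv2, which is provided by the opencv-python package (pip install opencv-python).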

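Installing the native Tesseract program on Windows:

Following an answer found on Stack Overflow, go to https://github.com/UB-Mannheim/tesseract/wiki

and download the exe installer that matches your machine.

Here we installed

tesseract-ocr-w64-setup-v5.0.0-alpha.20210506.exe (64 bit)

into the directory C:\Program Files\Tesseract-OCR\

After installation, point pytesseract at the executable by adding the following line to the Python script, after import pytesseract: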

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
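A hard-coded path only works if Tesseract was installed in exactly that directory. The following is a minimal sketch of a more defensive setup, assuming the default install location used above. The helper names find_tesseract and configure_pytesseract are hypothetical and are not part of pytesseract; shutil.which, pytesseract.pytesseract.tesseract_cmd, and pytesseract.get_tesseract_version are the only library calls relied on.

```python
import os
import shutil

# Assumed default location from the install step above; adjust if you chose another folder.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the tesseract executable, or None if none is found.

    The system PATH is checked first, then each candidate path in order.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract(path=None):
    """Point pytesseract at the executable and return the Tesseract version."""
    import pytesseract  # imported lazily so find_tesseract works without it

    path = path or find_tesseract()
    if path is None:
        raise FileNotFoundError(
            "tesseract executable not found; install it or pass its path explicitly")
    pytesseract.pytesseract.tesseract_cmd = path
    # Raises TesseractNotFoundError if the path does not point to a working binary.
    return pytesseract.get_tesseract_version()
```

Calling configure_pytesseract() once at the top of the script, before any OCR call, surfaces a wrong path immediately instead of at the first image_to_data call.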

Run from a terminal:

python tesseract.py --image apple_support.png --min-conf 0

Run from within Jupyter:

%run tesseract.py --image apple_support.png --min-conf 0
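The two flags in these commands are read with argparse. One detail that is easy to miss is that argparse turns the dash in --min-conf into an underscore, which is why the script looks the value up as args["min_conf"]. A small sketch that parses an explicit argument list (instead of the real command line) illustrates this; build_parser is a hypothetical helper name used only here.

```python
import argparse


def build_parser():
    """Build a parser with the same two flags the script accepts."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
                    help="path to input image to be OCR'd")
    ap.add_argument("-c", "--min-conf", type=int, default=0,
                    help="minimum confidence value to filter weak text detection")
    return ap


# Parse an explicit list rather than sys.argv so the example runs anywhere.
args = vars(build_parser().parse_args(
    ["--image", "apple_support.png", "--min-conf", "50"]))
print(args)  # {'image': 'apple_support.png', 'min_conf': 50}
```

Because of type=int, the value arrives as the integer 50 rather than the string "50", so it can be compared directly with the confidence score.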

Code:

# import the necessary packages
from pytesseract import Output
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")
ap.add_argument("-c", "--min-conf", type=int, default=0,
    help="minimum confidence value to filter weak text detection")
args = vars(ap.parse_args())

# load the input image, convert it from BGR to RGB channel ordering,
# and use Tesseract to localize each area of text in the input image
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)

# loop over each of the individual text localizations
for i in range(0, len(results["text"])):
    # extract the bounding box coordinates of the text region from
    # the current result
    x = results["left"][i]
    y = results["top"][i]
    w = results["width"][i]
    h = results["height"][i]

    # extract the OCR text itself along with the confidence of the
    # text localization
    text = results["text"][i]
    conf = int(float(results["conf"][i]))

    # filter out weak confidence text localizations
    if conf > args["min_conf"]:
        # display the confidence and text to our terminal
        print("Confidence: {}".format(conf))
        print("Text: {}".format(text))
        print("")

        # strip out non-ASCII text so we can draw the text on the image
        # using OpenCV, then draw a bounding box around the text along
        # with the text itself
        text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX,
            1.2, (0, 0, 255), 3)

# show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)
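The filtering step in the loop can be tried without Tesseract or an image. The sketch below pulls it into a hypothetical helper, filter_words, and feeds it a made-up dictionary that contains only the keys the script reads (a real image_to_data result has more). As an assumption beyond the original script, entries whose text is empty or whitespace are also skipped, since Tesseract reports block and line rows with a confidence of -1 and no text.

```python
def filter_words(results, min_conf=0):
    """Return (text, conf, (x, y, w, h)) for each entry above min_conf."""
    kept = []
    for i in range(len(results["text"])):
        text = results["text"][i].strip()
        # conf may arrive as an int, a float, or a string such as "96.5",
        # so go through float() before truncating to int
        conf = int(float(results["conf"][i]))
        if not text or conf <= min_conf:
            continue
        box = (results["left"][i], results["top"][i],
               results["width"][i], results["height"][i])
        kept.append((text, conf, box))
    return kept


# Made-up sample values, shaped like image_to_data(..., output_type=Output.DICT)
sample = {
    "text": ["", "Apple", "Support", " "],
    "conf": ["-1", "96.5", 42, "30"],
    "left": [0, 10, 80, 200],
    "top": [0, 12, 12, 12],
    "width": [300, 60, 90, 5],
    "height": [50, 20, 20, 20],
}
print(filter_words(sample, min_conf=50))  # [('Apple', 96, (10, 12, 60, 20))]
```

Converting through float() first is what avoids the "invalid literal for int()" error when a confidence value comes back as a decimal string.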

Reference: UB-Mannheim/tesseract

Reference: Tesseract

Reference: Pytesseract: "TesseractNotFound Error: tesseract is not installed or it's not in your path", how do I fix this?

Reference: ValueError: invalid literal for int() with base 10: ''

Reference: Tesseract OCR: Text localization and detection

Reference: OCR：使用开源框架Tesseract做文字识别（安装） (OCR: text recognition with the open-source framework Tesseract, installation)

Reference: Installing Tesseract for OCR
