DL — CNN: Implementing OCR (Optical Character Recognition) with a CNN (Keras, CTC loss)

Contents

Output

Full implementation code


Output

To be updated……

Full implementation code

Part of the code is adapted from GitHub:
https://raw.githubusercontent.com/fchollet/keras/master/examples/image_ocr.py
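
Before the full listing, it helps to see the core trick in isolation: Keras loss functions only receive (y_true, y_pred), so the CTC loss, which also needs the labels and the input/label sequence lengths, is computed inside a Lambda layer, and the compiled loss simply passes that value through. Below is a minimal sketch of this pattern against the same Keras 2.x API the script uses; the layer sizes and tensor names are illustrative placeholders, not the values from the full model.

import numpy as np
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.layers.recurrent import GRU
from keras.models import Model

num_classes = 28               # 26 letters + space + 1 CTC blank
time_steps, feat_dim = 32, 16  # illustrative sequence shape


def ctc_lambda_func(args):
    # K.ctc_batch_cost returns the per-sample CTC loss as a tensor
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)


seq_in = Input(name='seq_in', shape=(time_steps, feat_dim))
rnn_out = GRU(64, return_sequences=True)(seq_in)
y_pred = Dense(num_classes, activation='softmax')(rnn_out)

labels = Input(name='labels', shape=[10], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])

model = Model(inputs=[seq_in, labels, input_length, label_length],
              outputs=loss_out)
# the Lambda output already *is* the loss, so the compiled loss is a pass-through
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='sgd')

The full script applies exactly this pattern, with the softmax fed by a convolutional stack and two bidirectional GRU layers instead of the single GRU above.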

# -*- coding: utf-8 -*-
# image_ocr.py
'''Optical character recognition

This example uses a convolutional stack followed by a recurrent stack
and a CTC logloss function to perform optical character recognition
of generated text images. I have no evidence of whether it actually
learns general shapes of text, or just is able to recognize all
the different fonts thrown at it...the purpose is more to demonstrate CTC
inside of Keras.  Note that the font list may need to be updated
for the particular OS in use.

This starts off with 4 letter words.  For the first 12 epochs, the
difficulty is gradually increased using the TextImageGenerator class
which is both a generator class for test/train data and a Keras
callback class. After 20 epochs, longer sequences are thrown at it
by recompiling the model to handle a wider image and rebuilding
the word list to include two words separated by a space.

The table below shows normalized edit distance values. Theano uses
a slightly different CTC implementation, hence the different results.

Epoch |   TF   |   TH
-----:|-------:|-------:
   10 |  0.027 | 0.064
   15 |  0.038 | 0.035
   20 |  0.043 | 0.045
   25 |  0.014 | 0.019

This requires ```cairo``` and ```editdistance``` packages:

```python
pip install cairocffi
pip install editdistance
```

Created by Mike Henry
https://github.com/mbhenry/
'''
import os
import itertools
import codecs
import re
import datetime
import cairocffi as cairo
import editdistance
import numpy as np
from scipy import ndimage
import pylab
from keras import backend as K
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers import Input, Dense, Activation
from keras.layers import Reshape, Lambda
from keras.layers.merge import add, concatenate
from keras.models import Model
from keras.layers.recurrent import GRU
from keras.optimizers import SGD
from keras.utils.data_utils import get_file
from keras.preprocessing import image
import keras.callbacks

OUTPUT_DIR = 'image_ocr'

# character classes and matching regex filter
regex = r'^[a-z ]+$'
alphabet = u'abcdefghijklmnopqrstuvwxyz '

np.random.seed(55)


# this creates larger "blotches" of noise which look
# more realistic than just adding gaussian noise
# assumes greyscale with pixels ranging from 0 to 1

def speckle(img):
    severity = np.random.uniform(0, 0.6)
    blur = ndimage.gaussian_filter(np.random.randn(*img.shape) * severity, 1)
    img_speck = (img + blur)
    img_speck[img_speck > 1] = 1
    img_speck[img_speck <= 0] = 0
    return img_speck


# paints the string in a random location the bounding box
# also uses a random font, a slight random rotation,
# and a random amount of speckle noise

def paint_text(text, w, h, rotate=False, ud=False, multi_fonts=False):
    surface = cairo.ImageSurface(cairo.FORMAT_RGB24, w, h)
    with cairo.Context(surface) as context:
        context.set_source_rgb(1, 1, 1)  # White
        context.paint()
        # this font list works in CentOS 7
        if multi_fonts:
            fonts = ['Century Schoolbook', 'Courier', 'STIX',
                     'URW Chancery L', 'FreeMono']
            context.select_font_face(np.random.choice(fonts),
                                     cairo.FONT_SLANT_NORMAL,
                                     np.random.choice([cairo.FONT_WEIGHT_BOLD,
                                                       cairo.FONT_WEIGHT_NORMAL]))
        else:
            context.select_font_face('Courier',
                                     cairo.FONT_SLANT_NORMAL,
                                     cairo.FONT_WEIGHT_BOLD)
        context.set_font_size(25)
        box = context.text_extents(text)
        border_w_h = (4, 4)
        if box[2] > (w - 2 * border_w_h[1]) or box[3] > (h - 2 * border_w_h[0]):
            raise IOError(('Could not fit string into image.'
                           'Max char count is too large for given image width.'))

        # teach the RNN translational invariance by
        # fitting text box randomly on canvas, with some room to rotate
        max_shift_x = w - box[2] - border_w_h[0]
        max_shift_y = h - box[3] - border_w_h[1]
        top_left_x = np.random.randint(0, int(max_shift_x))
        if ud:
            top_left_y = np.random.randint(0, int(max_shift_y))
        else:
            top_left_y = h // 2
        context.move_to(top_left_x - int(box[0]), top_left_y - int(box[1]))
        context.set_source_rgb(0, 0, 0)
        context.show_text(text)

    buf = surface.get_data()
    a = np.frombuffer(buf, np.uint8)
    a.shape = (h, w, 4)
    a = a[:, :, 0]  # grab single channel
    a = a.astype(np.float32) / 255
    a = np.expand_dims(a, 0)
    if rotate:
        a = image.random_rotation(a, 3 * (w - top_left_x) / w + 1)
    a = speckle(a)

    return a


def shuffle_mats_or_lists(matrix_list, stop_ind=None):
    ret = []
    assert all([len(i) == len(matrix_list[0]) for i in matrix_list])
    len_val = len(matrix_list[0])
    if stop_ind is None:
        stop_ind = len_val
    assert stop_ind <= len_val

    a = list(range(stop_ind))
    np.random.shuffle(a)
    a += list(range(stop_ind, len_val))
    for mat in matrix_list:
        if isinstance(mat, np.ndarray):
            ret.append(mat[a])
        elif isinstance(mat, list):
            ret.append([mat[i] for i in a])
        else:
            raise TypeError('`shuffle_mats_or_lists` only supports '
                            'numpy.array and list objects.')
    return ret


# Translation of characters to unique integer values
def text_to_labels(text):
    ret = []
    for char in text:
        ret.append(alphabet.find(char))
    return ret


# Reverse translation of numerical classes back to characters
def labels_to_text(labels):
    ret = []
    for c in labels:
        if c == len(alphabet):  # CTC Blank
            ret.append("")
        else:
            ret.append(alphabet[c])
    return "".join(ret)


# only a-z and space..probably not to difficult
# to expand to uppercase and symbols

def is_valid_str(in_str):
    search = re.compile(regex, re.UNICODE).search
    return bool(search(in_str))


# Uses generator functions to supply train/test with
# data. Image renderings and text are created on the fly
# each time with random perturbations

class TextImageGenerator(keras.callbacks.Callback):

    def __init__(self, monogram_file, bigram_file, minibatch_size,
                 img_w, img_h, downsample_factor, val_split,
                 absolute_max_string_len=16):

        self.minibatch_size = minibatch_size
        self.img_w = img_w
        self.img_h = img_h
        self.monogram_file = monogram_file
        self.bigram_file = bigram_file
        self.downsample_factor = downsample_factor
        self.val_split = val_split
        self.blank_label = self.get_output_size() - 1
        self.absolute_max_string_len = absolute_max_string_len

    def get_output_size(self):
        return len(alphabet) + 1

    # num_words can be independent of the epoch size due to the use of generators
    # as max_string_len grows, num_words can grow
    def build_word_list(self, num_words, max_string_len=None, mono_fraction=0.5):
        assert max_string_len <= self.absolute_max_string_len
        assert num_words % self.minibatch_size == 0
        assert (self.val_split * num_words) % self.minibatch_size == 0
        self.num_words = num_words
        self.string_list = [''] * self.num_words
        tmp_string_list = []
        self.max_string_len = max_string_len
        self.Y_data = np.ones([self.num_words, self.absolute_max_string_len]) * -1
        self.X_text = []
        self.Y_len = [0] * self.num_words

        def _is_length_of_word_valid(word):
            return (max_string_len == -1 or
                    max_string_len is None or
                    len(word) <= max_string_len)

        # monogram file is sorted by frequency in english speech
        with codecs.open(self.monogram_file, mode='r', encoding='utf-8') as f:
            for line in f:
                if len(tmp_string_list) == int(self.num_words * mono_fraction):
                    break
                word = line.rstrip()
                if _is_length_of_word_valid(word):
                    tmp_string_list.append(word)

        # bigram file contains common word pairings in english speech
        with codecs.open(self.bigram_file, mode='r', encoding='utf-8') as f:
            lines = f.readlines()
            for line in lines:
                if len(tmp_string_list) == self.num_words:
                    break
                columns = line.lower().split()
                word = columns[0] + ' ' + columns[1]
                if is_valid_str(word) and _is_length_of_word_valid(word):
                    tmp_string_list.append(word)
        if len(tmp_string_list) != self.num_words:
            raise IOError('Could not pull enough words'
                          'from supplied monogram and bigram files.')
        # interlace to mix up the easy and hard words
        self.string_list[::2] = tmp_string_list[:self.num_words // 2]
        self.string_list[1::2] = tmp_string_list[self.num_words // 2:]

        for i, word in enumerate(self.string_list):
            self.Y_len[i] = len(word)
            self.Y_data[i, 0:len(word)] = text_to_labels(word)
            self.X_text.append(word)
        self.Y_len = np.expand_dims(np.array(self.Y_len), 1)

        self.cur_val_index = self.val_split
        self.cur_train_index = 0

    # each time an image is requested from train/val/test, a new random
    # painting of the text is performed
    def get_batch(self, index, size, train):
        # width and height are backwards from typical Keras convention
        # because width is the time dimension when it gets fed into the RNN
        if K.image_data_format() == 'channels_first':
            X_data = np.ones([size, 1, self.img_w, self.img_h])
        else:
            X_data = np.ones([size, self.img_w, self.img_h, 1])

        labels = np.ones([size, self.absolute_max_string_len])
        input_length = np.zeros([size, 1])
        label_length = np.zeros([size, 1])
        source_str = []
        for i in range(size):
            # Mix in some blank inputs.  This seems to be important for
            # achieving translational invariance
            if train and i > size - 4:
                if K.image_data_format() == 'channels_first':
                    X_data[i, 0, 0:self.img_w, :] = self.paint_func('')[0, :, :].T
                else:
                    X_data[i, 0:self.img_w, :, 0] = self.paint_func('',)[0, :, :].T
                labels[i, 0] = self.blank_label
                input_length[i] = self.img_w // self.downsample_factor - 2
                label_length[i] = 1
                source_str.append('')
            else:
                if K.image_data_format() == 'channels_first':
                    X_data[i, 0, 0:self.img_w, :] = (
                        self.paint_func(self.X_text[index + i])[0, :, :].T)
                else:
                    X_data[i, 0:self.img_w, :, 0] = (
                        self.paint_func(self.X_text[index + i])[0, :, :].T)
                labels[i, :] = self.Y_data[index + i]
                input_length[i] = self.img_w // self.downsample_factor - 2
                label_length[i] = self.Y_len[index + i]
                source_str.append(self.X_text[index + i])
        inputs = {'the_input': X_data,
                  'the_labels': labels,
                  'input_length': input_length,
                  'label_length': label_length,
                  'source_str': source_str  # used for visualization only
                  }
        outputs = {'ctc': np.zeros([size])}  # dummy data for dummy loss function
        return (inputs, outputs)

    def next_train(self):
        while 1:
            ret = self.get_batch(self.cur_train_index,
                                 self.minibatch_size, train=True)
            self.cur_train_index += self.minibatch_size
            if self.cur_train_index >= self.val_split:
                self.cur_train_index = self.cur_train_index % 32
                (self.X_text, self.Y_data, self.Y_len) = shuffle_mats_or_lists(
                    [self.X_text, self.Y_data, self.Y_len], self.val_split)
            yield ret

    def next_val(self):
        while 1:
            ret = self.get_batch(self.cur_val_index,
                                 self.minibatch_size, train=False)
            self.cur_val_index += self.minibatch_size
            if self.cur_val_index >= self.num_words:
                self.cur_val_index = self.val_split + self.cur_val_index % 32
            yield ret

    def on_train_begin(self, logs={}):
        self.build_word_list(16000, 4, 1)
        self.paint_func = lambda text: paint_text(
            text, self.img_w, self.img_h,
            rotate=False, ud=False, multi_fonts=False)

    def on_epoch_begin(self, epoch, logs={}):
        # rebind the paint function to implement curriculum learning
        if 3 <= epoch < 6:
            self.paint_func = lambda text: paint_text(
                text, self.img_w, self.img_h,
                rotate=False, ud=True, multi_fonts=False)
        elif 6 <= epoch < 9:
            self.paint_func = lambda text: paint_text(
                text, self.img_w, self.img_h,
                rotate=False, ud=True, multi_fonts=True)
        elif epoch >= 9:
            self.paint_func = lambda text: paint_text(
                text, self.img_w, self.img_h,
                rotate=True, ud=True, multi_fonts=True)
        if epoch >= 21 and self.max_string_len < 12:
            self.build_word_list(32000, 12, 0.5)


# the actual loss calc occurs here despite it not being
# an internal Keras loss function

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)


# For a real OCR application, this should be beam search with a dictionary
# and language model.  For this example, best path is sufficient.

def decode_batch(test_func, word_batch):
    out = test_func([word_batch])[0]
    ret = []
    for j in range(out.shape[0]):
        out_best = list(np.argmax(out[j, 2:], 1))
        out_best = [k for k, g in itertools.groupby(out_best)]
        outstr = labels_to_text(out_best)
        ret.append(outstr)
    return ret


class VizCallback(keras.callbacks.Callback):

    def __init__(self, run_name, test_func, text_img_gen, num_display_words=6):
        self.test_func = test_func
        self.output_dir = os.path.join(OUTPUT_DIR, run_name)
        self.text_img_gen = text_img_gen
        self.num_display_words = num_display_words
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)

    def show_edit_distance(self, num):
        num_left = num
        mean_norm_ed = 0.0
        mean_ed = 0.0
        while num_left > 0:
            word_batch = next(self.text_img_gen)[0]
            num_proc = min(word_batch['the_input'].shape[0], num_left)
            decoded_res = decode_batch(self.test_func,
                                       word_batch['the_input'][0:num_proc])
            for j in range(num_proc):
                edit_dist = editdistance.eval(decoded_res[j],
                                              word_batch['source_str'][j])
                mean_ed += float(edit_dist)
                mean_norm_ed += float(edit_dist) / len(word_batch['source_str'][j])
            num_left -= num_proc
        mean_norm_ed = mean_norm_ed / num
        mean_ed = mean_ed / num
        print('\nOut of %d samples:  Mean edit distance:'
              '%.3f Mean normalized edit distance: %0.3f'
              % (num, mean_ed, mean_norm_ed))

    def on_epoch_end(self, epoch, logs={}):
        self.model.save_weights(
            os.path.join(self.output_dir, 'weights%02d.h5' % (epoch)))
        self.show_edit_distance(256)
        word_batch = next(self.text_img_gen)[0]
        res = decode_batch(self.test_func,
                           word_batch['the_input'][0:self.num_display_words])
        if word_batch['the_input'][0].shape[0] < 256:
            cols = 2
        else:
            cols = 1
        for i in range(self.num_display_words):
            pylab.subplot(self.num_display_words // cols, cols, i + 1)
            if K.image_data_format() == 'channels_first':
                the_input = word_batch['the_input'][i, 0, :, :]
            else:
                the_input = word_batch['the_input'][i, :, :, 0]
            pylab.imshow(the_input.T, cmap='Greys_r')
            pylab.xlabel('Truth = \'%s\'\nDecoded = \'%s\'' %
                         (word_batch['source_str'][i], res[i]))
        fig = pylab.gcf()
        fig.set_size_inches(10, 13)
        pylab.savefig(os.path.join(self.output_dir, 'e%02d.png' % (epoch)))
        pylab.close()


def train(run_name, start_epoch, stop_epoch, img_w):
    # Input Parameters
    img_h = 64
    words_per_epoch = 16000
    val_split = 0.2
    val_words = int(words_per_epoch * (val_split))

    # Network parameters
    conv_filters = 16
    kernel_size = (3, 3)
    pool_size = 2
    time_dense_size = 32
    rnn_size = 512
    minibatch_size = 32

    if K.image_data_format() == 'channels_first':
        input_shape = (1, img_w, img_h)
    else:
        input_shape = (img_w, img_h, 1)

    fdir = os.path.dirname(
        get_file('wordlists.tgz',
                 origin='http://www.mythic-ai.com/datasets/wordlists.tgz',
                 untar=True))

    img_gen = TextImageGenerator(
        monogram_file=os.path.join(fdir, 'wordlist_mono_clean.txt'),
        bigram_file=os.path.join(fdir, 'wordlist_bi_clean.txt'),
        minibatch_size=minibatch_size,
        img_w=img_w,
        img_h=img_h,
        downsample_factor=(pool_size ** 2),
        val_split=words_per_epoch - val_words)
    act = 'relu'
    input_data = Input(name='the_input', shape=input_shape, dtype='float32')
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv1')(input_data)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv2')(inner)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)

    conv_to_rnn_dims = (img_w // (pool_size ** 2),
                        (img_h // (pool_size ** 2)) * conv_filters)
    inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)

    # cuts down input size going into RNN:
    inner = Dense(time_dense_size, activation=act, name='dense1')(inner)

    # Two layers of bidirectional GRUs
    # GRU seems to work as well, if not better than LSTM:
    gru_1 = GRU(rnn_size, return_sequences=True,
                kernel_initializer='he_normal', name='gru1')(inner)
    gru_1b = GRU(rnn_size, return_sequences=True,
                 go_backwards=True, kernel_initializer='he_normal',
                 name='gru1_b')(inner)
    gru1_merged = add([gru_1, gru_1b])
    gru_2 = GRU(rnn_size, return_sequences=True,
                kernel_initializer='he_normal', name='gru2')(gru1_merged)
    gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True,
                 kernel_initializer='he_normal', name='gru2_b')(gru1_merged)

    # transforms RNN output to character activations:
    inner = Dense(img_gen.get_output_size(), kernel_initializer='he_normal',
                  name='dense2')(concatenate([gru_2, gru_2b]))
    y_pred = Activation('softmax', name='softmax')(inner)
    Model(inputs=input_data, outputs=y_pred).summary()

    labels = Input(name='the_labels',
                   shape=[img_gen.absolute_max_string_len], dtype='float32')
    input_length = Input(name='input_length', shape=[1], dtype='int64')
    label_length = Input(name='label_length', shape=[1], dtype='int64')
    # Keras doesn't currently support loss funcs with extra parameters
    # so CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,),
                      name='ctc')([y_pred, labels, input_length, label_length])

    # clipnorm seems to speeds up convergence
    sgd = SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)

    model = Model(inputs=[input_data, labels, input_length, label_length],
                  outputs=loss_out)

    # the loss calc occurs elsewhere, so use a dummy lambda func for the loss
    model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
    if start_epoch > 0:
        weight_file = os.path.join(
            OUTPUT_DIR,
            os.path.join(run_name, 'weights%02d.h5' % (start_epoch - 1)))
        model.load_weights(weight_file)
    # captures output of softmax so we can decode the output during visualization
    test_func = K.function([input_data], [y_pred])

    viz_cb = VizCallback(run_name, test_func, img_gen.next_val())

    model.fit_generator(generator=img_gen.next_train(),
                        steps_per_epoch=(words_per_epoch - val_words) // minibatch_size,
                        epochs=stop_epoch,
                        validation_data=img_gen.next_val(),
                        validation_steps=val_words // minibatch_size,
                        callbacks=[viz_cb, img_gen],
                        initial_epoch=start_epoch)


if __name__ == '__main__':
    run_name = datetime.datetime.now().strftime('%Y:%m:%d:%H:%M:%S')
    train(run_name, 0, 20, 128)
    # increase to wider images and start at epoch 20.
    # The learned weights are reloaded
    train(run_name, 20, 25, 512)
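
To make decode_batch's best-path decoding concrete: take the most likely class at each time step, collapse consecutive repeats (itertools.groupby), then drop CTC blanks. Here is a tiny self-contained sketch, assuming the same 27-character alphabet as the script; the one-hot "frames" are invented purely for illustration.

import itertools
import numpy as np

alphabet = u'abcdefghijklmnopqrstuvwxyz '
blank = len(alphabet)  # class index 27 is the CTC blank


def best_path_decode(softmax_out):
    # softmax_out: (time_steps, num_classes) array of per-frame probabilities
    best = list(np.argmax(softmax_out, 1))          # most likely class per frame
    best = [k for k, _ in itertools.groupby(best)]  # collapse repeated labels
    return ''.join(alphabet[c] for c in best if c != blank)  # drop blanks


# frame-wise classes c, a, a, <blank>, t should decode to 'cat'
frames = [2, 0, 0, blank, 19]
probs = np.eye(blank + 1)[frames]  # one-hot rows standing in for softmax output
print(best_path_decode(probs))     # -> cat

Best path is the simplest CTC decoder; as the comment in the script notes, a production OCR system would use beam search with a dictionary and language model instead.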
