HIT AI Summer Course Practice Project: Recognizing and Evaluating Handwritten Arithmetic (Project Plan)

I. Project Introduction

An enhanced version of handwritten digit recognition that extends the MNIST example program.

Stage requirements:

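Recognize multiple handwritten digits
Recognize the addition, subtraction, multiplication and division symbols
Build an app that recognizes and evaluates handwritten arithmetic (either a web service or a mobile app)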

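To meet these requirements, we kept the model from the MNIST example program and extended the MNIST dataset with the symbols '+', '-', '*', '/', '(' and ')', which we drew by hand with a mouse. We also changed some settings of the CNN, including the input, the number of epochs and the optimizer. The project is written in Python, mainly with TensorFlow, cv2 and PIL. Finally, we used Django to build a small web drawing board: the formula drawn with the mouse is sent to the back end for processing, and the recognized formula and its result are sent back to the page. The result is a simple but capable web page for recognizing and evaluating handwritten arithmetic.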

The process, step by step:

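Draw the symbol data by hand
Add an image-processing module that segments the handwritten data and formulas and standardizes each symbol (consistent with the MNIST format)
Build and train a CNN model with TensorFlow
Add a calculator module implemented with two stacks
Integrate everything with Django and implement a web drawing board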

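The project's GitHub repository: https://github.com/Godforever/ms

Team members: 周雄 (Zhou Xiong), 钟宇宏 (Zhong Yuhong), 杨帆 (Yang Fan), 吴沛钢 (Wu Peigang)

Contribution scores: 20.2, 20.1, 19.9, 19.8

GitHub usernames: Godforever, nanua, Yangf010333, 1163710216

Note: the environment is Python 3. The required libraries are django, sklearn (scikit-learn), tensorflow, cv2 (OpenCV) and tqdm (a progress-bar library); the code also uses numpy and PIL (Pillow). The model code uses the TensorFlow 1.x API (tf.placeholder, tf.Session).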

II. Project Implementation

1. Data Acquisition and Processing

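The training and test data combine the MNIST dataset with the symbol set we drew by hand. Since there is no ready-made dataset of handwritten symbols, drawing them by hand was quite painful (sigh!). Here is our hand-drawn dataset after standardization: click here! (password: jods)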

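We cut the raw input images into individual symbols with a connected-component algorithm. Its drawback is that the symbols in a handwritten formula must not touch each other. The division sign '÷', which is made of three separate strokes, needs some extra handling. The same algorithm is also used to process images that contain formulas. Below is the main code of the connected-component algorithm: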

```python
import queue

import cv2
import numpy as np


def get_x_y_cuts(data, n_lines=1):
    # Get the bounding box of every connected group of dark pixels (BFS flood fill)
    w, h = data.shape
    visited = set()
    q = queue.Queue()
    # 8-connected neighbourhood
    offset = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
    cuts = []
    for y in range(h):
        for x in range(w):
            # Pixels darker than 200 are treated as strokes
            if data[x][y] >= 200 or (x, y) in visited:
                continue
            q.put((x, y))
            visited.add((x, y))
            x_axis, y_axis = [x], [y]  # include the seed pixel in the box
            while not q.empty():
                x_p, y_p = q.get()
                for x_offset, y_offset in offset:
                    x_c, y_c = x_p + x_offset, y_p + y_offset
                    # Stay inside the image: negative indices would silently wrap around in NumPy
                    if not (0 <= x_c < w and 0 <= y_c < h) or (x_c, y_c) in visited:
                        continue
                    visited.add((x_c, y_c))
                    if data[x_c][y_c] < 200:
                        q.put((x_c, y_c))
                        x_axis.append(x_c)
                        y_axis.append(y_c)
            min_x, max_x = min(x_axis), max(x_axis)
            min_y, max_y = min(y_axis), max(y_axis)
            # Drop components spanning 4 pixels or fewer in either direction (noise)
            if max_x - min_x > 3 and max_y - min_y > 3:
                cuts.append([min_x, max_x + 1, min_y, max_y + 1])
    if n_lines == 1 and cuts:
        # Sort left to right, then merge boxes whose column ranges overlap,
        # so that e.g. the three strokes of '÷' become one symbol
        cuts = sorted(cuts, key=lambda c: c[2])
        new_cuts = [cuts[0]]
        for now_item in cuts[1:]:
            pr_item = new_cuts[-1]
            if not (now_item[2] > pr_item[3]):
                pr_item[0] = min(pr_item[0], now_item[0])
                pr_item[1] = max(pr_item[1], now_item[1])
                pr_item[2] = min(pr_item[2], now_item[2])
                pr_item[3] = max(pr_item[3], now_item[3])
            else:
                new_cuts.append(now_item)
        cuts = new_cuts
    return cuts


def get_image_cuts(image, dir=None, is_data=False, n_lines=1, data_needed=False, count=0):
    # `image` is either a grayscale matrix (is_data=True) or an image file name
    if is_data:
        data = image
    else:
        data = cv2.imread(image, cv2.IMREAD_GRAYSCALE)
    cuts = get_x_y_cuts(data, n_lines=n_lines)
    image_cuts = None
    for item in cuts:
        count += 1
        # Centre the symbol on a white square canvas with a 20% margin on each side
        max_dim = max(item[1] - item[0], item[3] - item[2])
        margin = int(0.2 * max_dim)
        new_data = np.ones((int(1.4 * max_dim), int(1.4 * max_dim))) * 255
        x_min = (max_dim - item[1] + item[0]) // 2
        x_max = x_min + item[1] - item[0]
        y_min = (max_dim - item[3] + item[2]) // 2
        y_max = y_min + item[3] - item[2]
        new_data[margin + x_min:margin + x_max, margin + y_min:margin + y_max] = \
            data[item[0]:item[1], item[2]:item[3]]
        # Scale to the 28x28 MNIST size
        standard_data = cv2.resize(new_data, (28, 28))
        if not data_needed:
            cv2.imwrite(dir + str(count) + ".jpg", standard_data.astype(np.uint8))
        else:
            # Flatten to 1x784, invert (white strokes on black, like MNIST) and scale to [0, 1]
            data_flat = (255 - standard_data.reshape(1, 28 * 28)) / 255
            image_cuts = data_flat if image_cuts is None else np.r_[image_cuts, data_flat]
    if data_needed:
        return image_cuts
    return count
```

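Here, get_x_y_cuts(data, n_lines=1) returns the coordinate range of each symbol in the image (simply put, the bounding box around it). data is the grayscale matrix of the image to be segmented, and n_lines is the number of lines of symbols in the image. Only n_lines=1 is handled: in that case, boxes whose column ranges overlap (such as the three strokes of '÷') are merged into one.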
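To make the merging step concrete, here is a minimal, self-contained sketch of the rule applied when n_lines == 1. Boxes use the same [row_min, row_max, col_min, col_max] layout as cuts (end indices exclusive); they are sorted by their left column, and a box whose first column is not past the previous box's right edge is merged into it. The helper name merge_column_overlaps and the sample boxes are hypothetical.

```python
def merge_column_overlaps(cuts):
    """Merge boxes whose column ranges overlap (hypothetical helper mirroring the n_lines == 1 step).

    Each box is [row_min, row_max, col_min, col_max] with exclusive end indices.
    """
    if not cuts:
        return []
    # Copy each box, then sort left to right by its first column
    cuts = sorted((list(c) for c in cuts), key=lambda c: c[2])
    merged = [cuts[0]]
    for box in cuts[1:]:
        last = merged[-1]
        if box[2] <= last[3]:
            # Column ranges touch or overlap: grow the last box to cover both
            last[0] = min(last[0], box[0])
            last[1] = max(last[1], box[1])
            last[2] = min(last[2], box[2])
            last[3] = max(last[3], box[3])
        else:
            merged.append(box)
    return merged


# A '÷' drawn as dot / bar / dot, followed by a separate '1'
division_sign = [[2, 5, 12, 15], [9, 11, 5, 22], [15, 18, 12, 15]]
one = [[3, 20, 30, 33]]
print(merge_column_overlaps(division_sign + one))  # [[2, 18, 5, 22], [3, 20, 30, 33]]
```

Because only the column ranges are compared, any two symbols whose horizontal extents overlap are also merged, so the symbols in a formula need some horizontal spacing.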

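get_image_cuts(image, dir=None, is_data=False, n_lines=1, data_needed=False, count=0) extracts each symbol in the image as a standardized 28×28 image; the results can be returned as data or saved to disk as image files. image is the image to be segmented; dir is the destination for the saved images (it is used as a file-name prefix, so it should end with '/'); is_data indicates whether image is a grayscale matrix or a file name; n_lines is defined as in the previous function; data_needed indicates whether the symbols should be returned as data (one flattened row of 784 values in [0, 1] per symbol); count is a counter that makes it easy to tally the segmented symbols and can be ignored.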
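The centring arithmetic in get_image_cuts can be tried out with NumPy alone. The sketch below (pad_to_square is a hypothetical name) places a dark patch in the middle of a white square whose side is 1.4 times the patch's longer side, leaving a 20% margin on each side; the final cv2.resize(..., (28, 28)) step is left out so that no OpenCV is needed.

```python
import numpy as np


def pad_to_square(patch):
    """Centre a dark-on-white patch on a white square canvas with a 20% margin per side.

    Mirrors the arithmetic in get_image_cuts (hypothetical helper); the resize to 28x28 is omitted.
    """
    rows, cols = patch.shape
    max_dim = max(rows, cols)
    margin = int(0.2 * max_dim)
    side = int(1.4 * max_dim)
    canvas = np.full((side, side), 255.0)
    # Offsets that centre the shorter side inside a max_dim x max_dim square
    r0 = (max_dim - rows) // 2
    c0 = (max_dim - cols) // 2
    canvas[margin + r0:margin + r0 + rows, margin + c0:margin + c0 + cols] = patch
    return canvas


# A 10x4 vertical stroke, like a '1'
stroke = np.zeros((10, 4))
square = pad_to_square(stroke)
print(square.shape)  # (14, 14)
```

Padding to a square before resizing keeps the aspect ratio of thin symbols such as '1' or '-' intact, instead of stretching them to fill the 28×28 frame.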

Next is the data-preprocessing code:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from tensorflow.examples.tutorials.mnist import input_data  # TensorFlow 1.x only

# get_images_labels() is the project's own loader for the hand-drawn symbol set (not shown here)


class train_test(object):
    def __init__(self):
        self.images = None
        self.labels = None
        self.offset = 0

    def next_batch(self, batch_size):
        # Return the next batch of training or test data, wrapping around at the end
        n = self.images.shape[0]
        if self.offset + batch_size <= n:
            batch_images = self.images[self.offset:self.offset + batch_size]
            batch_labels = self.labels[self.offset:self.offset + batch_size]
            self.offset = (self.offset + batch_size) % n
        else:
            new_offset = self.offset + batch_size - n
            # Slice to the very end ([offset:-1] would silently drop the last sample)
            batch_images = np.r_[self.images[self.offset:], self.images[0:new_offset]]
            batch_labels = np.r_[self.labels[self.offset:], self.labels[0:new_offset]]
            self.offset = new_offset
        return batch_images, batch_labels


class digit_data(object):
    def __init__(self):
        self.train = train_test()
        self.test = train_test()

    def input_data(self):
        # Read MNIST and merge its training and test parts
        mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
        images = np.r_[mnist.train.images, mnist.test.images]
        labels = np.r_[mnist.train.labels, mnist.test.labels]
        # Widen the one-hot labels from 10 to 16 columns (6 symbol classes)
        zeros = np.zeros((labels.shape[0], 6))
        labels = np.c_[labels, zeros]
        print("Loading the operators' datasets....")
        # Read the symbol data and append it to the MNIST data
        op_images, op_labels = get_images_labels()
        images, labels = np.r_[images, op_images], np.r_[labels, op_labels]
        print("Generating the train_data and test_data....")
        # Stratified split with sklearn: 85% training, 15% test (a single split is enough)
        sss = StratifiedShuffleSplit(n_splits=1, test_size=0.15, random_state=23)
        train_index, test_index = next(sss.split(images, labels))
        self.train.images, self.test.images = images[train_index], images[test_index]
        self.train.labels, self.test.labels = labels[train_index], labels[test_index]
```

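Here, class train_test(object) wraps either the training data or the test data. Its images and labels attributes are both stored as np.ndarray and have the same shape[0]; the next_batch(self, batch_size) method reads the data batch by batch, wrapping around to the start when it reaches the end.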
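The wrap-around behaviour of next_batch can be checked in isolation with this small NumPy sketch. CyclicBatcher is a hypothetical name, and like next_batch it assumes batch_size is not larger than the number of samples.

```python
import numpy as np


class CyclicBatcher:
    """Minimal sketch of next_batch with wrap-around (hypothetical class name)."""

    def __init__(self, images, labels):
        self.images, self.labels, self.offset = images, labels, 0

    def next_batch(self, batch_size):
        n = self.images.shape[0]
        end = self.offset + batch_size
        if end <= n:
            idx = np.arange(self.offset, end)
        else:
            # Take the tail of the data, then continue from the start
            idx = np.r_[np.arange(self.offset, n), np.arange(0, end - n)]
        self.offset = end % n
        # Indexing both arrays with the same indices keeps images and labels aligned
        return self.images[idx], self.labels[idx]


batcher = CyclicBatcher(np.arange(10).reshape(10, 1), np.arange(10))
for _ in range(3):
    print(batcher.next_batch(4)[1])  # labels 0-3, then 4-7, then 8, 9, 0, 1
```

Slicing to the end of the data with [offset:] rather than [offset:-1] is what keeps every batch at exactly batch_size samples.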

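class digit_data(object) wraps the training and test data together as two train_test() objects. Its input_data(self) method does the data processing: it combines the MNIST dataset with the hand-drawn symbol set, widens the labels from 10 to 16 classes, and splits the result into a training set and a test set with test_size=0.15, stratified by class.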
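The 16-class label layout produced by input_data can be sketched with NumPy alone, using the same np.c_ / np.r_ calls. The post does not state which of columns 10 to 15 belongs to which symbol, so the order below (taken from the symbol list in the introduction) and the helper names are assumptions; the stratified split is left out because it needs scikit-learn.

```python
import numpy as np

# ASSUMED order of the 6 symbol classes in columns 10-15 (not stated in the post)
SYMBOLS = ['+', '-', '*', '/', '(', ')']


def widen_digit_labels(labels):
    # Append 6 zero columns so 10-class MNIST one-hot rows become 16-class rows
    return np.c_[labels, np.zeros((labels.shape[0], len(SYMBOLS)))]


def symbol_labels(symbol, count):
    # One-hot rows for `count` samples of one symbol (hypothetical helper)
    rows = np.zeros((count, 10 + len(SYMBOLS)))
    rows[:, 10 + SYMBOLS.index(symbol)] = 1
    return rows


digits = np.eye(10)[[3, 7]]  # two MNIST-style one-hot labels: 3 and 7
labels = np.r_[widen_digit_labels(digits), symbol_labels('*', 2)]
print(labels.shape, labels.argmax(axis=1))
```

Since every row stays one-hot, argmax over the 16 columns gives the class index directly, which is also what the model's predict_op returns.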

2. Model Definition

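The handwritten-symbol recognition model is a CNN. It explicitly defines an input layer, followed by two convolution + pooling layers for feature extraction and then fully connected layers for classification. The pooling layers use max pooling; the first convolutional layer has 32 kernels of size 5×5, and the second applies 64 kernels of size 5×5 to the 32 output channels of the first. The fully connected part has one hidden layer with 1024 neurons by default (with dropout during training), followed by a 16-way output layer. The detailed code is as follows: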

```python
import os

import tensorflow as tf  # TensorFlow 1.x API
from tqdm import tqdm

# digit_data is the data class defined in the previous section


class model(object):
    def __init__(self, batch_size=100, hidden_size=1024, n_output=16):
        self.HIDDEN_SIZE = hidden_size
        self.BATCH_SIZE = batch_size
        self.N_OUTPUT = n_output  # 10 digits + 6 symbols
        self.N_BATCH = 0

    def weight_variable(self, shape):
        # Weights initialised randomly from a truncated normal distribution
        initial = tf.truncated_normal(shape, stddev=0.10)
        return tf.Variable(initial, name="w")

    def bias_variable(self, shape):
        # Biases initialised to 0.1
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial, name="b")

    def conv2d(self, x, W):
        # Stride-1 convolution with SAME padding (keeps the spatial size)
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(self, x):
        # 2x2 max pooling with stride 2 (halves the spatial size)
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    def train_model(self, EPOCH=21):
        # Load the training and test data
        mnist_operator = digit_data()
        mnist_operator.input_data()
        # Number of batches per epoch
        self.N_BATCH = mnist_operator.train.images.shape[0] // self.BATCH_SIZE
        # Inputs
        x = tf.placeholder(tf.float32, [None, 784], name='image_input')
        y = tf.placeholder(tf.float32, [None, self.N_OUTPUT])
        keep_prob = tf.placeholder(tf.float32, name="keep_prob")
        x_image = tf.reshape(x, [-1, 28, 28, 1])
        # Convolution + pooling layer 1: 28x28x1 -> 14x14x32
        with tf.variable_scope("conv1"):
            W_conv1 = self.weight_variable([5, 5, 1, 32])
            b_conv1 = self.bias_variable([32])
            h_conv1 = tf.nn.relu(self.conv2d(x_image, W_conv1) + b_conv1)
            h_pool1 = self.max_pool_2x2(h_conv1)
        # Convolution + pooling layer 2: 14x14x32 -> 7x7x64
        with tf.variable_scope("conv2"):
            W_conv2 = self.weight_variable([5, 5, 32, 64])
            b_conv2 = self.bias_variable([64])
            h_conv2 = tf.nn.relu(self.conv2d(h_pool1, W_conv2) + b_conv2)
            h_pool2 = self.max_pool_2x2(h_conv2)
        # Fully connected hidden layer with dropout
        with tf.variable_scope("fc1"):
            W_fc1 = self.weight_variable([7 * 7 * 64, self.HIDDEN_SIZE])
            b_fc1 = self.bias_variable([self.HIDDEN_SIZE])
            h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
            h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
            h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
        # Output layer (raw logits)
        with tf.variable_scope("fc2"):
            W_fc2 = self.weight_variable([self.HIDDEN_SIZE, self.N_OUTPUT])
            b_fc2 = self.bias_variable([self.N_OUTPUT])
            h_fc2 = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
        # L2 regularisation of the fully connected layers (b_fc2, not b_fc1 twice)
        regularizers = (tf.nn.l2_loss(W_fc1) + tf.nn.l2_loss(b_fc1) +
                        tf.nn.l2_loss(W_fc2) + tf.nn.l2_loss(b_fc2))
        prediction = tf.nn.softmax(h_fc2, name="prediction")
        predict_op = tf.argmax(prediction, 1, name="predict_op")
        # Cross-entropy loss: the op applies softmax itself, so pass the raw logits h_fc2
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=h_fc2))
        loss_re = loss + 5e-4 * regularizers
        train_step = tf.train.AdamOptimizer(1e-4).minimize(loss_re)
        correct_prediction = tf.equal(predict_op, tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        saver = tf.train.Saver()
        tf.add_to_collection("predict_op", predict_op)
        print("Start training....")
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for i in tqdm(range(EPOCH * self.N_BATCH)):
                epoch = i // self.N_BATCH
                batch_xs, batch_ys = mnist_operator.train.next_batch(self.BATCH_SIZE)
                sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.7})
                # At the end of every 10th epoch: evaluate on the test set and save a checkpoint
                if epoch % 10 == 0 and (i + 1) % self.N_BATCH == 0:
                    acc = []
                    for _ in range(mnist_operator.test.labels.shape[0] // self.BATCH_SIZE):
                        batch_xs_test, batch_ys_test = mnist_operator.test.next_batch(self.BATCH_SIZE)
                        acc.append(sess.run(accuracy, feed_dict={
                            x: batch_xs_test, y: batch_ys_test, keep_prob: 1.0}))
                    print()
                    print("Iter" + str(epoch) + ",Testing Accuracy = " + str(sum(acc) / len(acc)))
                    if not os.path.exists('./model/'):
                        os.mkdir('./model/')
                    saver.save(sess, "./model/model", global_step=epoch)

    def load_model(self, meta, path):
        # Rebuild the graph from the .meta file and restore the latest checkpoint in `path`
        self.sess = tf.Session()
        saver = tf.train.import_meta_graph(meta)
        saver.restore(self.sess, tf.train.latest_checkpoint(path))

    def predict(self, X):
        # Look up the tensors by the names given at training time and run a forward pass
        predict = tf.get_collection('predict_op')[0]
        graph = tf.get_default_graph()
        input_X = graph.get_operation_by_name("image_input").outputs[0]
        keep_prob = graph.get_operation_by_name("keep_prob").outputs[0]
        return self.sess.run(predict, feed_dict={input_X: X, keep_prob: 1.0})
```
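As a quick check on the layer sizes above, this plain-Python sketch (no TensorFlow needed; the helper names are our own) recomputes where the 7 * 7 * 64 input size of fc1 comes from and how many trainable parameters each layer has, using the kernel counts and the hidden_size=1024, n_output=16 defaults from the code.

```python
import math


def same_pool(size, stride=2):
    # With padding='SAME', a stride-2 pooling layer outputs ceil(size / stride)
    return math.ceil(size / stride)


def conv_params(k, c_in, c_out):
    # One k x k kernel per (input channel, output channel) pair, plus one bias per output channel
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    # Weight matrix plus bias vector
    return n_in * n_out + n_out


# The stride-1 SAME convolutions keep 28x28; each 2x2 pooling halves it: 28 -> 14 -> 7
side = same_pool(same_pool(28))
flat = side * side * 64

params = {
    "conv1": conv_params(5, 1, 32),
    "conv2": conv_params(5, 32, 64),
    "fc1": dense_params(flat, 1024),
    "fc2": dense_params(1024, 16),
}

print(side, flat)  # 7 3136
for name, count in params.items():
    print(name, count)
print("total", sum(params.values()))
```

The total comes to about 3.28 million parameters, roughly 98% of which sit in fc1.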

3. Calculator Definition

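The calculator is based on a data-structures assignment from years ago, so it was fairly easy to implement. We first define the precedence of each operator and then implement the calculator with two stacks (data_stack and operator_stack). For a valid formula it returns the correct result; for an invalid one it returns "?". The code is as follows: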

```python
import re
import unicodedata


def add(a, b):
    return a + b


def sub(a, b):
    return a - b


def mul(a, b):
    return a * b


def div(a, b):
    return a / b


def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        pass
    try:
        unicodedata.numeric(s)
        return True
    except (TypeError, ValueError):
        pass
    return False


operations = {'+': add, '-': sub, '*': mul, '/': div}
# Operator precedence
weight = {'(': 3, '*': 2, '/': 2, '+': 1, '-': 1, None: 0}

# One token: an unsigned number, "(" followed by a negative number, a bracket or an operator
TOKEN = re.compile(r"(\d+\.?\d*)|(\(-\d+\.?\d*)|\(|\)|\+|-|\*|/")


def deal_data(data_stack, operator_stack):
    # Pop one operator and two operands, push the result back
    op = operator_stack.pop()
    num2 = float(data_stack.pop())
    num1 = float(data_stack.pop())
    result = operations[op](num1, num2)
    data_stack.append(result)
    return result


def calculate(equation):
    # Local stacks, so that nothing is left behind for the next call
    data_stack = []
    operator_stack = []
    try:
        while equation:
            match = TOKEN.match(equation)
            if match is None:  # unrecognised character
                return '?'
            cur = match.group()
            equation = equation[len(cur):]
            if cur.startswith("(-"):
                # "(-3": push the bracket and the negative number separately
                operator_stack.append("(")
                data_stack.append(cur[1:])
            elif is_number(cur):
                data_stack.append(cur)
            elif cur == "(":
                operator_stack.append(cur)
            elif cur == ")":
                # Evaluate back to the matching "(" (IndexError if there is none)
                while operator_stack[-1] != "(":
                    deal_data(data_stack, operator_stack)
                operator_stack.pop()
            else:
                # Binary operator: first evaluate pending operators of higher or equal precedence
                while (operator_stack and operator_stack[-1] != "("
                       and weight[operator_stack[-1]] >= weight[cur]):
                    deal_data(data_stack, operator_stack)
                operator_stack.append(cur)
        # Evaluate what is left; an unmatched "(" ends in KeyError or IndexError
        while operator_stack:
            deal_data(data_stack, operator_stack)
        if len(data_stack) != 1:
            return '?'
        return float(data_stack[0])
    except (KeyError, IndexError, ZeroDivisionError):
        return '?'
```
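To illustrate how the regular expression in calculate splits a formula into tokens, here is a minimal standalone sketch. The names TOKEN and tokenize are hypothetical, and the pattern is an unanchored variant of the one in calculate, applied with re.match at the start of the remaining text.

```python
import re

# Token classes as in calculate(): an unsigned number, "(" followed by a negative number,
# a bracket, or one of the four operators
TOKEN = re.compile(r"\d+\.?\d*|\(-\d+\.?\d*|[()+\-*/]")


def tokenize(equation):
    tokens = []
    while equation:
        match = TOKEN.match(equation)
        if match is None:
            raise ValueError("unexpected character: " + equation[0])
        tokens.append(match.group())
        equation = equation[match.end():]
    return tokens


print(tokenize("2*(-3+4)/5"))  # ['2', '*', '(-3', '+', '4', ')', '/', '5']
```

calculate then pushes a token such as "(-3" as "(" onto operator_stack and "-3" onto data_stack, so a minus sign directly after an opening bracket is read as the sign of a number rather than as subtraction.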

4. Django Setup

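The web framework is Django, which is based on Python. Since the project only involves entering a handwritten expression on one web page, submitting it and returning the result, it is fairly simple. The Django project contains a single app, `handwriting_calculator`, which implements the `main_page` view that returns the main page and the `get_result` view that returns the result; other features such as login and databases are not used. Because the handwritten image is too large to pass as a GET parameter, `get_result` receives it via POST, and the result is returned to the front end via ajax.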
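The front-end code is not shown in this post. Assuming the page posts the canvas pixels as one comma-separated string of RGBA values in img_data (which is what the reshape to (200, 1000, 4) in `get_result` suggests), the decoding step on the server can be sketched as follows; decode_canvas is a hypothetical name.

```python
import numpy as np

HEIGHT, WIDTH = 200, 1000  # canvas size implied by the reshape in get_result


def decode_canvas(img_str, height=HEIGHT, width=WIDTH):
    # ASSUMPTION: img_str holds height * width * 4 comma-separated RGBA byte values
    values = np.array([int(v) for v in img_str.split(',')], dtype=np.uint8)
    rgba = values.reshape(height, width, 4)
    # Keep only the alpha channel: drawn pixels are opaque, untouched ones transparent
    return rgba[:, :, 3]


# A tiny 2x3 canvas in which only the top-left pixel was drawn (black, fully opaque)
pixels = [0, 0, 0, 255] + [0, 0, 0, 0] * 5
alpha = decode_canvas(','.join(map(str, pixels)), height=2, width=3)
print(alpha)
```

The view then inverts this array (255 - data), so strokes become dark on a white background, which is what the < 200 stroke threshold in get_x_y_cuts expects.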

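In addition, to make debugging easier, `views.py` of the `handwriting_calculator` app also contains a `save_img` function that stores the handwritten expression image sent from the front end.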

The full code of `views.py` in `handwriting_calculator` is as follows:

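```python
import cv2
import numpy as np
from django.shortcuts import render
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.templatetags.static import static
from PIL import Image

from .utils.cnn_model import *
from .utils.image_processing import *
from .utils.calculator import *

# Note: static() returns a URL under STATIC_URL rather than a filesystem path;
# depending on the deployment, a real file path may be needed here
meta = static("./model/model-200.meta")
path = static("./model/")

cnn_model = None


def main_page(request):
    return render(request, "hand_writing_calculator/index.html")


def save_img(img_arr: np.ndarray, file_path: str) -> None:
    # Save a single-channel uint8 array as a grayscale image (for debugging)
    img = Image.fromarray(img_arr, 'L')
    img.save(file_path)


def get_model():
    # Load the network once and reuse it: calling import_meta_graph on every
    # request would keep adding copies of the graph to the default graph
    global cnn_model
    if cnn_model is None:
        cnn_model = model()
        cnn_model.load_model(meta, path)
    return cnn_model


@csrf_exempt
def get_result(request):
    # Comma-separated pixel values of the 1000x200 canvas, 4 channels per pixel
    img_str = request.POST["img_data"]
    img_arr = np.array(img_str.split(',')).reshape(200, 1000, 4).astype(np.uint8)
    # Keep channel 3 (alpha), where drawn pixels are non-transparent
    binary_img_arr = img_arr[:, :, 3]
    save_img(binary_img_arr, "./target.png")
    data = cv2.imread('./target.png', cv2.IMREAD_GRAYSCALE)
    # Invert so that strokes are dark on a white background, as get_image_cuts expects
    data = 255 - data
    images = get_image_cuts(data, is_data=True, n_lines=1, data_needed=True)
    # SYMBOL (class index -> character) comes from one of the project modules imported above
    digits = list(get_model().predict(images))
    equation = ''.join(SYMBOL[d] for d in digits)
    print(equation)
    result = calculate(equation)
    return JsonResponse({"status": "{} = {}".format(equation, result)}, safe=False)
```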

III. Team Work Photos