Computer Vision 1.2: Handwritten Digit Recognition

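Beginner notes. I find computer topics are learned faster backwards: use the thing first to get a feel for it, then think about what it actually is. So I start by copying code. https://blog.csdn.net/wsp_1138886114/article/details/82948880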

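This is my first attempt at anything related to computer vision. I used the following tools.
sklearn: a machine learning toolkit
skimage: an image processing toolkit
numpy: a matrix computation toolkit
MNIST: an entry-level computer vision data set, split into training images, training labels, test images, and test labels. Data set download link
Note: the first few bytes of each binary data file describe the data. For example, the first 16 bytes of train-images.idx3-ubyte record the magic number, the number of images, and the image height and width.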
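To make the note about the file header concrete, here is a minimal sketch of how those 16 bytes can be decoded with struct. It builds a fake header in memory instead of opening the real file, so the bytes are made up for illustration; the values mirror the ones printed by the training code later in this post.

```python
import struct

# Hypothetical header built in memory for illustration. The real file
# train-images.idx3-ubyte starts with four big-endian unsigned 32-bit integers.
fake_header = struct.pack(">IIII", 2051, 60000, 28, 28)

# ">IIII" means: big-endian, four unsigned 32-bit integers (4 x 4 = 16 bytes).
magic, num_images, rows, cols = struct.unpack(">IIII", fake_header)

print(len(fake_header))                # 16 bytes of description before the pixel data
print(magic, num_images, rows, cols)   # 2051 60000 28 28
```

The label files have a shorter 8-byte header (magic number and item count), which is why the code below reads them with '>II'.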

Workflow
1. Load the data
2. Train on the data
3. Get the model
4. Test the model
5. Check it on real pictures

I. Extract HOG features and build the model

import os
import struct
import joblib
from skimage.feature import hog
from sklearn.svm import LinearSVC
import numpy


def get_hog(images):
    """Extract a list of HOG features from an image data set."""
    list_hog = []
    for image in images:
        fd = hog(image.reshape(28, 28), pixels_per_cell=(7, 7), cells_per_block=(2, 2))
        list_hog.append(fd)
    hog_features = numpy.array(list_hog, 'float64')
    return hog_features


def train():
    """Train the model."""
    labels_path = os.path.join("./", 'train-labels.idx1-ubyte')
    images_path = os.path.join("./", 'train-images.idx3-ubyte')
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = numpy.fromfile(lbpath, dtype=numpy.uint8)
    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16))  # file description header
        print("magic, num, rows, cols", magic, num, rows, cols)  # 2051 60000 28 28
        images = numpy.fromfile(imgpath, dtype=numpy.uint8).reshape(len(labels), 784)
    hog_features = get_hog(images)
    clf = LinearSVC()
    clf.fit(hog_features, labels)  # train on 60000 HOG features and their 60000 digit labels
    joblib.dump(clf, "digits_cls.pkl", compress=3)  # save the model
    print("Model training finished")


def t10k():
    """Evaluate on the test data set."""
    labels_path = os.path.join("./", 't10k-labels.idx1-ubyte')
    images_path = os.path.join("./", 't10k-images.idx3-ubyte')
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        t10k_labels = numpy.fromfile(lbpath, dtype=numpy.uint8)
    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16))
        t10k_images = numpy.fromfile(imgpath, dtype=numpy.uint8).reshape(len(t10k_labels), 784)
    hog_features = get_hog(t10k_images)
    clf = joblib.load("digits_cls.pkl")  # load the model to get the classifier
    predictions = clf.predict(hog_features)  # predict with the model
    num_correct = 0
    for i in range(len(predictions)):  # compare predictions with the labels
        if predictions[i] == t10k_labels[i]:
            num_correct += 1
    print("%s of %s values correct." % (num_correct, len(predictions)))


train()
t10k()
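The counting loop at the end of t10k() can also be written as a single numpy comparison. The sketch below is only an alternative way to compute the same ratio; the helper name accuracy and the sample values are made up for illustration and are not real MNIST results.

```python
import numpy


def accuracy(predictions, labels):
    """Return the fraction of predictions that match the labels."""
    predictions = numpy.asarray(predictions)
    labels = numpy.asarray(labels)
    # Element-wise comparison gives booleans; their mean is the hit rate.
    return float(numpy.mean(predictions == labels))


# Made-up values for illustration only.
example_predictions = [7, 2, 1, 0, 4]
example_labels = [7, 2, 1, 0, 9]
print(accuracy(example_predictions, example_labels))  # 0.8
```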

https://zhuanlan.zhihu.com/p/85829145
Navneet Dalal and Bill Triggs used the HOG technique for pedestrian detection in 2005.

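Computing a histogram of oriented gradients involves many steps and calculations. Here is a simplified description based on how the API is called.
1. Compute the gradient of every pixel.
2. Divide the whole image into small units called cells. Here the cell size is set to 7x7 pixels. Build a gradient histogram for each cell.
3. Concatenate the histograms of several neighbouring cells. Such a group of cells is called a block. For example, 2x2 means that four cells in two rows and two columns form one block.
4. Finally, join the histograms of all blocks to get the HOG feature of the whole image, which is one long array.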
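The four steps above also determine how long that final array is. The sketch below works out the length for the settings used in this post. It assumes 9 orientation bins per cell histogram and blocks that slide one cell at a time; both are assumptions about the hog call rather than something set explicitly in the code above, and hog_feature_length is a hypothetical helper name.

```python
def hog_feature_length(image_shape, pixels_per_cell, cells_per_block, orientations=9):
    """Length of the HOG vector, assuming blocks slide one cell at a time."""
    # Step 2: how many cells fit along each axis.
    cells_y = image_shape[0] // pixels_per_cell[0]
    cells_x = image_shape[1] // pixels_per_cell[1]
    # Step 3: how many block positions exist along each axis.
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    # Each block concatenates one histogram per cell it contains.
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    # Step 4: all blocks joined into one long vector.
    return blocks_y * blocks_x * per_block


print(hog_feature_length((28, 28), (7, 7), (2, 2)))  # 324
```

So a 28x28 image gives 4x4 cells, 3x3 block positions, and 36 values per block.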

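A classifier is a machine learning concept. For now I treat it as a black-box tool: given training data that is already labelled with classes, it decides which class a new observation belongs to.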
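To show that "fit on labelled data, then predict a new sample" pattern in isolation, here is a toy sketch with LinearSVC on two well-separated groups of 2-D points. The points and labels are made up for illustration and have nothing to do with MNIST.

```python
from sklearn.svm import LinearSVC

# Made-up 2-D points: class 0 sits near the origin, class 1 near (9, 9).
train_points = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]]
train_labels = [0, 0, 0, 1, 1, 1]

clf = LinearSVC(random_state=0)
clf.fit(train_points, train_labels)       # learn from the labelled samples

new_points = [[0.5, 0.5], [9.5, 9.5]]     # observations the classifier has not seen
predicted = clf.predict(new_points)
print(predicted.tolist())                 # [0, 1]
```

The digit recognizer works the same way, only each sample is a HOG feature vector instead of a 2-D point and there are ten classes instead of two.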

II. Test with real pictures

import os
import cv2
import joblib
from skimage.feature import hog
import numpy as np

clf = joblib.load("digits_cls.pkl")  # load the classifier
os.makedirs("nums", exist_ok=True)  # cv2.imwrite does not create missing folders
img = cv2.imread("9.jpg")  # read the input picture
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
img_gray = cv2.GaussianBlur(img_gray, (5, 5), 0)  # Gaussian blur (denoise)
ret, img_thresh = cv2.threshold(img_gray, 90, 255, cv2.THRESH_BINARY_INV)  # convert to a binary image
contours, hierarchy = cv2.findContours(img_thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # find the contours
bounding_boxes = [cv2.boundingRect(ctr) for ctr in contours]  # smallest upright rectangle around each target

for box in bounding_boxes:
    [x, y, w, h] = box  # top-left corner x, y and width, height w, h; the image origin is the top-left
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # draw the box on the original picture
    # Create a black square so the digit can be shrunk to 28x28
    if h > w:
        square_leng = int(h * 1.6)
    else:
        square_leng = int(w * 1.6)
    roi = np.zeros([square_leng, square_leng])
    # Put the content of the box in the middle of the square
    square_y = int(square_leng // 2 - h // 2)
    square_x = int(square_leng // 2 - w // 2)
    roi[square_y:square_y + h, square_x:square_x + w] = img_thresh[y:y + h, x:x + w]
    try:
        roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
        # Dilate; the kernel must be an array, a bare (3, 3) tuple is not a 3x3 kernel
        roi = cv2.dilate(roi, np.ones((3, 3), np.uint8))
        # Compute the HOG feature
        roi_hog_fd, hog_image = hog(roi, pixels_per_cell=(7, 7), cells_per_block=(2, 2), visualize=True)
        nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
        # nbr[0] is the predicted digit; draw it on the original picture
        cv2.putText(img, str(int(nbr[0])), (x, y), cv2.FONT_HERSHEY_DUPLEX, 2, (0, 255, 255), 3)
        cv2.imwrite("nums/" + str(nbr[0]) + ".jpg", roi)
    except Exception as e:
        print(e)
        continue

cv2.imshow("hog", img)
cv2.imwrite("result.jpg", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
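The step that pastes each detected box into the middle of a black square is easy to get wrong, so here it is pulled out into a small standalone sketch. The helper name center_in_square and the 10x4 test patch are made up for illustration; the 1.6 scale factor is the one used in the script above.

```python
import numpy as np


def center_in_square(patch, scale=1.6):
    """Paste a 2-D patch into the middle of a black square canvas."""
    h, w = patch.shape
    side = int(max(h, w) * scale)          # square side: longer edge times the scale
    canvas = np.zeros((side, side), dtype=patch.dtype)
    top = side // 2 - h // 2               # offsets that center the patch
    left = side // 2 - w // 2
    canvas[top:top + h, left:left + w] = patch
    return canvas


# A made-up 10x4 white patch standing in for a thresholded digit.
patch = np.full((10, 4), 255, dtype=np.uint8)
square = center_in_square(patch)
print(square.shape)  # (16, 16)
```

Padding to a square before resizing keeps the digit's aspect ratio and leaves a border around it, which is closer to how MNIST digits are laid out.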

III. Conclusion

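On the test set the model reaches roughly 95% accuracy, but on photos of real handwriting it only gets about 80% right. This may be because scaling the picture down to 28*28 changes the pixels too much.
This was only an experiment. The two things that really matter are both hidden behind the library calls: one is HOG, the other is the SVM classifier sklearn.svm.LinearSVC.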