基于卷积神经网络的手写数字识别(附数据集+完整代码+操作说明)
基于卷积神经网络的手写数字识别(附数据集+完整代码+操作说明)
- 配置环境
- 1.前言
- 2.问题描述
- 3.解决方案
- 4.实现步骤
- 4.1数据集选择
- 4.2构建网络
- 4.3训练网络
- 4.4测试网络
- 4.5图像预处理
- 4.6传入网络进行计算
- 5.代码实现
- 5.1文件说明
- 5.2使用方法
- 5.3 训练模型
- 5.4使用训练好的模型测试网络
- 5.5调用摄像头实时检测
- 6.附录
- 7.结束语
配置环境
使用环境:python3.8
平台:Windows10
IDE:PyCharm
1.前言
手写数字识别,作为机器视觉入门项目,无论是基于传统的OpenCV方法还是基于目前火热的深度学习、神经网络的方法都有这不错的训练效果。当然,这个项目也常常被作为大学/研究生阶段的课程实验。可惜的是,目前网络上关于手写数字识别的项目代码很多,但是普遍不完整,对于初学者提出了不小的挑战。为此,博主撰写本文,无论你是希望借此完成课程实验或者学习机器视觉,本文或许对你都有帮助。
2.问题描述
本文针对的问题为:随机在黑板上写一个数字,通过调用电脑摄像头实时检测出数字是0-9哪个数字
3.解决方案
基于Python的深度学习方法:
检测流程如下:
4.实现步骤
4.1数据集选择
手写数字识别经典数据集:本文数据集选择的FishionMint数据集中的t10k,共含有一万张28*28的手写图片(二值图片)
数据集下载地址见:https://github.com/Hurri-cane/Hand_wrtten/tree/master/dataset
4.2构建网络
采用Resnt(残差网络),残差网络的优势在于:
- 更易捕捉模型细微波动
- 更快的收敛速度
本文的网络结构如下图所示,代码见第五节:
4.3训练网络
本文设置训练次数为100个循环,其实网络的训练过程是这样的:
- 给网络模型“喂”数据(图像+标签)
- 网络根据“喂”来的数据不断自我修正权重
- 本文一共“喂”100次1万张图像
- RTX2070上耗时2h
训练结果如下:
4.4测试网络
- 随机选取数据集中37张图片进行检测
- 正确率为36/37
- 选取其中6张进行展示
4.5图像预处理
- 全部采取传统机器视觉的方法
- 速度“飞快”,仅做以上操作处理速度高达200fps
4.6传入网络进行计算
- 手写0-9的数字除了3识别不了其余均能识别
- 检测速度高达60fps
5.代码实现
本文所有代码都已经上传至Github上https://github.com/Hurri-cane/Hand_wrtten/tree/master
5.1文件说明
- dataset文件夹存放的是训练数据集
- logs文件夹为训练结束后权重文件所在
- real_img、real_img_resize、test_imgs为用来测试的图片文件夹
- 下面的py文件为本文代码
5.2使用方法
按照博主的环境配置自己的Python环境
其中主要的包有:numpy、struct、matplotlib、OpenCV、Pytorch、torchvision、tqdm
5.3 训练模型
本文提供了训练好的模型,大家可以直接调用,已经上传至GitHub,如果不想训练的话,可以跳过训练这一步骤
下面是训练的流程:
打开hand_wrtten_train.py文件,点击运行(博主使用的是PyCharm,大家根据自己喜好选择IDLE即可)
值得注意的是,数据集路径需要修改为自己的路径,即这一段
训练过程没报错会出现以下显示
训练得到的权重会保存在logs文件夹下
模型训练需要时间,此时等待训练结束即可(RTX2070上训练了1h左右)
5.4使用训练好的模型测试网络
测试采用图片进行测试,代码见main_pthoto.py文件,使用方法与上面训练代码一直,代开后运行即可
同样值得注意的是,main_pthoto.py文件中图片路径需要修改为自己的路径,即这一段
以及predict.py文件中权重片路径需要修改为自己在5.3步中训练得到的.pth文件路径,如图所示
运行结果如下
5.5调用摄像头实时检测
代码存在于main.py文件下,使用方法和5.4节图片检测一致,修改predict.py文件中权重片路径需要修改为自己在5.3步中训练得到的.pth文件路径,如图所示
再运行main.py文件即可,可以看到载入网络模型后开始调用摄像头,并开始检测
6.附录
在此附上本文核心代码:
hand_wrtten_train.py
# author:Hurricane
# date: 2020/11/4
# E-mail:hurri_cane@qq.comimport numpy as np
import struct
import matplotlib.pyplot as plt
import cv2 as cv
import random
import torch
from torch import nn, optim
import torch.nn.functional as F
# import d2lzh_pytorch as d2l
import time
from tqdm import tqdm# 训练集文件
train_images_idx3_ubyte_file = 'F:/PyCharm/Practice/hand_wrtten/dataset/train-images.idx3-ubyte'
# 训练集标签文件
train_labels_idx1_ubyte_file = 'F:/PyCharm/Practice/hand_wrtten/dataset/train-labels.idx1-ubyte'# 测试集文件
test_images_idx3_ubyte_file = 'F:/PyCharm/Practice/hand_wrtten/dataset/t10k-images.idx3-ubyte'
# 测试集标签文件
test_labels_idx1_ubyte_file = 'F:/PyCharm/Practice/hand_wrtten/dataset/t10k-labels.idx1-ubyte'# 读取数据部分
def decode_idx3_ubyte(idx3_ubyte_file):bin_data = open(idx3_ubyte_file, 'rb').read()offset = 0fmt_header = '>iiii' # 因为数据结构中前4行的数据类型都是32位整型,所以采用i格式,但我们需要读取前4行数据,所以需要4个i。我们后面会看到标签集中,只使用2个ii。magic_number, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, offset)print('图片数量: %d张, 图片大小: %d*%d' % (num_images, num_rows, num_cols))# 解析数据集image_size = num_rows * num_colsoffset += struct.calcsize(fmt_header) # 获得数据在缓存中的指针位置,从前面介绍的数据结构可以看出,读取了前4行之后,指针位置(即偏移位置offset)指向0016。print(offset)fmt_image = '>' + str(image_size) + 'B' # 图像数据像素值的类型为unsigned char型,对应的format格式为B。这里还有加上图像大小784,是为了读取784个B格式数据,如果没有则只会读取一个值(即一副图像中的一个像素值)print(fmt_image, offset, struct.calcsize(fmt_image))images = np.empty((num_images, 28, 28))# plt.figure()for i in tqdm(range(num_images)):image = np.array(struct.unpack_from(fmt_image, bin_data, offset)).reshape((num_rows, num_cols)).astype(np.uint8)# images[i] = cv.resize(image, (96, 96))images[i] = image# print(images[i])offset += struct.calcsize(fmt_image)return imagesdef decode_idx1_ubyte(idx1_ubyte_file):bin_data = open(idx1_ubyte_file, 'rb').read()offset = 0fmt_header = '>ii'magic_number, num_images = struct.unpack_from(fmt_header, bin_data, offset)print('图片数量: %d张' % (num_images))# 解析数据集offset += struct.calcsize(fmt_header)fmt_image = '>B'labels = np.empty(num_images)for i in tqdm(range(num_images)):labels[i] = struct.unpack_from(fmt_image, bin_data, offset)[0]offset += struct.calcsize(fmt_image)return labelsdef load_train_images(idx_ubyte_file=train_images_idx3_ubyte_file):return decode_idx3_ubyte(idx_ubyte_file)def load_train_labels(idx_ubyte_file=train_labels_idx1_ubyte_file):return decode_idx1_ubyte(idx_ubyte_file)def load_test_images(idx_ubyte_file=test_images_idx3_ubyte_file):return decode_idx3_ubyte(idx_ubyte_file)def load_test_labels(idx_ubyte_file=test_labels_idx1_ubyte_file):return decode_idx1_ubyte(idx_ubyte_file)# 构建网络部分
class Residual(nn.Module): # 本类已保存在d2lzh_pytorch包中方便以后使用def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):super(Residual, self).__init__()self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)if use_1x1conv:self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)else:self.conv3 = Noneself.bn1 = nn.BatchNorm2d(out_channels)self.bn2 = nn.BatchNorm2d(out_channels)def forward(self, X):Y = F.relu(self.bn1(self.conv1(X)))Y = self.bn2(self.conv2(Y))if self.conv3:X = self.conv3(X)return F.relu(Y + X)class GlobalAvgPool2d(nn.Module):# 全局平均池化层可通过将池化窗口形状设置成输入的高和宽实现def __init__(self):super(GlobalAvgPool2d, self).__init__()def forward(self, x):return F.avg_pool2d(x, kernel_size=x.size()[2:])def resnet_block(in_channels, out_channels, num_residuals, first_block=False):# num_residuals:残差数if first_block:assert in_channels == out_channels # 第一个模块的通道数同输入通道数一致blk = []for i in range(num_residuals):if i == 0 and not first_block:blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))else:blk.append(Residual(out_channels, out_channels))return nn.Sequential(*blk)def evaluate_accuracy(img, label, net):device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')acc_sum, n = 0.0, 0with torch.no_grad():X = torch.unsqueeze(img, 1)if isinstance(net, torch.nn.Module):net.eval() # 评估模式, 这会关闭dropoutacc_sum += (net(X.to(device)).argmax(dim=1) == label.to(device)).float().sum().cpu().item()net.train() # 改回训练模式else: # 自定义的模型, 3.13节之后不会用到, 不考虑GPUif ('is_training' in net.__code__.co_varnames): # 如果有is_training这个参数# 将is_training设置成Falseacc_sum += (net(X, is_training=False).argmax(dim=1) == label).float().sum().item()else:acc_sum += (net(X).argmax(dim=1) == label).float().sum().item()n += label.shape[0]return acc_sum / nclass FlattenLayer(torch.nn.Module):def __init__(self):super(FlattenLayer, self).__init__()def forward(self, x): # x shape: (batch, *, *, ...)return x.view(x.shape[0], -1)if __name__ == '__main__':print("train:")train_images_org = load_train_images().astype(np.float32)train_labels_org = load_train_labels().astype(np.int64)print("test")test_images = load_test_images().astype(np.float32)[0:1000]test_labels = load_test_labels().astype(np.int64)[0:1000]# 数据转换为Tensortrain_images = torch.from_numpy(train_images_org)train_labels = torch.from_numpy(train_labels_org)test_images = torch.from_numpy(test_images)test_labels = torch.from_numpy(test_labels)# test_images = load_test_images()# test_labels = load_test_labels()# 查看前十个数据及其标签以读取是否正确for i in range(5):j = random.randint(0, 60000)print("now, show the number of image[{}]:".format(j), int(train_labels_org[j]))img = train_images_org[j]img = cv.resize(img, (600, 600))cv.imshow("image", img)cv.waitKey(0)cv.destroyAllWindows()print('all done!')print("*" * 50)# ResNet模型net = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),nn.BatchNorm2d(64),nn.ReLU(),nn.MaxPool2d(kernel_size=3, stride=2, padding=1))net.add_module("resnet_block1", resnet_block(64, 64, 2, first_block=True))net.add_module("resnet_block2", resnet_block(64, 128, 2))net.add_module("resnet_block3", resnet_block(128, 256, 2))net.add_module("global_avg_pool", GlobalAvgPool2d()) # GlobalAvgPool2d的输出: (Batch, 512, 1, 1)net.add_module("fc", nn.Sequential(FlattenLayer(), nn.Linear(256, 10)))# 测试网络X = torch.rand((1, 1, 28, 28))for name, layer in net.named_children():X = layer(X)print(name, ' output shape:/t', X.shape)# 训练device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')lr, num_epochs = 0.001, 100optimizer = torch.optim.Adam(net.parameters(), lr=lr)batch_size = 1000net = net.to(device)print("training on ", device)loss = torch.nn.CrossEntropyLoss()loop_times = round(60000 / batch_size)train_acc_plot = []test_acc_plot = []loss_plot = []for epoch in range(num_epochs):train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()for i in tqdm(range(1, loop_times)):x = train_images[(i - 1) * batch_size:i * batch_size]y = train_labels[(i - 1) * batch_size:i * batch_size]x = torch.unsqueeze(x, 1) # 对齐维度X = x.to(device)y = y.to(device)y_hat = net(X)l = loss(y_hat, y)optimizer.zero_grad()l.backward()optimizer.step()train_l_sum += l.cpu().item()train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()n += y.shape[0]batch_count += 1test_acc = evaluate_accuracy(test_images, test_labels, net)print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'% (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))torch.save(net.state_dict(), 'logs/Epoch%d-Loss%.4f-train_acc%.4f-test_acc%.4f.pth' % ((epoch + 1), train_l_sum / batch_count, train_acc_sum / n, test_acc))print("save successfully")test_acc_plot.append(test_acc)train_acc_plot.append(train_acc_sum / n)loss_plot.append(train_l_sum / batch_count)x = range(0,100)plt.plot(x,test_acc_plot,'r')plt.plot(x, train_acc_plot, 'g')plt.plot(x, loss_plot, 'b')print("*" * 50)
main_pthoto.py
# author:Hurricane
# date: 2020/11/6
# E-mail:hurri_cane@qq.comimport cv2 as cv
import numpy as np
import os
from Pre_treatment import get_number as g_n
import predict as pt
from time import time
from Pre_treatment import softmax
net = pt.get_net()
orig_path = r"F:\PyCharm\Practice\hand_wrtten\real_img_resize"
img_list = os.listdir(orig_path)# img_path = r'F:\PyCharm\Practice\hand_wrtten\real_img\7.jpg'for img_name in img_list:since = time()img_path = os.path.join(orig_path, img_name)img = cv.imread(img_path)img_bw = g_n(img)img_bw_c = img_bw.sum(axis=1) / 255img_bw_r = img_bw.sum(axis=0) / 255r_ind, c_ind = [], []for k, r in enumerate(img_bw_r):if r >= 5:r_ind.append(k)for k, c in enumerate(img_bw_c):if c >= 5:c_ind.append(k)img_bw_sg = img_bw[ c_ind[0]:c_ind[-1],r_ind[0]:r_ind[-1]]leng_c = len(c_ind)leng_r = len(r_ind)side_len = leng_c + 20add_r = int((side_len-leng_r)/2)img_bw_sg_bord = cv.copyMakeBorder(img_bw_sg,10,10,add_r,add_r,cv.BORDER_CONSTANT,value=[0,0,0])# 展示图片cv.imshow("img", img_bw)cv.imshow("img_sg", img_bw_sg_bord)c = cv.waitKey(1) & 0xffimg_in = cv.resize(img_bw_sg_bord, (28, 28))result_org = pt.predict(img_in, net)result = softmax(result_org)best_result = result.argmax(dim=1).item()best_result_num = max(max(result)).cpu().detach().numpy()if best_result_num <= 0.5:best_result = None# 显示结果img_show = cv.resize(img, (600, 600))end_predict = time()fps = np.ceil(1 / (end_predict - since))font = cv.FONT_HERSHEY_SIMPLEXcv.putText(img_show, "The number is:" + str(best_result), (1, 30), font, 1, (0, 0, 255), 2)cv.putText(img_show, "Probability is:" + str(best_result_num), (1, 60), font, 1, (0, 255, 0), 2)cv.putText(img_show, "FPS:" + str(fps), (1, 90), font, 1, (255, 0, 0), 2)cv.imshow("result", img_show)cv.waitKey(1)print(result)print("*" * 50)print("The number is:", best_result)
main.py
# author:Hurricane
# date: 2020/11/6
# E-mail:hurri_cane@qq.comimport cv2 as cv
import numpy as np
import os
from Pre_treatment import get_number as g_n
from Pre_treatment import get_roi
import predict as pt
from time import time
from Pre_treatment import softmax# 实时检测视频
capture = cv.VideoCapture(0,cv.CAP_DSHOW)
capture.set(3, 1920)
capture.set(4, 1080)
net = pt.get_net()# img_path = r'F:\PyCharm\Practice\hand_wrtten\real_img\7.jpg'
while (True):ret, frame = capture.read()since = time()if ret:# frame = cv.imread(img_path)img_bw = g_n(frame)img_bw_sg = get_roi(img_bw)# 展示图片cv.imshow("img", img_bw_sg)c = cv.waitKey(1) & 0xffif c == 27:capture.release()breakimg_in = cv.resize(img_bw_sg, (28, 28))result_org = pt.predict(img_in, net)result = softmax(result_org)best_result = result.argmax(dim=1).item()best_result_num = max(max(result)).cpu().detach().numpy()if best_result_num <= 0.5:best_result = None# 显示结果img_show = cv.resize(frame, (600, 600))end_predict = time()fps = round(1/(end_predict-since))font = cv.FONT_HERSHEY_SIMPLEXcv.putText(img_show, "The number is:" + str(best_result), (1, 30), font, 1, (0, 0, 255), 2)cv.putText(img_show, "Probability is:" + str(best_result_num), (1, 60), font, 1, (0, 255, 0), 2)cv.putText(img_show, "FPS:" + str(fps), (1, 90), font, 1, (255, 0, 0), 2)cv.imshow("result", img_show)cv.waitKey(1)print(result)print("*" * 50)print("The number is:", best_result)else:print("please check camera!")break
Pre_treatment.py
# author:Hurricane
# date: 2020/11/6
# E-mail:hurri_cane@qq.comimport cv2 as cv
import numpy as np
import osdef get_number(img):img_gray = cv.cvtColor(img, cv.COLOR_RGB2GRAY)img_gray_resize = cv.resize(img_gray, (600, 600))ret, img_bw = cv.threshold(img_gray_resize, 200, 255, cv.THRESH_BINARY)kernel = np.ones((3, 3), np.uint8)# img_open = cv.morphologyEx(img_bw,cv.MORPH_CLOSE,kernel)img_open = cv.dilate(img_bw, kernel, iterations=2)num_labels, labels, stats, centroids = \cv.connectedComponentsWithStats(img_open, connectivity=8, ltype=None)for sta in stats:if sta[4] < 1000:cv.rectangle(img_open, tuple(sta[0:2]), tuple(sta[0:2] + sta[2:4]), (0, 0, 255), thickness=-1)return img_opendef get_roi(img_bw):img_bw_c = img_bw.sum(axis=1) / 255img_bw_r = img_bw.sum(axis=0) / 255all_sum = img_bw_c.sum(axis=0)if all_sum != 0:r_ind, c_ind = [], []for k, r in enumerate(img_bw_r):if r >= 5:r_ind.append(k)for k, c in enumerate(img_bw_c):if c >= 5:c_ind.append(k)if len(r_ind)==0 or len(c_ind)==0:return img_bwimg_bw_sg = img_bw[c_ind[0]:c_ind[-1], r_ind[0]:r_ind[-1]]leng_c = len(c_ind)leng_r = len(r_ind)side_len = max(leng_c, leng_r) + 20if leng_c == side_len:add_r = int((side_len - leng_r) / 2)add_c = 10else:add_r = 10add_c = int((side_len - leng_c) / 2)img_bw_sg_bord = cv.copyMakeBorder(img_bw_sg, add_c, add_c, add_r, add_r, cv.BORDER_CONSTANT, value=[0, 0, 0])return img_bw_sg_bordelse:return img_bwdef softmax(X):X_exp = X.exp()partition = X_exp.sum(dim=1, keepdim=True)return X_exp / partition
predict.py
# author:Hurricane
# date: 2020/11/5
# E-mail:hurri_cane@qq.com
# -------------------------------------#
# 对单张图片进行预测
# -------------------------------------#
import numpy as np
import struct
import matplotlib.pyplot as plt
import cv2 as cv
import random
import torch
from torch import nn, optim
import torch.nn.functional as Fclass Residual(nn.Module): # 本类已保存在d2lzh_pytorch包中方便以后使用def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):super(Residual, self).__init__()self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)if use_1x1conv:self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)else:self.conv3 = Noneself.bn1 = nn.BatchNorm2d(out_channels)self.bn2 = nn.BatchNorm2d(out_channels)def forward(self, X):Y = F.relu(self.bn1(self.conv1(X)))Y = self.bn2(self.conv2(Y))if self.conv3:X = self.conv3(X)return F.relu(Y + X)class GlobalAvgPool2d(nn.Module):# 全局平均池化层可通过将池化窗口形状设置成输入的高和宽实现def __init__(self):super(GlobalAvgPool2d, self).__init__()def forward(self, x):return F.avg_pool2d(x, kernel_size=x.size()[2:])def resnet_block(in_channels, out_channels, num_residuals, first_block=False):# num_residuals:残差数if first_block:assert in_channels == out_channels # 第一个模块的通道数同输入通道数一致blk = []for i in range(num_residuals):if i == 0 and not first_block:blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))else:blk.append(Residual(out_channels, out_channels))return nn.Sequential(*blk)class FlattenLayer(torch.nn.Module):def __init__(self):super(FlattenLayer, self).__init__()def forward(self, x): # x shape: (batch, *, *, ...)return x.view(x.shape[0], -1)
def get_net():# 构建网络# ResNet模型model_path = r"F:\PyCharm\Practice\hand_wrtten\logs\Epoch100-Loss0.0000-train_acc1.0000-test_acc0.9930.pth"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')net = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),nn.BatchNorm2d(64),nn.ReLU(),nn.MaxPool2d(kernel_size=3, stride=2, padding=1))net.add_module("resnet_block1", resnet_block(64, 64, 2, first_block=True))net.add_module("resnet_block2", resnet_block(64, 128, 2))net.add_module("resnet_block3", resnet_block(128, 256, 2))net.add_module("global_avg_pool", GlobalAvgPool2d()) # GlobalAvgPool2d的输出: (Batch, 512, 1, 1)net.add_module("fc", nn.Sequential(FlattenLayer(), nn.Linear(256, 10)))# 测试网络# X = torch.rand((1, 1, 28, 28))# for name, layer in net.named_children():# X = layer(X)# print(name, ' output shape:\t', X.shape)# 加载网络模型print("Load weight into state dict...")stat_dict = torch.load(model_path, map_location=device)net.load_state_dict(stat_dict)net.to(device)net.eval()print("Load finish!")return netdef predict(img, net):device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')img_in = torch.from_numpy(img)img_in = torch.unsqueeze(img_in, 0)img_in = torch.unsqueeze(img_in, 0).to(device)img_in = img_in.float()result_org = net(img_in)return result_org
7.结束语
如果本文对你有帮助的话还请点赞、收藏一键带走哦,你的支持是我最大的动力!(づ。◕ᴗᴗ◕。)づ
基于卷积神经网络的手写数字识别(附数据集+完整代码+操作说明)相关推荐
- 【图像识别】基于卷积神经网络CNN手写数字识别matlab代码
1 简介 针对传统手写数字的随机性,无规律性等问题,为了提高手写数字识别的检测准确性,本文在研究手写数字区域特点的基础上,提出了一种新的手写数字识别检测方法.首先,对采集的手写数字图像进行预处理,由于 ...
- 基于卷积神经网络的手写数字识别、python实现
一.CNN网络结构与构建 参数: 输入数据的维数,通道,高,长 input_dim=(1, 28, 28) 卷积层的超参数,filter_num:滤波器数量,filter_size:滤波器大小,str ...
- 读书笔记-深度学习入门之pytorch-第四章(含卷积神经网络实现手写数字识别)(详解)
1.卷积神经网络在图片识别上的应用 (1)局部性:对一张照片而言,需要检测图片中的局部特征来决定图片的类别 (2)相同性:可以用同样的模式去检测不同照片的相同特征,只不过这些特征处于图片中不同的位置, ...
- 深度学习 卷积神经网络-Pytorch手写数字识别
深度学习 卷积神经网络-Pytorch手写数字识别 一.前言 二.代码实现 2.1 引入依赖库 2.2 加载数据 2.3 数据分割 2.4 构造数据 2.5 迭代训练 三.测试数据 四.参考资料 一. ...
- MATLAB实现基于BP神经网络的手写数字识别+GUI界面+mnist数据集测试
文章目录 MATLAB实现基于BP神经网络的手写数字识别+GUI界面+mnist数据集测试 一.题目要求 二.完整的目录结构说明 三.Mnist数据集及数据格式转换 四.BP神经网络相关知识 4.1 ...
- 基于BP神经网络的手写数字识别
基于BP神经网络的手写数字识别 摘要 本文实现了基于MATLAB关于神经网络的手写数字识别算法的设计过程,采用神经网络中反向传播神经网络(即BP神经网络)对手写数字的识别,由MATLAB对图片进行读入 ...
- 卷积神经网络CNN 手写数字识别
1. 知识点准备 在了解 CNN 网络神经之前有两个概念要理解,第一是二维图像上卷积的概念,第二是 pooling 的概念. a. 卷积 关于卷积的概念和细节可以参考这里,卷积运算有两个非常重要特性, ...
- keras从入门到放弃(十三)卷积神经网络处理手写数字识别
今天来一个cnn例子 手写数字识别,因为是图像数据 import keras from keras import layers import numpy as np import matplotlib ...
- 【手写数字识别】基于Lenet网络实现手写数字识别附matlab代码
1 内容介绍 当今社会,人工智能得到快速发展,而模式识 别作为人工智能的一个重要应用领域也得到了飞 速发展,它利用计算机通过计算的方法根据样本的 特征对样本进行分类,其中的光学字符识别技术受 到广大研 ...
最新文章
- 汇编语言中寻址方式[bx + idata]
- DivCSS网页布局中CSS无效的十个常见原因
- 在线绘图|差异分析——在线做时序分析
- Javascript函数执行、new机制以及继承
- 使用Silverlight for Embedded开发绚丽的界面(2)
- request url换成ip地址_【协议粗讲】TTP协议之URL,不能不知道的协议技术点
- 近期总结:generator-web,前端自动化构建的解决方案
- 如何在Android主屏幕上添加热点快捷方式
- python 中文编码差异_Python 编码为什么那么蛋疼?
- 机器学习06神经网络--学习
- linux u盘 uid pid,linux下的pid文件的作用
- 现在程序员的工资是不是被高估了?不存在的!
- Shell Curses 函数库
- Flink查询关联Hbase输出
- oracle数据比对md5,MD5SUM的妙用
- python通达信自动交易_【其他】通达信程序化交易新发现,通达信dll下单
- 前端页面预览word_前端实现在线预览文档
- 微信公众号点击列表进入详情页
- linux添加有效群组,linux基础命令--groupadd 创建新的群组
- java中与时间有关的转换和校验