PyTorch Day 06: Image Classifier

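1 Data
- (1) Introducing and loading the dataset
- (2) transforms.Compose and transforms.Normalize
- (3) Previewing the images
  - a Un-normalizing
  - b Converting to the layout plt.imshow can read
  - c Displaying as one grid
2 Defining the CNN, the loss function and the optimizer
- (1) Building the neural network
- (2) Optimizer and loss function
3 Training and saving the model
4 Testing the model
Summary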

1 Data

(1) Introducing and loading the dataset

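For vision tasks, the torchvision module provides public datasets and the operations that act on them. Here we use torchvision.datasets.CIFAR10 to train an image classifier.
Each CIFAR10 image is 3x32x32, i.e. a 3-channel color image of 32x32 pixels. The dataset has 50,000 training images and 10,000 test images, split across 10 classes: plane, car, bird, cat, deer, dog, frog, horse, ship and truck.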

Now let's load the dataset:

import torch
import torchvision
import torchvision.transforms as transforms
import os
# Workaround for "OMP: Error #15" (two OpenMP runtimes loaded), which some setups raise without this line
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4
torch.manual_seed(1)  # set the random seed

trainset = torchvision.datasets.CIFAR10(root='../data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='../data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

(2) transforms.Compose and transforms.Normalize

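torchvision.transforms wraps operations on the datasets in torchvision.datasets. The torchvision.transforms.Compose class chains several of these operations together, and the operations to be chained must be passed in as a list.
In the code above, transforms.Compose chains transforms.ToTensor and transforms.Normalize. ToTensor was covered in the previous lesson (section 1 Data, (2) Data preprocessing), so here we only look at transforms.Normalize:
The first (0.5, 0.5, 0.5) is the "mean" of the three channels. It is not the real mean; we simply assume the dataset's channels have this mean. The second (0.5, 0.5, 0.5) is the "standard deviation" of the three channels, which is also assumed. Suppose the channel means and standard deviations really were (0.5, 0.5, 0.5). After the transform, every channel would then have mean 0 and standard deviation 1, the same mean and variance as a standard normal distribution. That is where the name Normalize comes from. Per channel, the transform computes: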

input[channel] = (input[channel] - mean[channel]) / std[channel]

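In the code above, ToTensor has already scaled the pixel values to [0, 1]. The left endpoint becomes (0 - 0.5) / 0.5 = -1 and the right endpoint becomes (1 - 0.5) / 0.5 = 1. After transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), the values therefore lie in [-1, 1].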
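As a quick check of the arithmetic above, here is a minimal NumPy sketch of the per-channel formula input[channel] = (input[channel] - mean[channel]) / std[channel]. It does not call torchvision. The function name normalize_chw is hypothetical; it only mirrors what Normalize computes on a (C, H, W) array.

```python
import numpy as np

def normalize_chw(x, mean, std):
    """Apply (x - mean) / std channel by channel to a (C, H, W) array."""
    mean = np.asarray(mean, dtype=np.float64).reshape(-1, 1, 1)  # broadcast over H and W
    std = np.asarray(std, dtype=np.float64).reshape(-1, 1, 1)
    return (x - mean) / std

# A tiny 3-channel "image" of shape (3, 1, 3) whose pixels are 0, 0.5 and 1,
# i.e. the endpoints and midpoint of the [0, 1] range produced by ToTensor
x = np.tile(np.array([0.0, 0.5, 1.0]), (3, 1, 1))
y = normalize_chw(x, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
print(y[0, 0])  # [-1.  0.  1.]
```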

In the official PyTorch tutorials you will often see the following, where torchvision.transforms is imported as T:

normalize = T.Normalize(mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225])

These mean and std values were computed from a random sample of the millions of images in the ImageNet dataset.
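Statistics like these are just the mean and standard deviation of each color channel, taken over every pixel of every image. The sketch below shows that computation in NumPy on a small synthetic batch in (N, H, W, C) layout. The random data and the name channel_stats are assumptions for illustration; this is not the procedure actually used to produce the ImageNet numbers.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std of an (N, H, W, C) array with values in [0, 1]."""
    # Reduce over the image, height and width axes, keeping one value per channel
    mean = images.mean(axis=(0, 1, 2))
    std = images.std(axis=(0, 1, 2))
    return mean, std

rng = np.random.default_rng(0)
# Synthetic stand-in for a dataset: 8 RGB images of 32x32 pixels
batch = rng.random((8, 32, 32, 3))
mean, std = channel_stats(batch)
print(mean.round(3), std.round(3))
```

Plugging numbers computed this way into Normalize gives a dataset-specific replacement for the (0.5, 0.5, 0.5) placeholder.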

(3) Previewing the images

Let's take a few of the transformed images and look at them:

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # dataiter.next() no longer exists in recent PyTorch versions

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(batch_size)))

Output: a grid of the four sampled training images, with their four labels printed on one line.

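The images look blurry mainly because CIFAR10 images are only 32x32 pixels and are enlarged for display. The effect of transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) is undone inside imshow before plotting.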

Now let's go through the code above.
First, the custom imshow function calls plt.imshow() internally to display a tensor as an image.

a Un-normalizing

The first statement in the function is

img = img / 2 + 0.5     # unnormalize

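This line restores the original data. When loading, we applied transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), i.e. x' = (x - 0.5) / 0.5 = 2x - 1. Solving for x gives x = x' / 2 + 0.5, so computing img = img / 2 + 0.5 recovers the original values in [0, 1].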
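In general, Normalize is inverted per channel by x = x' * std + mean, and with mean = std = 0.5 this reduces to img / 2 + 0.5. Below is a minimal NumPy round-trip check. The helper names normalize and unnormalize are hypothetical. The same inverse also works for other statistics, such as the ImageNet values quoted earlier.

```python
import numpy as np

def normalize(x, mean, std):
    mean = np.asarray(mean).reshape(-1, 1, 1)
    std = np.asarray(std).reshape(-1, 1, 1)
    return (x - mean) / std

def unnormalize(x, mean, std):
    # Invert (x - mean) / std: multiply by std, then add mean back
    mean = np.asarray(mean).reshape(-1, 1, 1)
    std = np.asarray(std).reshape(-1, 1, 1)
    return x * std + mean

rng = np.random.default_rng(1)
img = rng.random((3, 4, 4))  # fake (C, H, W) image in [0, 1]

# With mean = std = 0.5, the general inverse is exactly img / 2 + 0.5
z = normalize(img, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
assert np.allclose(z / 2 + 0.5, unnormalize(z, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)))

# The same round trip with the ImageNet statistics
imagenet_mean, imagenet_std = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)
restored = unnormalize(normalize(img, imagenet_mean, imagenet_std), imagenet_mean, imagenet_std)
print(np.allclose(restored, img))  # True
```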

b Converting to the layout plt.imshow can read

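np.transpose(npimg, (1, 2, 0)) reorders the axes. The tensor is stored as (C, H, W), with channels first, while plt.imshow expects an RGB image as (H, W, C), with channels last. The second argument says which input axis ends up in each output position:
- output axis 0 is input axis 1;
- output axis 1 is input axis 2;
- output axis 2 is input axis 0.

For b = np.transpose(a, (1, 2, 0)), each element therefore moves as

b[j, k, i] = a[i, j, k]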
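A small NumPy check of that index rule. The array a, of shape (2, 3, 4), is a hypothetical stand-in for a (C, H, W) image:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # pretend (C, H, W) = (2, 3, 4)
b = np.transpose(a, (1, 2, 0))       # -> (H, W, C)
print(b.shape)  # (3, 4, 2)

# Every element moves according to b[j, k, i] = a[i, j, k]
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            assert b[j, k, i] == a[i, j, k]
```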

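Why not torch.transpose?
torch.transpose swaps exactly two dimensions. Its parameter list is (Tensor input, int dim0, int dim1), so it cannot reorder three dimensions in one call. NumPy's transpose can reorder any number of dimensions. Within PyTorch, Tensor.permute does the same job, e.g. img.permute(1, 2, 0).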

c Displaying as one grid

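torchvision.utils.make_grid(images) tiles several images into a single image. Because batch_size = 4, images contains four images, and they are combined into one. For details on its usage, see this very short blog post (only a few dozen characters): https://blog.csdn.net/zouxiaolv/article/details/105034512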
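Here is a simplified NumPy re-implementation that shows what the tiling does to the shape. It is a hypothetical sketch, not torchvision's actual code. It mimics make_grid's default layout: up to nrow=8 images per row, and padding=2 pixels of zeros around every tile. For four 3x32x32 CIFAR10 images it produces a 3x36x138 grid, the shape make_grid returns with its defaults.

```python
import math
import numpy as np

def simple_make_grid(images, nrow=8, padding=2, pad_value=0.0):
    """Tile an (N, C, H, W) batch into one (C, H', W') image, row by row."""
    n, c, h, w = images.shape
    xmaps = min(nrow, n)                 # tiles per row
    ymaps = math.ceil(n / xmaps)         # number of rows
    cell_h, cell_w = h + padding, w + padding
    grid = np.full((c, cell_h * ymaps + padding, cell_w * xmaps + padding), pad_value)
    for k in range(n):
        y, x = divmod(k, xmaps)          # row and column of tile k
        top, left = y * cell_h + padding, x * cell_w + padding
        grid[:, top:top + h, left:left + w] = images[k]
    return grid

batch = np.ones((4, 3, 32, 32))          # four fake 3x32x32 images
grid = simple_make_grid(batch)
print(grid.shape)  # (3, 36, 138)
```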

2 Defining the CNN, the loss function and the optimizer

(1) Building the neural network

Here we use LeNet-5 for recognition. The network is small, but it contains all the usual building blocks.

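Because our images are in color, the first convolutional layer of LeNet-5 is changed to take 3 input channels. The structure we implement is:
5×5 conv (3 → 6 channels) → ReLU → 2×2 max pool → 5×5 conv (6 → 16 channels) → ReLU → 2×2 max pool → flatten → fully connected layers 16*5*5 → 120 → 84 → 10.
The first two fully connected layers are followed by ReLU.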

Let's build the CNN model:

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)  # 2x2 pooling window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # fully connected layers
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # convolution, activation and pooling in one line
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

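Here is what nn.Conv2d(3, 6, 5) and nn.MaxPool2d(2, 2) mean.
nn.Conv2d(3, 6, 5) is a convolutional layer. The first argument is the number of input channels, the second is the number of output channels, and the third is the kernel size, here 5×5. The size can be a single integer or a tuple. stride and padding keep their defaults, stride=1 and padding=0.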
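A k×k convolution or pooling window with stride s and padding p maps an input of size n to floor((n + 2p - k) / s) + 1. The pure-Python sketch below traces a 32x32 CIFAR10 image through the layers of Net, which shows where the 16 * 5 * 5 input size of fc1 comes from. The helper out_size is hypothetical.

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 32                        # CIFAR10 images are 32x32
n = out_size(n, k=5)          # conv1, 5x5, stride 1, no padding -> 28
n = out_size(n, k=2, s=2)     # 2x2 max pool, stride 2          -> 14
n = out_size(n, k=5)          # conv2                            -> 10
n = out_size(n, k=2, s=2)     # 2x2 max pool                     -> 5
flat = 16 * n * n             # conv2 has 16 output channels
print(n, flat)  # 5 400
```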
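Consider a convolutional layer with 3 input channels, 5 output channels and a 3×5 kernel, plus stride=(2, 1) (the step in each of the two directions) and padding=(1, 2). It would be defined like this: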

# non-square kernels and unequal stride and with padding
m = nn.Conv2d(3, 5, (3, 5), stride=(2, 1), padding=(1, 2))

The full nn.Conv2d API is documented here:
https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=nn%20conv2d#torch.nn.Conv2d

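nn.MaxPool2d(2, 2) is a pooling layer. The first argument is the pooling window, here 2×2, given as a single integer or a tuple. The second argument is stride=2. stride is optional: it defaults to None, which means it equals the window size, so nn.MaxPool2d(2) behaves the same as nn.MaxPool2d(2, 2). padding defaults to 0.
A pooling layer with a 3×4 window, stride (2, 1) and padding (1, 2) would be defined like this: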

m = nn.MaxPool2d((3, 4), stride=(2, 1), padding=(1, 2))
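When kernel, stride and padding are tuples, floor((n + 2p - k) / s) + 1 is applied separately to height and width. This sketch assumes a hypothetical 32x32 input and works out the output sizes of the two non-square examples above. The helper out_size_2d is hypothetical.

```python
def out_size_2d(hw, kernel, stride, padding):
    """Apply floor((n + 2p - k) / s) + 1 separately to height and width."""
    return tuple((n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(hw, kernel, stride, padding))

# nn.Conv2d(3, 5, (3, 5), stride=(2, 1), padding=(1, 2)) on a 32x32 input
conv_hw = out_size_2d((32, 32), kernel=(3, 5), stride=(2, 1), padding=(1, 2))
# nn.MaxPool2d((3, 4), stride=(2, 1), padding=(1, 2)) on a 32x32 input
pool_hw = out_size_2d((32, 32), kernel=(3, 4), stride=(2, 1), padding=(1, 2))
print(conv_hw, pool_hw)  # (16, 32) (16, 33)
```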

The full nn.MaxPool2d API is documented here:
https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html?highlight=nn%20maxpool2d#torch.nn.MaxPool2d

(2) Optimizer and loss function

Here we use stochastic gradient descent with momentum as the optimizer, and multi-class cross-entropy as the loss function:

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()
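Here is a dependency-free sketch of what these two choices compute on plain Python floats. It follows the formulas in the PyTorch documentation:
- nn.CrossEntropyLoss: log-softmax followed by negative log-likelihood, averaged over the batch.
- SGD with momentum, assuming no dampening, Nesterov or weight decay: v = momentum * v + grad, then param -= lr * v.

The function names are hypothetical, and this is not PyTorch's implementation.

```python
import math

def cross_entropy(logits, target):
    """-log softmax(logits)[target] for one sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def sgd_momentum(grad_fn, x, lr, momentum, steps):
    """Minimize a 1-D function with heavy-ball momentum in the torch.optim.SGD form."""
    v = 0.0
    for _ in range(steps):
        g = grad_fn(x)
        v = momentum * v + g       # velocity buffer (first step: v = g)
        x = x - lr * v
    return x

# Uniform logits over 10 classes give loss log(10), about 2.303
print(round(cross_entropy([0.0] * 10, target=3), 3))  # 2.303
# Minimize f(x) = x**2 (gradient 2x) starting from x = 5
print(abs(sgd_momentum(lambda x: 2 * x, 5.0, lr=0.01, momentum=0.9, steps=500)) < 1e-3)  # True
```

With 10 classes, a freshly initialized network whose outputs are roughly uniform should therefore start training with a loss near 2.303.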

3 Training and saving the model

Here we train for 2 epochs:

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
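The statistics line fires when i % 2000 == 1999, i.e. after every 2000 mini-batches. CIFAR10 has 50,000 training images, so with batch_size = 4 each epoch has 12,500 mini-batches. This small arithmetic sketch lists the batch counts at which a loss line is printed in each epoch. Each printed value is the mean loss over the preceding 2000 batches.

```python
num_train, batch_size, every = 50_000, 4, 2000
batches_per_epoch = -(-num_train // batch_size)   # ceiling division -> 12500

# i is the 0-based batch index from enumerate(trainloader, 0)
printed = [i + 1 for i in range(batches_per_epoch) if i % every == every - 1]
print(batches_per_epoch, printed)
# 12500 [2000, 4000, 6000, 8000, 10000, 12000]
```

The last 500 batches of each epoch still add to running_loss. However, running_loss is reset at the start of the next epoch, so those batches never appear in a printed average.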

After training, save the model:

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

4 Testing the model

First, let's load one batch of test samples and try the model on them:

dataiter = iter(testloader)
images, labels = next(dataiter)  # dataiter.next() no longer exists in recent PyTorch versions

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

Output: a grid of the four test images, followed by their ground-truth labels.

We can also load the saved model back and test it:

net = Net()
net.load_state_dict(torch.load(PATH))
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

Output:

Predicted:    cat   car   car plane

Let's see how the model performs on the whole test set:

correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

Output:

Accuracy of the network on the 10000 test images: 54 %

With 10 classes, random guessing would give about 10 % accuracy. Our model reaches 54 %, so it has clearly learned something.

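The code above relies on torch.max, so it is worth explaining how that function behaves. As an example, consider a three-dimensional tensor pred.

If no dimension is given, torch.max(a) returns the maximum of the whole tensor as a single (0-dimensional) tensor.

If a dimension is given, torch.max returns a named tuple (values, indices). values holds the maximum of each slice along the given dimension, and indices holds the position (argmax) of each maximum within that dimension.

Many PyTorch functions take a keepdim argument, and torch.max is no exception.

keepdim means "keep the dimension": with keepdim=True, the reduced dimension is kept with size 1 instead of being dropped. Since pred is three-dimensional, both its max values and its argmax indices are then also three-dimensional.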
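The sketch below illustrates the three cases with NumPy, which has direct counterparts:
- np.max(a) corresponds to torch.max(a).
- np.max(a, axis=d) together with np.argmax(a, axis=d) corresponds to the (values, indices) pair returned by torch.max(a, dim=d).
- keepdims=True corresponds to keepdim=True.

The 3-D array pred is a made-up stand-in, and NumPy is used here only so the example runs without PyTorch.

```python
import numpy as np

# A made-up 3-D "pred" of shape (2, 2, 3): 2 samples, 2 rows, 3 scores each
pred = np.array([[[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.9]],
                 [[0.4, 0.2, 0.6],
                  [0.8, 0.1, 0.0]]])

# torch.max(pred) -> single maximum of the whole tensor
print(np.max(pred))                        # 0.9

# torch.max(pred, dim=2) -> (values, indices) along the last dimension
values, indices = np.max(pred, axis=2), np.argmax(pred, axis=2)
print(values.shape, indices.tolist())      # (2, 2) [[1, 2], [2, 0]]

# torch.max(pred, dim=2, keepdim=True) -> the reduced dim stays, with size 1
values_k = np.max(pred, axis=2, keepdims=True)
indices_k = np.expand_dims(np.argmax(pred, axis=2), axis=2)
print(values_k.shape, indices_k.shape)     # (2, 2, 1) (2, 2, 1)
```

In the test loop, outputs has shape (batch_size, 10). torch.max(outputs, 1) therefore reduces over the 10 class scores, and the returned indices hold the predicted class of each sample.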

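Back in our program, consider the statement "_, predicted = torch.max(outputs.data, 1)". It assigns to predicted the argmax of each sample's output, which is the model's predicted class index for that sample.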

Back to the model: let's see which classes it predicts well.

# prepare to count predictions for each class
# dict comprehensions: every key starts with the value 0
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1

# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print("Accuracy for class {:5s} is: {:.1f} %".format(classname, accuracy))

Output:

Accuracy for class plane is: 56.7 %
Accuracy for class car   is: 73.6 %
Accuracy for class bird  is: 22.5 %
Accuracy for class cat   is: 39.8 %
Accuracy for class deer  is: 58.7 %
Accuracy for class dog   is: 39.2 %
Accuracy for class frog  is: 64.6 %
Accuracy for class horse is: 56.3 %
Accuracy for class ship  is: 62.6 %
Accuracy for class truck is: 74.0 %
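The CIFAR10 test set contains the same number of images (1,000) for every class. The overall accuracy therefore equals the plain average of the per-class accuracies, up to the rounding of the printed values. Averaging the ten numbers above gives a quick consistency check against the 54 % printed earlier; note that the '%d' format truncates rather than rounds.

```python
per_class = {
    "plane": 56.7, "car": 73.6, "bird": 22.5, "cat": 39.8, "deer": 58.7,
    "dog": 39.2, "frog": 64.6, "horse": 56.3, "ship": 62.6, "truck": 74.0,
}
overall = sum(per_class.values()) / len(per_class)
print(round(overall, 1))   # 54.8
print('%d %%' % overall)   # 54 %  ('%d' drops the fraction)
```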

Of course, training and testing can also be run on the GPU; see the previous post for how.

Summary

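We have now run all of the code, with explanations along the way. Below, the code used only for explanation is removed, and the whole pipeline is put together in one script: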

# coding=utf-8
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import os
# Workaround for "OMP: Error #15" (two OpenMP runtimes loaded), which some setups raise without this line
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

"""Preparing and transforming (preprocessing) the data"""
# Chain the transforms
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Batch size and random seed
batch_size = 4
torch.manual_seed(1)  # set the random seed

# Load the data
trainset = torchvision.datasets.CIFAR10(root='../data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='../data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

"""Building the model"""
# Define the neural network (as a class)
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)  # 2x2 pooling window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # fully connected layers
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # convolution, activation and pooling in one line
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


# Instantiate the model
net = Net()

"""Training and saving the model"""
# Optimizer and loss function
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Training
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')

# Save the model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

"""Testing the model"""
# Performance of the trained model on the whole test set
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

# Accuracy of the model on each class
# Class labels
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# Dict comprehensions: every key starts with the value 0
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}
# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1
# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print("Accuracy for class {:5s} is: {:.1f} %".format(classname, accuracy))
